>> .. plain MC bot ..
>> The effect is that the bot is only "interested" in a narrow range of fairly
>> balanced games: if it gets too far behind, or too far ahead, its play
>> becomes fairly random. The reason is that small differences in
>> score may not affect the win/lose ratio enough to be distinguishable
>> (it would take longer sequences of such better moves to change the
>> win/lose outcome), so good and bad moves get lumped
>> together in the evaluation.
>
> I once made the argument that MCTS bots would not play handicap games
> well, based on this same effect. I got a fairly strong reaction from the
> group on this. Many said it would make no difference and that the really
> strong bots would play just as hard even if they were started with a
> technically lost game (from the handicap).

Well, MCTS bots have the advantage of reasoning over sequences of
moves, not just single moves, so I would expect the effect to be less
pronounced there (but still existent, and possibly affecting performance
even if not strength). If the win-rate evaluation is mixed with enough
other information from the playouts (total score, ownership, patterns,
..), that might be sufficient to reduce the effect to negligibility, but it
seems worth checking that this is the case.
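The score-insensitivity of a plain win-rate evaluation is easy to
illustrate with a toy model (a sketch only: Gaussian noise stands in
for playout randomness, and all the constants are made up):

```python
import random

def winrate(mean_score, noise=20.0, playouts=20000):
    # Fraction of noisy "playouts" that end as a win (final score > 0);
    # the Gaussian noise is a stand-in for playout randomness.
    wins = sum(1 for _ in range(playouts)
               if random.gauss(mean_score, noise) > 0)
    return wins / playouts

random.seed(1)
# In a close game, a 2-point-better move shifts the win rate visibly:
close_gain = winrate(3) - winrate(1)
# Far ahead, the same 2-point improvement barely registers:
ahead_gain = winrate(62) - winrate(60)
print(close_gain, ahead_gain)
```

With the stand-in numbers above, close_gain comes out far larger than
ahead_gain: exactly the lumping-together of good and bad moves
described above.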

>> .. it reminds me of a suggestion
>> I made a long time ago for human games(*), which might be easier to adopt
>> in this context: when a player/bot is far enough ahead that even a double
>> move by its opponent will at best catch up, the stronger player can simply
>> pass. At the end of the game, any difference in the number of passes is
>> counted as handicap, and recorded together with the plain score/result.

Note that this suggestion should be fairly easy to implement (if anyone
with two bots of different strength, or one bot stronger than publicly
available opponents, would try this, I'd be very interested in the results!)
and is not meant to change playing strength but to get better information
from games between bots of different strengths.
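Recording such a game is trivial; as a sketch (the names and sign
convention here are my own invention, not an existing API):

```python
def game_record(final_score, black_passes, white_passes):
    # final_score > 0 means Black wins on the plain count.
    # Any difference in voluntary passes is recorded as an estimated
    # handicap between the players, alongside the plain result.
    return {
        "winner": "B" if final_score > 0 else "W",
        "score": abs(final_score),
        "pass_handicap": abs(white_passes - black_passes),
        "stronger": "W" if white_passes > black_passes else "B",
    }

# White wins by 30.5 after passing 5 more times than Black:
print(game_record(-30.5, 0, 5))
```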

> The solution is to simply play for the best total territory score.
> ..
> I think some bots have enough knowledge imposed on them that they don't
> revert to random-like play, even when it's clear who will win.

I'm all for getting more information out of playouts, but since the
win-rate approach is so successful, there has to be something
special about it. Perhaps it simply abstracts over just the
right amount of information to balance out the unwanted artifacts
in the more detailed playout evaluations; perhaps it is something
else. Either way, it seems likely that one needs a weighted mix
of win rate and more detailed measures for evaluation (if the
win rate is decisive, use that; otherwise give more weight to
alternative measures).
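One possible shape for such a weighted mix (purely illustrative
constants; not any particular bot's formula):

```python
def blended_eval(winrate, mean_score, score_scale=100.0):
    # How saturated the win rate is: 0.0 in an even game, 1.0 when the
    # win/lose outcome is settled and win rate stops discriminating.
    saturation = abs(winrate - 0.5) * 2.0
    score_term = max(-1.0, min(1.0, mean_score / score_scale))
    # Win rate dominates while it is informative; as it saturates,
    # the (clipped, normalized) score takes over as a tie-breaker.
    return winrate + 0.5 * saturation * score_term

# Near-certain wins get ranked by score, close games by win rate:
print(blended_eval(0.99, 62.0) > blended_eval(0.99, 60.0))  # True
```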

There was a recent (CIG'2008) paper suggesting sigmoid
functions for combining final score and winrate, btw.
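I don't have that paper's exact formula at hand, but the general idea
of a sigmoid combination might look like this (the constants and the
50/50 mix are guesses on my part, not the paper's values):

```python
import math

def sigmoid_mix(winrate, mean_score, k=0.1, w=0.5):
    # Squash the average final score into (0, 1) with a sigmoid,
    # then mix it linearly with the win rate.
    score_part = 1.0 / (1.0 + math.exp(-k * mean_score))
    return w * winrate + (1.0 - w) * score_part
```

The steepness k controls how quickly score differences stop
mattering, much like the win rate itself saturating.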

Claus





_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
