> http://computer-go.org/pipermail/computer-go/2009-June/018773.html
As I've often said something related to this (e.g. in
http://hal.inria.fr/inria-00369786/fr/ ), I'd like to be more precise. What
follows is for binary deterministic games, and I should make clear at the
outset that it is not intended to make programs much stronger.

UCT (with exploration) is consistent, in the sense that it will
asymptotically find an optimal move if any winning move exists.
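
To make this concrete, here is a minimal sketch in Python of a selection rule
with a UCB1-style exploration term (the constant c and the node fields are my
own naming, not MoGo's code); the exploration bonus is what guarantees that
every child is simulated infinitely often, and hence the asymptotic guarantee:

    import math

    def uct_score(parent_sims, child, c=1.0):
        # Empirical win rate plus a UCB1-style exploration bonus.
        # The bonus grows when a child is neglected, so every child
        # gets simulated infinitely often, which gives consistency.
        win_rate = child.wins / child.sims
        bonus = c * math.sqrt(math.log(parent_sims) / child.sims)
        return win_rate + bonus

    def select_child(node, c=1.0):
        # Try unvisited children first, then maximize the UCB score.
        for child in node.children:
            if child.sims == 0:
                return child
        return max(node.children, key=lambda ch: uct_score(node.sims, ch, c))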

With no exploration, this is no longer true, even with RAVE: there are cases
in which RAVE will focus on a single move and simulate only that one, in
spite of a poor winning rate. However, a simple trick recovers consistency:
use (nbWins+K)/(nbSimulations+2K), for example, or other regularization
tricks, provided that the weight of RAVE decreases to 0 as the number of
simulations goes to infinity. However, if you have patterns with negative
weights and the score is biased by these patterns (possibly down to a
negative value), then you lose consistency, even if the weight of the
patterns decreases to 0 as the number of simulations of the corresponding
move goes to infinity (a move whose score is pushed below zero may never be
simulated again, so its bias never actually vanishes).
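
As an illustration of the regularization trick, here is a small sketch in
Python; the (nbWins+K)/(nbSimulations+2K) formula is the one above, while the
RAVE decay schedule, the pattern term and the names are my own assumptions,
not necessarily what MoGo does:

    def regularized_value(wins, sims, K=1.0):
        # The trick from above: (nbWins + K) / (nbSimulations + 2K).
        # With few simulations the value is pulled towards 1/2, so a
        # move cannot be written off forever after a handful of playouts.
        return (wins + K) / (sims + 2.0 * K)

    def move_score(wins, sims, rave_wins, rave_sims, pattern_bias,
                   K=1.0, K_rave=50.0):
        # RAVE weight chosen so that it decreases to 0 as sims -> infinity
        # (K_rave / (K_rave + sims) is just one common schedule).
        beta = K_rave / (K_rave + sims)
        mixed = ((1.0 - beta) * regularized_value(wins, sims, K)
                 + beta * regularized_value(rave_wins, rave_sims, K))
        # A fixed negative pattern term can push the score below zero,
        # so the move stops being simulated and its bias never vanishes:
        # this is the case where consistency is lost.
        return mixed + pattern_bias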

Then, we might feel that we need more than consistency: we want consistency,
and we do not want to simulate the whole tree infinitely often. This can be
done with some specific rules; in MoGo we have added a simple patch which,
roughly, consists of randomly simulating the child nodes when the success
rate of a node is very low (below 30%). This ensures that the following rule
is enough for consistency:

               score(move) - winRate(move) --> 0   as   nbSims(move) --> infinity

As we automatically optimize patterns and parameters, the advantage is that
we can do so without any risk of losing consistency.
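
A rough sketch of what such a patch can look like (the 30% threshold is the
one mentioned above; the exact shape of the rule and the names are my guesses,
not MoGo's actual code):

    import random

    LOW_WIN_RATE = 0.30  # threshold mentioned above

    def descend(node):
        # If the node looks hopeless, pick a child uniformly at random.
        # This keeps every child simulated infinitely often, so any score
        # with score(move) - winRate(move) --> 0 as nbSims(move) --> infinity
        # is enough for consistency.
        if node.sims > 0 and node.wins / node.sims < LOW_WIN_RATE:
            return random.choice(node.children)
        return max(node.children, key=lambda ch: ch.score())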

More precisely, see http://www.lri.fr/~teytaud/consistency.pdf (submitted;
not a lot of numerical experiments, it is mainly about the theory; yet, with
500 000 simulations per move there is a significant difference, and on a
specific position it works well, with absolutely no tuning).

I don't claim this will make a big change in Monte-Carlo Tree Search: just a
bit of theoretical understanding, and the pleasure of asymptotic consistency.
Maybe it will make a real difference for very long thinking times, for which
we can't use tuning because of the computational cost :-)
And it's the first time I get more than 50% (win rate) for a modification
derived from theoretical ideas :-)

Best regards,
Olivier
