There has been some talk here of using a zero exploration coefficient.
Does this literally mean using the win ratio (with one dummy win per
node) to decide paths through the MC tree? It seems that the best move
could easily be eliminated by a couple of bad runs.
Does this only work when
Yes, bad luck can be a problem.
Solutions:
1) RAVE/AMAF biases the search toward good moves, so exploration takes
place anyway.
2) Biased priors that initially force many playouts for good
candidates, so that bad luck becomes less likely for moves that are
rated highly by patterns or other means.
3)
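
To make the bad-luck concern and fix 2) concrete, here is a minimal Python sketch. It is not taken from any particular engine: the names Child, select, prior_wins and prior_visits are hypothetical, and treating a prior as a number of virtual wins and visits is an assumption about how such a bias might be stored. With c = 0 the choice is purely the win ratio, with one dummy win per node as in the question.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    wins: float = 0.0
    visits: int = 0
    # Assumption: a prior is stored as virtual wins/visits.
    # The default of 1/1 is the "one dummy win per node" from the question.
    prior_wins: float = 1.0
    prior_visits: float = 1.0

    def value(self) -> float:
        # Win ratio including the virtual (prior) playouts.
        return (self.wins + self.prior_wins) / (self.visits + self.prior_visits)


def select(children, c=0.0):
    """Return the index of the child to descend into.

    c is the exploration coefficient; c = 0 means greedy selection on
    the win ratio alone. Assumes prior_visits >= 1 for every child, so
    the logarithm and the division below are well defined.
    """
    total = sum(ch.visits + ch.prior_visits for ch in children)

    def score(ch):
        n = ch.visits + ch.prior_visits
        return ch.value() + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))


# Move 0 is the (hypothetically) best move, but lost its first two playouts.
unlucky = [Child(wins=0, visits=2), Child(wins=3, visits=6)]
print(select(unlucky))          # greedy, dummy win only
print(select(unlucky, c=1.0))   # same statistics, with an exploration term

# Same statistics, but move 0 gets a strong pattern prior (8 wins in 10).
with_prior = [Child(wins=0, visits=2, prior_wins=8, prior_visits=10),
              Child(wins=3, visits=6)]
print(select(with_prior))       # greedy, biased prior
```

In this toy case the greedy rule abandons move 0 after two losses (1/3 against 4/7), while either an exploration term or a large enough prior (8/12 against 4/7) keeps it alive. The specific numbers are illustrative only.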
2009/6/23 Olivier Teytaud teyt...@lri.fr:
By the way, the conditions for consistency in A*, which is quite related
to Monte-Carlo Tree Search in my humble opinion, imply optimism in the sense
that the value must be overestimated. UCT/MCTS is really similar to A*
without so-called close