[computer-go] Heuristics for MC/UCT with all-or-nothing payouts

2007-06-10 Thread Eric Boesch
The UCT heuristic of trying every child of a node once before trying any child twice is reasonable when the payoff distribution is unknown. Why try the lever that paid $5 a second time if there might be another lever that pays $1,000,000? But when the set of possible payoffs is known to be {1,
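
For readers unfamiliar with the rule described above, here is a minimal sketch of the usual try-every-child-once behavior combined with standard UCB1 selection. It is not code from the thread; the function name `select_child`, the `wins`/`visits` lists, and the exploration constant `c` are assumptions made for illustration, with payoffs taken to be 0 (loss) or 1 (win).

```python
import math

def select_child(wins, visits, c=math.sqrt(2)):
    """Return the index of the child to try next.

    wins[i] and visits[i] are the win count and visit count of child i
    (hypothetical bookkeeping, assumed for this sketch).
    """
    # First-visit rule: any untried child is selected before any
    # child is tried a second time.
    for i, n in enumerate(visits):
        if n == 0:
            return i

    total = sum(visits)

    def ucb1(i):
        # Mean payoff plus an exploration bonus that shrinks as the
        # child accumulates visits.
        mean = wins[i] / visits[i]
        return mean + c * math.sqrt(math.log(total) / visits[i])

    return max(range(len(visits)), key=ucb1)
```

With `wins=[1, 0, 0]` and `visits=[1, 0, 0]`, the function picks child 1 even though child 0 has just won, which is the behavior the post questions when payoffs are all-or-nothing.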

Re: [computer-go] Heuristics for MC/UCT with all-or-nothing payouts

2007-06-10 Thread Brian Slesinsky
With repeat-winners, if there is a move that seems flawless at first but some flaw is eventually found, there might be a rough transition once the flaw is identified, since there is no backup plan. It might make more sense to study two apparently flawless children equally until a flaw is found in