The UCT heuristic of trying every child of a node once before trying
any child twice is reasonable when the payoff distribution is unknown.
Why try the lever that paid $5 a second time if there might be another
lever that pays $1,000,000? But when the set of possible payoffs is
known to be {1,
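
The try-every-child-once behaviour described above is what the usual UCB1 selection rule does when unvisited children are treated as having unbounded priority. A minimal Python sketch follows; the function name `ucb1_select`, the list-based bookkeeping, and the exploration constant `c` are illustrative assumptions rather than anything taken from a particular implementation.

```python
import math


def ucb1_select(visits, total_reward, c=math.sqrt(2)):
    """Return the index of the child to try next.

    visits[i] is how many times child i has been tried and
    total_reward[i] is the sum of the payoffs it returned.
    The names and the default c are assumptions for illustration.
    """
    # Any untried child is chosen before any child is tried twice.
    for i, n in enumerate(visits):
        if n == 0:
            return i

    parent_visits = sum(visits)

    def score(i):
        # Average payoff plus an exploration bonus that shrinks
        # as the child accumulates visits.
        mean = total_reward[i] / visits[i]
        bonus = c * math.sqrt(math.log(parent_visits) / visits[i])
        return mean + bonus

    return max(range(len(visits)), key=score)
```

With visits `[1, 0, 3]` the function returns index 1, the untried child, no matter how well the other two have paid so far. That is the lever argument in miniature: nothing seen so far rules out a far larger payoff elsewhere.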
With repeat-winners, if there is a move that seems flawless at first but
some flaw is eventually found, there might be a rough transition once
the flaw is identified, since there is no backup plan. It might make
more sense to study two apparently flawless children equally until a
flaw is found in