Re: [computer-go] Tweak to MCTS selection criterion
This is kind of interesting. Is anybody measuring their playout performance in real-time at the moment and performing this sort of computation, to check if overtaking the leading move is mathematically impossible? It's interesting with UCT because of the interplay between the time management algorithm and the exploration parameter. Suppose you are early in the game, and your time management algorithm says you should be spending 10 seconds on a move. After six seconds, because your parameter is skewed towards exploitation, you already have 90% more trials on the leading move than anything else, calculate that it cannot be overtaken and abort. Some things come to mind: - If this behaviour is happening consistently, i.e. you end up spending too little time on all your moves, is your exploitation parameter correct? There is a reason you use a time management algorithm to allocate a lot of time in the beginning. You may be doing pointless searches. - Would you rather exploit less in that case, thus spending your allotted time to do more exploration, or would you instead keep searching instead of aborting and reuse the tree for pondering and/or your follow-up move? Given that people spend a lot of time experimenting on good exploitation/exploration parameters, I suspect that the last option (obey the time management, continue searching, reuse the tree) is the better? Christian Don Dailey wrote: 2009/6/6 dhillism...@netscape.net mailto:dhillism...@netscape.net I think this is one of those design decisions that nobody takes on faith. We all wind up testing it both ways and in various combinations. An additional advantage of using the number of visits is that branches at the root become mathematically eliminated and can be pruned away. It often also allows the search to be stopped early. It can save a lot of time for forced moves. Let me see if I understand what you are saying here. If you have set a goal time for thinking of 10 seconds, and let's say 6 seconds have progressed, then it might be possible to stop the search early if you do the math and see that it's not possible for any other move to have more visits given an additional 4 seconds? So when there is a position that has only a single clearly best move, perhaps a capture that cannot wait or a necessary atari defense, then you can probably save as much (or close to) as half the thinking time on that move.Is this correct? So if this understanding is correct, then it still makes sense to do the pebbles test at this point and check to see if another move has a higher score before stopping the search.If the move really is forced and necessary, then the answer will be no and you can stop early. If there is a move that currently appears better but with a too small sample, then it seems foolish to stop early.If the move is a result of just a few lucky playouts, then it will quickly be revealed and you can still stop early. - Don - Dave Hillis -Original Message- From: Michael Williams michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com To: computer-go computer-go@computer-go.org mailto:computer-go@computer-go.org Sent: Sat, 6 Jun 2009 5:07 pm Subject: Re: [computer-go] Tweak to MCTS selection criterion Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move. Obviously this requires some extra code to make sure you don't lose on time, etc. Brian Sheppard wrote: When a UCT search is completed, the usual selection criterion is choose the move that has the most trials. This is more stable than choosing the move that has the highest percentage of wins, since it is possible to have an unreliably high percentage if the number of trials is small. I have a small tweak to that criterion. Pebbles uses choose the move that has the most wins. This rule selects the same move as the conventional criterion in almost every case. The reason why Pebbles' rule is superior is revealed in the case where the moves differ. When Pebbles chooses a different move than the conventional criterion, it is because Pebbles move has more wins in fewer trials. When that happens, Pebbles move would inevitably become the move with the most trials if searching were to continue. So there is actually no downside. Of course, the upside is minor, too. For validation, Pebbles has been using both strategies on CGOS games. At present, the conventional selection strategy has won 341/498 = 68.47%. Pebbles strategy has won 415/583 = 71.18%. This isn't statistically conclusive or anything (0.7 standard deviations; we would need 4 to 8 times as many trials for strong
Re: [computer-go] Tweak to MCTS selection criterion
I think all principles I use in the time management for Valkyria came up in this thread more or less. 1) Valkyria selects move that has been searched the most. 2) It is given a base time for example 20 seconds early on on 9x9 CGOS 3) The base time is adjusted down if the program is winning big. 4) If the best winrate move and the most searched move is the same (moves that have been searched less than N times are ignored) the following can happen: 4a) If only one move has been searched more than N times it is played if it has been searched M times. 4b) If the best move have been searched 20% more than the second best then it plays the best move as soon as remaining time does not allow the second best move to catch up 5) If the move with the best winrate and the most searched move disagree the search will not until 3 times the basetime has elapsed. That on CGOS it will think for up to 1 minute on a single move. I do not prune moves because they cannot catch up. Mainly because the code is so complicated as it is. Also perhaps it is not necessary. If a move suddnly jumps in winrate the program will give it 3 times more time to finish it. In losing positions it always spend the maximum time, since the winrate drops for most searched move most of the time. I intentionally set the time managment to think a lot for the opening moves, because I noticed that Valkyria often lost games in the opening and not due to limitations in the playouts. It played on inferior hardware in the latest KGS tournament, and at least showed it can beat any program. On my todo list is some opening book system so it can save time on the opening moves. Quoting Christian Nentwich christ...@modeltwozero.com: This is kind of interesting. Is anybody measuring their playout performance in real-time at the moment and performing this sort of computation, to check if overtaking the leading move is mathematically impossible? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Tweak to MCTS selection criterion
2009/6/6 dhillism...@netscape.net I had the early stop rule and didn't know if anyone else had thought of it. But I never considered the pebbles rule, which somewhat conflicts with the early stop rule. But as I layed out above I think you could do both. This is probably one of those things that adds a little bit but is difficult to measure. - Don Hmm. Well, the way I remember it, I first heard the pebbles rule idea from you, Don. I checked my code to see if I could pinpoint a date, but no luck. This was a while ago. I could be wrong. I think you are remember this wrong. I don't remember ever telling anyone about the pebbles rule, but when I just heard about it seemed new to me. I usually remember ideas that I already had when expressed by someone else. I vaguely remember telling you about the rule to stop searching when I can calculate that it's impossible for any other move to overtake the winning move. But I'm not even sure of that. It probably makes sense to even do this speculatively if you don't push it too hard.Another thing that would have to be studied! - Don - Dave Hillis -- Wanna slim down for summer? Go to America Takes it Offhttp://www.aolhealth.com/diet/weight-loss-program/?ncid=emlcntusheal0001to learn how. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] Tweak to MCTS selection criterion
Many Faces does this. This is kind of interesting. Is anybody measuring their playout performance in real-time at the moment and performing this sort of computation, to check if overtaking the leading move is mathematically impossible? Christian Don Dailey wrote: 2009/6/6 dhillism...@netscape.net mailto:dhillism...@netscape.net I think this is one of those design decisions that nobody takes on faith. We all wind up testing it both ways and in various combinations. An additional advantage of using the number of visits is that branches at the root become mathematically eliminated and can be pruned away. It often also allows the search to be stopped early. It can save a lot of time for forced moves. Let me see if I understand what you are saying here. If you have set a goal time for thinking of 10 seconds, and let's say 6 seconds have progressed, then it might be possible to stop the search early if you do the math and see that it's not possible for any other move to have more visits given an additional 4 seconds? So when there is a position that has only a single clearly best move, perhaps a capture that cannot wait or a necessary atari defense, then you can probably save as much (or close to) as half the thinking time on that move.Is this correct? So if this understanding is correct, then it still makes sense to do the pebbles test at this point and check to see if another move has a higher score before stopping the search.If the move really is forced and necessary, then the answer will be no and you can stop early. If there is a move that currently appears better but with a too small sample, then it seems foolish to stop early.If the move is a result of just a few lucky playouts, then it will quickly be revealed and you can still stop early. - Don - Dave Hillis -Original Message- From: Michael Williams michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com To: computer-go computer-go@computer-go.org mailto:computer-go@computer-go.org Sent: Sat, 6 Jun 2009 5:07 pm Subject: Re: [computer-go] Tweak to MCTS selection criterion Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move. Obviously this requires some extra code to make sure you don't lose on time, etc. Brian Sheppard wrote: When a UCT search is completed, the usual selection criterion is choose the move that has the most trials. This is more stable than choosing the move that has the highest percentage of wins, since it is possible to have an unreliably high percentage if the number of trials is small. I have a small tweak to that criterion. Pebbles uses choose the move that has the most wins. This rule selects the same move as the conventional criterion in almost every case. The reason why Pebbles' rule is superior is revealed in the case where the moves differ. When Pebbles chooses a different move than the conventional criterion, it is because Pebbles move has more wins in fewer trials. When that happens, Pebbles move would inevitably become the move with the most trials if searching were to continue. So there is actually no downside. Of course, the upside is minor, too. For validation, Pebbles has been using both strategies on CGOS games. At present, the conventional selection strategy has won 341/498 = 68.47%. Pebbles strategy has won 415/583 = 71.18%. This isn't statistically conclusive or anything (0.7 standard deviations; we would need 4 to 8 times as many trials for strong statistical evidence). But Pebbles' strategy should be better by a small amount, and it has been, so I present it to you with confidence. Best, Brian ___ computer-go mailing list computer-go@computer-go.org mailto:computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org mailto:computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ Wanna slim down for summer? Go to America Takes it Off http://www.aolhealth.com/diet/weight-loss- program/?ncid=emlcntusheal0001 to learn how. ___ computer-go mailing list computer-go@computer-go.org mailto:computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo
[computer-go] Tweak to MCTS selection criterion
check if overtaking the leading move is mathematically impossible? Yes. Pebbles does this. Please note that the discussion has veered into time control policy, which is not the subject of the original post. The original post discusses move selection policy: when your program stops, for whatever reason, which move should it select? Pebbles' policy is to select the move that has the most wins, rather than the most trials. Pebbles policy is a risk-free improvement over the standard policy of selecting the move that has the most trials. Brian ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Tweak to MCTS selection criterion
When a UCT search is completed, the usual selection criterion is choose the move that has the most trials. This is more stable than choosing the move that has the highest percentage of wins, since it is possible to have an unreliably high percentage if the number of trials is small. I have a small tweak to that criterion. Pebbles uses choose the move that has the most wins. This rule selects the same move as the conventional criterion in almost every case. The reason why Pebbles' rule is superior is revealed in the case where the moves differ. When Pebbles chooses a different move than the conventional criterion, it is because Pebbles move has more wins in fewer trials. When that happens, Pebbles move would inevitably become the move with the most trials if searching were to continue. So there is actually no downside. Of course, the upside is minor, too. For validation, Pebbles has been using both strategies on CGOS games. At present, the conventional selection strategy has won 341/498 = 68.47%. Pebbles strategy has won 415/583 = 71.18%. This isn't statistically conclusive or anything (0.7 standard deviations; we would need 4 to 8 times as many trials for strong statistical evidence). But Pebbles' strategy should be better by a small amount, and it has been, so I present it to you with confidence. Best, Brian ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Tweak to MCTS selection criterion
Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move. Obviously this requires some extra code to make sure you don't lose on time, etc. Brian Sheppard wrote: When a UCT search is completed, the usual selection criterion is choose the move that has the most trials. This is more stable than choosing the move that has the highest percentage of wins, since it is possible to have an unreliably high percentage if the number of trials is small. I have a small tweak to that criterion. Pebbles uses choose the move that has the most wins. This rule selects the same move as the conventional criterion in almost every case. The reason why Pebbles' rule is superior is revealed in the case where the moves differ. When Pebbles chooses a different move than the conventional criterion, it is because Pebbles move has more wins in fewer trials. When that happens, Pebbles move would inevitably become the move with the most trials if searching were to continue. So there is actually no downside. Of course, the upside is minor, too. For validation, Pebbles has been using both strategies on CGOS games. At present, the conventional selection strategy has won 341/498 = 68.47%. Pebbles strategy has won 415/583 = 71.18%. This isn't statistically conclusive or anything (0.7 standard deviations; we would need 4 to 8 times as many trials for strong statistical evidence). But Pebbles' strategy should be better by a small amount, and it has been, so I present it to you with confidence. Best, Brian ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Tweak to MCTS selection criterion
On Sat, Jun 6, 2009 at 5:07 PM, Michael Williams michaelwilliam...@gmail.com wrote: Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move. Obviously this requires some extra code to make sure you don't lose on time, etc. I like the pebbles idea, but this is probably preferred and the pebbles rule is the one to use if time does not permit the extra work. Allocating extra thinking time is very wise in this situation because the situation is one that clearly needs to be resolved - a move that previously did not look good has emerged as potentially best, and perhaps the current move with most samples has dropped in score enough to trigger this event . This could be due to trouble on the horizon. - Don Brian Sheppard wrote: When a UCT search is completed, the usual selection criterion is choose the move that has the most trials. This is more stable than choosing the move that has the highest percentage of wins, since it is possible to have an unreliably high percentage if the number of trials is small. I have a small tweak to that criterion. Pebbles uses choose the move that has the most wins. This rule selects the same move as the conventional criterion in almost every case. The reason why Pebbles' rule is superior is revealed in the case where the moves differ. When Pebbles chooses a different move than the conventional criterion, it is because Pebbles move has more wins in fewer trials. When that happens, Pebbles move would inevitably become the move with the most trials if searching were to continue. So there is actually no downside. Of course, the upside is minor, too. For validation, Pebbles has been using both strategies on CGOS games. At present, the conventional selection strategy has won 341/498 = 68.47%. Pebbles strategy has won 415/583 = 71.18%. This isn't statistically conclusive or anything (0.7 standard deviations; we would need 4 to 8 times as many trials for strong statistical evidence). But Pebbles' strategy should be better by a small amount, and it has been, so I present it to you with confidence. Best, Brian ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Tweak to MCTS selection criterion
I think this is?one of those?design decisions that nobody takes on faith. We all wind up testing it both ways and in various combinations. An additional advantage of using the number of visits is that branches at the root become mathematically eliminated and can be pruned away. It often also allows the search to be stopped early. It can save a lot of time for forced moves. - Dave Hillis -Original Message- From: Michael Williams michaelwilliam...@gmail.com To: computer-go computer-go@computer-go.org Sent: Sat, 6 Jun 2009 5:07 pm Subject: Re: [computer-go] Tweak to MCTS selection criterion Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move. Obviously this requires some extra code to make sure you don't lose on time, etc.? ? Brian Sheppard wrote:? When a UCT search is completed, the usual selection criterion is? choose the move that has the most trials. This is more stable? than choosing the move that has the highest percentage of wins,? since it is possible to have an unreliably high percentage if the? number of trials is small.? I have a small tweak to that criterion. Pebbles uses choose the? move that has the most wins. This rule selects the same move as? the conventional criterion in almost every case. The reason why? Pebbles' rule is superior is revealed in the case where the moves? differ.? When Pebbles chooses a different move than the conventional criterion,? it is because Pebbles move has more wins in fewer trials. When that? happens, Pebbles move would inevitably become the move with the most? trials if searching were to continue. So there is actually no downside.? Of course, the upside is minor, too.? For validation, Pebbles has been using both strategies on CGOS games.? At present, the conventional selection strategy has won 341/498 = 68.47%.? Pebbles strategy has won 415/583 = 71.18%. This isn't statistically? conclusive or anything (0.7 standard deviations; we would need 4 to 8? times as many trials for strong statistical evidence). But Pebbles'? strategy should be better by a small amount, and it has been, so I? present it to you with confidence.? Best,? Brian? ___? computer-go mailing list? computer...@computer-go.org? http://www.computer-go.org/mailman/listinfo/computer-go/? ? ___? computer-go mailing list? computer...@computer-go.org? http://www.computer-go.org/mailman/listinfo/computer-go/? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/