subject:"\[computer\-go\] Tweak to MCTS selection criterion"

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-07 Thread Christian Nentwich



This is kind of interesting. Is anybody measuring their playout 
performance in real-time at the moment and performing this sort of  
computation, to check if overtaking the leading move is mathematically 
impossible?


It's interesting with UCT because of the interplay between the time 
management algorithm and the exploration parameter. Suppose you are 
early in the game, and your time management algorithm says you should be 
spending 10 seconds on a move. After six seconds, because your parameter 
is skewed towards exploitation, you already have 90% more trials on the 
leading move than anything else, calculate that it cannot be overtaken 
and abort.


Some things come to mind:
- If this behaviour is happening consistently, i.e. you end up spending 
too little time on all your moves, is your exploitation parameter 
correct? There is a reason you use a time management algorithm to 
allocate a lot of time in the beginning. You may be doing pointless 
searches.
- Would you rather exploit less in that case, thus spending your 
allotted time to do more exploration, or would you instead keep 
searching instead of aborting and reuse the tree for pondering and/or 
your follow-up move?


Given that people spend a lot of time experimenting on good 
exploitation/exploration parameters, I suspect that the last option 
(obey the time management, continue searching, reuse the tree) is the 
better?


Christian


Don Dailey wrote:



2009/6/6 dhillism...@netscape.net mailto:dhillism...@netscape.net

I think this is one of those design decisions that nobody takes on
faith. We all wind up testing it both ways and in various
combinations.

An additional advantage of using the number of visits is that
branches at the root become mathematically eliminated and can be
pruned away. It often also allows the search to be stopped early.
It can save a lot of time for forced moves.


Let me see if I understand what you are saying here.   

If you have set a goal time for thinking of 10 seconds,  and let's say 
6 seconds have progressed,   then it might be possible to stop the 
search early if you do the math and see that it's not possible for any 
other move to have more visits given an additional 4 seconds?  

So when there is a position that has only a single clearly best move, 
perhaps a capture that cannot wait or a necessary atari defense,  then 
you can probably save as much (or close to) as half the thinking time 
on that move.Is this correct? 

So if this understanding is correct, then it still makes sense to do 
the pebbles test at this point and check to see if another move has 
a higher score before stopping the search.If the move really is 
forced and necessary,  then the answer will be no and you can stop 
early.  If there is a move that currently appears better but with a 
too small sample,  then it seems foolish to stop early.If the 
move is a result of just a few lucky playouts, then it will quickly be 
revealed and you can still stop early.


- Don



 


- Dave Hillis



-Original Message-
From: Michael Williams michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
To: computer-go computer-go@computer-go.org
mailto:computer-go@computer-go.org
Sent: Sat, 6 Jun 2009 5:07 pm
Subject: Re: [computer-go] Tweak to MCTS selection criterion

Another strategy to be considered is to not allow the thinking to
cease until the maximum win rate and the maximum visit count agree
on the same move. Obviously this requires some extra code to make
sure you don't lose on time, etc. 
 
Brian Sheppard wrote: 
 When a UCT search is completed, the usual selection criterion is 
 choose the move that has the most trials. This is more stable 
 than choosing the move that has the highest percentage of wins, 
 since it is possible to have an unreliably high percentage if the 
 number of trials is small. 
  I have a small tweak to that criterion. Pebbles uses choose the 
 move that has the most wins. This rule selects the same move as 
 the conventional criterion in almost every case. The reason why 
 Pebbles' rule is superior is revealed in the case where the moves 
 differ. 
  When Pebbles chooses a different move than the conventional
criterion, 
 it is because Pebbles move has more wins in fewer trials. When that 
 happens, Pebbles move would inevitably become the move with the
most 
 trials if searching were to continue. So there is actually no
downside. 
 Of course, the upside is minor, too. 
  For validation, Pebbles has been using both strategies on CGOS
games. 
 At present, the conventional selection strategy has won 341/498
= 68.47%. 
 Pebbles strategy has won 415/583 = 71.18%. This isn't statistically 
 conclusive or anything (0.7 standard deviations; we would need 4
to 8 
 times as many trials for strong

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-07 Thread Magnus Persson

I think all principles I use in the time management for Valkyria came  
up in this thread more or less.


1) Valkyria selects move that has been searched the most.
2) It is given a base time for example 20 seconds early on on 9x9 CGOS
3) The base time is adjusted down if the program is winning big.
4) If the best winrate move and the most searched move is the same  
(moves that have been searched less than N times are ignored) the  
following can happen:
4a) If only one move has been searched more than N times it is played  
if it has been searched M times.
4b) If the best move have been searched 20% more than the second best  
then it plays the best move as soon as remaining time does not allow  
the second best move to catch up
5) If the move with the best winrate and the most searched move  
disagree the search will not until 3 times the basetime has elapsed.  
That on CGOS it will think for up to 1 minute on a single move.


I do not prune moves because they cannot catch up. Mainly because the  
code is so complicated as it is. Also perhaps it is not necessary. If  
a move suddnly jumps in winrate the program will give it 3 times more  
time to finish it. In losing positions it always spend the maximum  
time, since the winrate drops for most searched move most of the time.


I intentionally set the time managment to think a lot for the opening  
moves, because I noticed that Valkyria often lost games in the opening  
and not due to limitations in the playouts. It played on inferior  
hardware in the latest KGS tournament, and at least showed it can beat  
any program.


On my todo list is some opening book system so it can save time on the  
opening moves.


Quoting Christian Nentwich christ...@modeltwozero.com:



This is kind of interesting. Is anybody measuring their playout
performance in real-time at the moment and performing this sort of
computation, to check if overtaking the leading move is mathematically
impossible?


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-07 Thread Don Dailey

2009/6/6 dhillism...@netscape.net

  I had the early stop rule and didn't know if anyone else had thought of
 it.   But I never considered the pebbles rule,
  which somewhat conflicts with the early stop rule.   But as I layed out
 above I think you could do both.

  This is probably one of those things that adds a little bit but is
 difficult to measure.

  - Don

 Hmm. Well, the way I remember it, I first heard the pebbles rule idea from
 you, Don. I checked my code to see if I could pinpoint a date, but no luck.
 This was a while ago. I could be wrong.


I think you are remember this wrong.  I don't remember ever telling anyone
about the pebbles rule,  but when I just heard about it seemed new to me.
I usually remember ideas that I already had when expressed by someone
else.

I vaguely remember telling you about the rule to stop searching when I can
calculate that it's impossible for any other move to overtake the winning
move.   But I'm not even sure of that.

It probably makes sense to even do this speculatively if you don't push it
too hard.Another thing that would have to be studied!

- Don





 - Dave Hillis

 --
 Wanna slim down for summer? Go to America Takes it 
 Offhttp://www.aolhealth.com/diet/weight-loss-program/?ncid=emlcntusheal0001to
  learn how.

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Tweak to MCTS selection criterion

2009-06-07 Thread David Fotland

Many Faces does this.

 
 This is kind of interesting. Is anybody measuring their playout
 performance in real-time at the moment and performing this sort of
 computation, to check if overtaking the leading move is mathematically
 impossible?
 
 
 Christian
 
 
 Don Dailey wrote:
 
 
  2009/6/6 dhillism...@netscape.net mailto:dhillism...@netscape.net
 
  I think this is one of those design decisions that nobody takes on
  faith. We all wind up testing it both ways and in various
  combinations.
 
  An additional advantage of using the number of visits is that
  branches at the root become mathematically eliminated and can be
  pruned away. It often also allows the search to be stopped early.
  It can save a lot of time for forced moves.
 
 
  Let me see if I understand what you are saying here.
 
  If you have set a goal time for thinking of 10 seconds,  and let's say
  6 seconds have progressed,   then it might be possible to stop the
  search early if you do the math and see that it's not possible for any
  other move to have more visits given an additional 4 seconds?
 
  So when there is a position that has only a single clearly best move,
  perhaps a capture that cannot wait or a necessary atari defense,  then
  you can probably save as much (or close to) as half the thinking time
  on that move.Is this correct?
 
  So if this understanding is correct, then it still makes sense to do
  the pebbles test at this point and check to see if another move has
  a higher score before stopping the search.If the move really is
  forced and necessary,  then the answer will be no and you can stop
  early.  If there is a move that currently appears better but with a
  too small sample,  then it seems foolish to stop early.If the
  move is a result of just a few lucky playouts, then it will quickly be
  revealed and you can still stop early.
 
  - Don
 
 
 
 
 
  - Dave Hillis
 
 
 
  -Original Message-
  From: Michael Williams michaelwilliam...@gmail.com
  mailto:michaelwilliam...@gmail.com
  To: computer-go computer-go@computer-go.org
  mailto:computer-go@computer-go.org
  Sent: Sat, 6 Jun 2009 5:07 pm
  Subject: Re: [computer-go] Tweak to MCTS selection criterion
 
  Another strategy to be considered is to not allow the thinking to
  cease until the maximum win rate and the maximum visit count agree
  on the same move. Obviously this requires some extra code to make
  sure you don't lose on time, etc.
 
  Brian Sheppard wrote:
   When a UCT search is completed, the usual selection criterion is
   choose the move that has the most trials. This is more stable
   than choosing the move that has the highest percentage of wins,
   since it is possible to have an unreliably high percentage if the
   number of trials is small.
I have a small tweak to that criterion. Pebbles uses choose the
   move that has the most wins. This rule selects the same move as
   the conventional criterion in almost every case. The reason why
   Pebbles' rule is superior is revealed in the case where the moves
   differ.
When Pebbles chooses a different move than the conventional
  criterion,
   it is because Pebbles move has more wins in fewer trials. When
that
   happens, Pebbles move would inevitably become the move with the
  most
   trials if searching were to continue. So there is actually no
  downside.
   Of course, the upside is minor, too.
For validation, Pebbles has been using both strategies on CGOS
  games.
   At present, the conventional selection strategy has won 341/498
  = 68.47%.
   Pebbles strategy has won 415/583 = 71.18%. This isn't
statistically
   conclusive or anything (0.7 standard deviations; we would need 4
  to 8
   times as many trials for strong statistical evidence). But
Pebbles'
   strategy should be better by a small amount, and it has been, so I
   present it to you with confidence.
Best,
   Brian
___
   computer-go mailing list
   computer-go@computer-go.org mailto:computer-go@computer-go.org
   http://www.computer-go.org/mailman/listinfo/computer-go/
  
  ___
  computer-go mailing list
  computer-go@computer-go.org mailto:computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 

  Wanna slim down for summer? Go to America Takes it Off
  http://www.aolhealth.com/diet/weight-loss-
 program/?ncid=emlcntusheal0001
  to learn how.
 
  ___
  computer-go mailing list
  computer-go@computer-go.org mailto:computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo

[computer-go] Tweak to MCTS selection criterion

2009-06-07 Thread Brian Sheppard

 check if overtaking the leading move is mathematically impossible?

Yes. Pebbles does this.

Please note that the discussion has veered into time control policy,
which is not the subject of the original post.

The original post discusses move selection policy: when your program
stops, for whatever reason, which move should it select? Pebbles' policy
is to select the move that has the most wins, rather than the most trials.

Pebbles policy is a risk-free improvement over the standard policy
of selecting the move that has the most trials.

Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Tweak to MCTS selection criterion

2009-06-06 Thread Brian Sheppard

When a UCT search is completed, the usual selection criterion is
choose the move that has the most trials. This is more stable
than choosing the move that has the highest percentage of wins,
since it is possible to have an unreliably high percentage if the
number of trials is small.

I have a small tweak to that criterion. Pebbles uses choose the
move that has the most wins. This rule selects the same move as
the conventional criterion in almost every case. The reason why
Pebbles' rule is superior is revealed in the case where the moves
differ.

When Pebbles chooses a different move than the conventional criterion,
it is because Pebbles move has more wins in fewer trials. When that
happens, Pebbles move would inevitably become the move with the most
trials if searching were to continue. So there is actually no downside.
Of course, the upside is minor, too.

For validation, Pebbles has been using both strategies on CGOS games.
At present, the conventional selection strategy has won 341/498 = 68.47%.
Pebbles strategy has won 415/583 = 71.18%. This isn't statistically
conclusive or anything (0.7 standard deviations; we would need 4 to 8
times as many trials for strong statistical evidence). But Pebbles'
strategy should be better by a small amount, and it has been, so I
present it to you with confidence.

Best,
Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-06 Thread Michael Williams

Another strategy to be considered is to not allow the thinking to cease until the maximum win rate and the maximum visit count agree on the same move.
Obviously this requires some extra code to make sure you don't lose on time, etc.

Brian Sheppard wrote:

When a UCT search is completed, the usual selection criterion is
choose the move that has the most trials. This is more stable
than choosing the move that has the highest percentage of wins,
since it is possible to have an unreliably high percentage if the
number of trials is small.

I have a small tweak to that criterion. Pebbles uses choose the
move that has the most wins. This rule selects the same move as
the conventional criterion in almost every case. The reason why
Pebbles' rule is superior is revealed in the case where the moves
differ.

When Pebbles chooses a different move than the conventional criterion,
it is because Pebbles move has more wins in fewer trials. When that
happens, Pebbles move would inevitably become the move with the most
trials if searching were to continue. So there is actually no downside.
Of course, the upside is minor, too.

For validation, Pebbles has been using both strategies on CGOS games.
At present, the conventional selection strategy has won 341/498 = 68.47%.
Pebbles strategy has won 415/583 = 71.18%. This isn't statistically
conclusive or anything (0.7 standard deviations; we would need 4 to 8
times as many trials for strong statistical evidence). But Pebbles'
strategy should be better by a small amount, and it has been, so I
present it to you with confidence.

Best,
Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-06 Thread Don Dailey

On Sat, Jun 6, 2009 at 5:07 PM, Michael Williams 
michaelwilliam...@gmail.com wrote:

 Another strategy to be considered is to not allow the thinking to cease
 until the maximum win rate and the maximum visit count agree on the same
 move. Obviously this requires some extra code to make sure you don't lose on
 time, etc.


I like the pebbles idea, but this is probably preferred and the pebbles rule
is the one to use if time does not permit the extra work.

Allocating extra thinking time is very wise in this situation because the
situation is one that clearly needs to be resolved - a move that previously
did not look good has emerged as potentially best, and perhaps the current
move with most samples has dropped in score enough to trigger this event .
This could be due to trouble on the horizon.

- Don






 Brian Sheppard wrote:

 When a UCT search is completed, the usual selection criterion is
 choose the move that has the most trials. This is more stable
 than choosing the move that has the highest percentage of wins,
 since it is possible to have an unreliably high percentage if the
 number of trials is small.

 I have a small tweak to that criterion. Pebbles uses choose the
 move that has the most wins. This rule selects the same move as
 the conventional criterion in almost every case. The reason why
 Pebbles' rule is superior is revealed in the case where the moves
 differ.

 When Pebbles chooses a different move than the conventional criterion,
 it is because Pebbles move has more wins in fewer trials. When that
 happens, Pebbles move would inevitably become the move with the most
 trials if searching were to continue. So there is actually no downside.
 Of course, the upside is minor, too.

 For validation, Pebbles has been using both strategies on CGOS games.
 At present, the conventional selection strategy has won 341/498 = 68.47%.
 Pebbles strategy has won 415/583 = 71.18%. This isn't statistically
 conclusive or anything (0.7 standard deviations; we would need 4 to 8
 times as many trials for strong statistical evidence). But Pebbles'
 strategy should be better by a small amount, and it has been, so I
 present it to you with confidence.

 Best,
 Brian

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

2009-06-06 Thread dhillismail

I think this is?one of those?design decisions that nobody takes on faith. We 
all wind up testing it both ways and in various combinations.

An additional advantage of using the number of visits is that branches at the 
root become mathematically eliminated and can be pruned away. It often also 
allows the search to be stopped early. It can save a lot of time for forced 
moves.

- Dave Hillis


-Original Message-
From: Michael Williams michaelwilliam...@gmail.com
To: computer-go computer-go@computer-go.org
Sent: Sat, 6 Jun 2009 5:07 pm
Subject: Re: [computer-go] Tweak to MCTS selection criterion


Another strategy to be considered is to not allow the thinking to cease until 
the maximum win rate and the maximum visit count agree on the same move. 
Obviously this requires some extra code to make sure you don't lose on time, 
etc.?
?
Brian Sheppard wrote:?
 When a UCT search is completed, the usual selection criterion is?
 choose the move that has the most trials. This is more stable?
 than choosing the move that has the highest percentage of wins,?
 since it is possible to have an unreliably high percentage if the?
 number of trials is small.?
  I have a small tweak to that criterion. Pebbles uses choose the?
 move that has the most wins. This rule selects the same move as?
 the conventional criterion in almost every case. The reason why?
 Pebbles' rule is superior is revealed in the case where the moves?
 differ.?
  When Pebbles chooses a different move than the conventional criterion,?
 it is because Pebbles move has more wins in fewer trials. When that?
 happens, Pebbles move would inevitably become the move with the most?
 trials if searching were to continue. So there is actually no downside.?
 Of course, the upside is minor, too.?
  For validation, Pebbles has been using both strategies on CGOS games.?
 At present, the conventional selection strategy has won 341/498 = 68.47%.?
 Pebbles strategy has won 415/583 = 71.18%. This isn't statistically?
 conclusive or anything (0.7 standard deviations; we would need 4 to 8?
 times as many trials for strong statistical evidence). But Pebbles'?
 strategy should be better by a small amount, and it has been, so I?
 present it to you with confidence.?
  Best,?
 Brian?
  ___?
 computer-go mailing list?
 computer...@computer-go.org?
 http://www.computer-go.org/mailman/listinfo/computer-go/?
 ?
___?
computer-go mailing list?
computer...@computer-go.org?
http://www.computer-go.org/mailman/listinfo/computer-go/?

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

Re: [computer-go] Tweak to MCTS selection criterion

Re: [computer-go] Tweak to MCTS selection criterion

RE: [computer-go] Tweak to MCTS selection criterion

[computer-go] Tweak to MCTS selection criterion

[computer-go] Tweak to MCTS selection criterion

Re: [computer-go] Tweak to MCTS selection criterion

Re: [computer-go] Tweak to MCTS selection criterion

Re: [computer-go] Tweak to MCTS selection criterion

9 matches

Site Navigation

Mail list logo

Footer information