Re: [Computer-go] Possible idea - decay old simulations?

2017-07-23 Thread Brian Sheppard via Computer-go
Yes. This is a long-known phenomenon.

 

I was able to get improvements in Pebbles based on the idea of forgetting 
unsuccessful results. It has to be done somewhat carefully, because results 
propagate up the tree. But you can definitely make it work.
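
To illustrate why the propagation needs care, here is a rough sketch of the
bookkeeping (purely hypothetical code, not how Pebbles actually does it): when
a node's accumulated results are scaled down, the same simulations must also
be removed from every ancestor, with the win count flipped at each level
because the side to move alternates.

    class Node:
        def __init__(self, parent=None):
            self.parent = parent
            self.visits = 0.0
            self.wins = 0.0   # from the perspective of the player to move here
            self.children = []

    def forget(node, keep=0.5):
        # Discard a fraction (1 - keep) of this node's accumulated results.
        dropped_visits = node.visits * (1.0 - keep)
        dropped_wins = node.wins * (1.0 - keep)
        node.visits -= dropped_visits
        node.wins -= dropped_wins
        # Correct the ancestors, which also counted those simulations.
        # Skipping this step is what makes naive forgetting go wrong.
        dropped = dropped_visits - dropped_wins   # wins from the parent's side
        ancestor = node.parent
        while ancestor is not None:
            ancestor.visits -= dropped_visits
            ancestor.wins -= dropped
            dropped = dropped_visits - dropped    # flip perspective each level
            ancestor = ancestor.parent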

 

I recall a couple of papers published on this basis. One, presumably about
CrazyStone: “Efficient Selectivity and Backup Operators in Monte-Carlo Tree
Search.” I also see a paper called “Accelerated UCT and Its Application to
Two-Player Games.” Could be others.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of David Wu
Sent: Sunday, July 23, 2017 12:25 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Possible idea - decay old simulations?

 

I've been using Leela 0.10.0 for analysis quite often, and I've noticed 
something that might lead to an improvement for the search, and maybe also for 
other MCTS programs.

 

Sometimes, after putting hundreds of thousands of simulations into a few 
possible moves, Leela will appear to favor one, disliking the others for having 
clearly worse reported winrates. But then every once in a while, the winrate 
for one of the other disliked moves will start rising gradually, but very 
consistently.

 

When this happens, if I play down the variation for that move and look at the 
analysis window, I often find that Leela has discovered a new tactic. 
Specifically, I find a node in that subtree where one move has a much higher
winrate than all the others but does not yet have many simulations, meaning
Leela has only just found it.

(Possibly it already has more simulations than any other single move, but the 
number of simulations of all of the other moves combined still significantly 
outweighs it).

 

Going back to the root, it's clear that if the new tactic has a high enough 
winrate, then the previously disliked move will eventually overtake the favored 
move. But it takes a long time, since the disliked move has a lot of bad 
simulations to outweigh - it's painful to watch the winrate creep up slowly but 
with high upward consistency, until it finally beats out the previously favored 
move.

 

I think there's a chance that the search could be improved by adding a small
decay over time to the weight of old simulations. This would allow a move to
be promoted a bit more rapidly once a better tactic is discovered. You would
probably want to set the decay so that the total weight still approaches
infinity over time (a fixed exponential decay would probably be bad, since it
would bound the total weight by a constant), but grows a bit slower than
linearly.
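
For concreteness, one weighting with that property (just a sketch, not
anything Leela actually does): give the j-th most recent simulation weight
j^(-beta) for some 0 <= beta < 1, so the total weight of n simulations grows
roughly like n^(1-beta) / (1-beta) - unbounded, but slower than linear.
Recomputing from the full history as below is only for illustration; a real
engine would presumably maintain such statistics incrementally, e.g. by
periodically rescaling a node's counts.

    def decayed_stats(outcomes, beta=0.5):
        # outcomes: one node's simulation results in arrival order,
        #           1.0 for a win, 0.0 for a loss.
        # beta:     decay exponent, 0 <= beta < 1.  beta = 0 gives the usual
        #           unweighted average; larger beta forgets old results faster.
        n = len(outcomes)
        total_weight = 0.0
        weighted_wins = 0.0
        for k, result in enumerate(outcomes):   # k = 0 is the oldest simulation
            recency = n - k                     # 1 = newest, n = oldest
            weight = recency ** (-beta)
            total_weight += weight
            weighted_wins += weight * result
        winrate = weighted_wins / total_weight if total_weight > 0 else 0.5
        return winrate, total_weight            # total_weight plays the role
                                                # of an effective visit count

In the UCT formula, winrate would stand in for the usual mean and
total_weight for the visit count in the exploration term.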

 

Thinking about it from the multi-armed-bandit perspective, I think this also 
makes sense. The distribution of results from each child is nonstationary, 
because the subtree below the child is evolving over time. If they were 
stationary you would weight all historical simulations equally, but since they 
aren't, the more-recent results from a child should get a little bit more 
weight since they give you more information about the current performance of 
the child move.
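
In that framing, the simplest scheme is discounted statistics: shrink a
child's accumulated results by a factor gamma on every update, then run the
usual UCB selection on the discounted counts. A rough sketch (the names and
structure here are illustrative, not Leela's):

    import math

    class ChildStats:
        def __init__(self):
            self.n = 0.0   # discounted visit count
            self.w = 0.0   # discounted win count

        def update(self, result, gamma=0.999):
            # Older results are shrunk before the new one is added, so more
            # recent simulations carry more weight.
            self.n = gamma * self.n + 1.0
            self.w = gamma * self.w + result

    def select_move(stats, c=1.4):
        # Standard UCB1, computed over the discounted counts.
        # stats: dict mapping each move to its ChildStats.
        total = sum(s.n for s in stats.values())
        def score(s):
            mean = s.w / s.n if s.n > 0 else 0.5
            return mean + c * math.sqrt(math.log(total + 1.0) / (s.n + 1e-9))
        return max(stats, key=lambda move: score(stats[move]))

With a fixed gamma < 1 the discounted count can never exceed 1 / (1 - gamma),
which is exactly the bounded-total-weight problem mentioned above; letting
gamma drift toward 1 as a child accumulates visits (or using a sublinear
weighting like the earlier sketch) keeps the total weight unbounded.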

 

Has anyone tried this sort of idea before?

 

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
