Re: [Computer-go] alarming UCT behavior

2015-11-08 Thread Hideki Kato
I have often observed similar behavior with Zen when it has enough 
memory.  My and Yamato's interpretation follows.  

Yamato tunes many parameters at relatively short time settings, 
because a huge number of games is necessary for the tuning.  For 
strong play, the prior bias and the average outcome of the simulations 
have to be balanced (using multiple parameters): patterns give good 
shapes but contribute almost nothing to local fights and complicated 
life-and-death, which have to be solved by the simulations.  Because 
Yamato optimized these parameters at a short time setting, the balance 
is broken (i.e., worse than optimal) at longer time settings.

Recently, Yamato has been tuning the parameters at as long a time 
setting as possible.
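The balance Hideki describes can be illustrated with a generic prior-as-virtual-visits scheme (an illustration of the idea, not necessarily Zen's actual formula; all weights here are made up):

```python
def node_value(prior, prior_weight, sim_wins, sim_visits):
    """Blend a pattern-based prior with simulation outcomes.

    The prior acts like `prior_weight` virtual visits whose win rate
    equals `prior`; as real simulations accumulate, they dominate.
    The specific weights are illustrative, not tuned values.
    """
    return (prior * prior_weight + sim_wins) / (prior_weight + sim_visits)

# With few simulations the prior dominates; with many it washes out.
early = node_value(prior=0.7, prior_weight=50, sim_wins=2, sim_visits=10)
late = node_value(prior=0.7, prior_weight=50, sim_wins=400, sim_visits=1000)
```

At long time settings the simulation term dominates, so parameters tuned to balance the two at short time settings no longer balance them.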

Hideki

Dave Dyer: <20151106184850.13268e1...@computer-go.org>:
-- 
Hideki Kato 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] alarming UCT behavior

2015-11-06 Thread Dave Dyer

Developing a UCT robot for a new game, I have encountered a
surprising and alarming behavior:  the longer the think time the
robot is given, the worse the results.  That is, the same robot
given 5 seconds per move defeats one given 30 seconds, or 180 seconds.

I'm still investigating, but the proximate cause seems to be
my limit on the size of the UCT tree.   As a memory conservation
measure, I have a hard limit on the size of the stored tree. After
the limit is reached, the robot continues running simulations, refining
the outcomes based on the existing tree and random playouts below
the leaf nodes.

My intuition would be that the search would be less effective in this
mode, but producing worse results (as measured by self-play) is 
strongly counter intuitive.

Does it apply to Go?  Maybe not, but it's at least an indicator
that arbitrary decisions that "ought to" be ok can be very bad in
practice.
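For concreteness, the setup described above might look like the following skeleton (a minimal single-player toy, not Dave's actual framework; the game callbacks and all names are illustrative): once the node cap is hit, expansion stops, but simulations keep refining the frozen tree.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = {}  # move -> Node
        self.wins = 0.0
        self.visits = 0

def capped_mcts(root_state, moves, play, reward, max_nodes, num_sims, c=1.4):
    """UCT with a hard cap on stored nodes (single-player toy).

    After `max_nodes` nodes exist, expansion stops; further simulations
    only refine values at the frozen leaves via random playouts below
    them -- the regime described in this thread.
    """
    root = Node()
    count = 1
    for _ in range(num_sims):
        state, node, path = root_state, root, [root]
        # 1. Selection: descend by UCB1 while the node has children.
        while node.children:
            def ucb(child):
                if child.visits == 0:
                    return float("inf")
                return (child.wins / child.visits +
                        c * math.sqrt(math.log(node.visits) / child.visits))
            move, node = max(node.children.items(), key=lambda mc: ucb(mc[1]))
            state = play(state, move)
            path.append(node)
        # 2. Expansion: only if the hard node cap allows it.
        legal = moves(state)
        if legal and count < max_nodes:
            for m in legal:
                node.children[m] = Node()
                count += 1
                if count >= max_nodes:
                    break
            m, child = random.choice(list(node.children.items()))
            state = play(state, m)
            path.append(child)
        # 3. Simulation: random playout below the (possibly frozen) leaf.
        while moves(state):
            state = play(state, random.choice(moves(state)))
        r = reward(state)
        # 4. Backup.
        for n in path:
            n.visits += 1
            n.wins += r
    return root, count

# Toy demo: depth-3 game where the only winning line is (1, 1, 1).
random.seed(0)
root, count = capped_mcts(
    (),
    moves=lambda s: [0, 1] if len(s) < 3 else [],
    play=lambda s, m: s + (m,),
    reward=lambda s: 1.0 if s == (1, 1, 1) else 0.0,
    max_nodes=5, num_sims=200)
```

With `max_nodes=5`, the tree freezes after two expansions while the remaining simulations pile onto the same few leaves, which is exactly the situation the thread discusses.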


Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Gonçalo Mendes Ferreira
That doesn't seem very realistic. I'd guess your prior values are 
accurate but the simulations are biased or not representative. Or you 
lack precision in the floating-point values for your transition 
quality. Or there's a bug related to this being an adversarial problem 
and you didn't have the robots swap colors? :)


Do tell what it was when you discover the problem.

Gonçalo F.

On 06/11/2015 18:48, Dave Dyer wrote:


Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Dave Dyer
At 10:59 AM 11/6/2015, Gonçalo Mendes Ferreira wrote:
>That doesn't seem very realistic.

This is with a well-tested framework that has been used for 20+ games.
Whatever the ultimate resolution, the counterintuitive result that
triggered it stands on its own: longer think times give worse results.

I haven't seen this in any other game - I routinely start with a quick
playing robot, and make a better robot by cranking up the time allowance.


Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Marc Landgraf
It is indeed very realistic and can be recreated in Go.

The issue is that you are chopping off the tree at a fixed point, and this
may heavily bias the entire tree if that point influences the playout.
Imagine there is one big atari in the game, but it can be easily
answered. If you have a single path in your tree that ends with said
atari, while no other path ends with this atari played (or the atari is
already answered within the tree), this path will dominate all other
paths, because along it the likelihood of winning (by taking the
stones in atari) is much higher than in all other branches. But if you
chopped off the last move of that branch, the evaluation would suddenly
be correct. If you did not run any more playouts on this node, it would
be just one biased playout, not a big deal in the grand scale. If you
allowed the MCTS to continue further, it would also be able to refute
the atari. But you are stopping right at the atari, and then piling on
playouts that make it seem to work.



Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Gonçalo Mendes Ferreira
That's certainly possible, but assuming the tree is limited in the 
number of nodes, not to a constant depth, and the selection among 
equal-quality candidates to explore is random, how likely is that to 
happen? The problem seems to be constant.


On 06/11/2015 19:29, Marc Landgraf wrote:


Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Dave Dyer

>But you are stopping right at the atari, and then piling on playouts that 
>make it seem to work...

Yes, something like that may be the situation that turns the result. 

Suppose the tree stops at a point where there are two moves: a
blunder that leads to a quick end, and another move that leads
to an inconclusive continuation.   

If the tree can't grow, then random playouts will take the "blunder"
branch 50% of the time, whereas if the tree were allowed to grow, the
non-blunder branch would quickly dominate.

Overall, once the tree is frozen, some percentage of the leaf
nodes will be permanently giving bad feedback.
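The thought experiment above can be checked numerically. A minimal sketch with made-up numbers (the sound move winning 60% of playouts is an assumption, as is the lost-outright blunder): a frozen leaf averages over both moves, while one ply of expansion lets search avoid the blunder.

```python
import random

def playout_from(move):
    """Hypothetical outcomes for the mover at the frozen leaf: the
    blunder loses outright; the sound move wins ~60% of playouts after
    an inconclusive continuation (numbers are illustrative)."""
    if move == "blunder":
        return 0.0
    return 1.0 if random.random() < 0.6 else 0.0

def frozen_leaf_estimate(n):
    # Tree can't grow: playouts pick either move uniformly at random,
    # so the leaf's value mixes blunder and non-blunder outcomes.
    return sum(playout_from(random.choice(["blunder", "sound"]))
               for _ in range(n)) / n

def expanded_estimate(n):
    # One ply of expansion: each move gets its own statistics, so the
    # search concentrates on the better one (modeled here as simply
    # evaluating the sound move).
    return sum(playout_from("sound") for _ in range(n)) / n

random.seed(1)
frozen = frozen_leaf_estimate(10_000)    # ~0.30: the blunder drags it down
expanded = expanded_estimate(10_000)     # ~0.60: the blunder is avoided
```

The frozen leaf reports roughly half the value it should, permanently, which is the "bad feedback" described above.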


Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Urban Hafner
This is expected. Search for the horizon effect in relation to computer
Go (e.g. in Petr Baudis's thesis: http://pasky.or.cz/go/prace.pdf).

On Fri, Nov 6, 2015 at 8:48 PM, Gonçalo Mendes Ferreira 
wrote:

-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/

Re: [Computer-go] alarming UCT behavior

2015-11-06 Thread Dave Dyer

>I have seen this exact behavior when first experimenting with long thinking 
>times in Pachi. When you stop growing the tree, the algorithm degenerates to 
>"delayed" single-level Monte Carlo along the principal variations, with all 
>the MC-without-tree weaknesses.

The pathology definitely depends on the nature of the situation
at the leaves.   For example, one of my bots plays Hex, and that
robot plays so quickly that the memory limit is reached almost 
immediately.  Yet continuing to run simulations still improves
the results.  An experiment I just ran at 15 seconds per move 
hit the memory limit in less than 1 second; if you stop the run after
1 second, the result is a lot weaker than the full 15-second run.

My take-away from this is that small, reasonable-sounding choices
can have huge unintended and unanticipated consequences.  Not really
news to anyone using UCT, but easy to forget.

