Re: [computer-go] How to design the stronger playout policy?
I think the issue with MC versus deterministic playouts is the following: with MC, the value of a node gets more accurate with a larger number of playouts. With a deterministic playout you don't have such a mechanism (there are other mechanisms in UCT).

DL

-----Original Message-----
From: Don Dailey <[EMAIL PROTECTED]>
To: computer-go
Sent: Wed, 9 Jan 2008 3:46 pm
Subject: Re: [computer-go] How to design the stronger playout policy?

Mark Boon wrote:
> On 8-jan-08, at 17:04, Don Dailey wrote:
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>
> I was thinking about this and wanted to make sure I understand what
> you mean by 'cannot be deterministic'.

You could build a playing strategy that would always play the exact same game from a particular position, and that would be deterministic - but then the "Monte Carlo" component is completely missing from the program. You would still get some chaos from the tree itself, since you always expand a node once it is visited. I don't think this is fully understood, so I could be wrong in my statement - but it does seem likely that a certain amount of randomness will improve the results. Attempts to make the play-outs stronger and stronger have failed, perhaps because this makes them less and less random, or perhaps due to some other phenomenon.

- Don

> For example a stone gets put into atari and it finds an escaping move.
> With plain playouts the stone gets captured anyway roughly 50% of the
> time. With heavy playouts it gives a weighting so that the chance of
> escaping becomes larger? Or does it always play the escaping move and
> just the rest of the playout is random?
> Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
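DL's point - that a Monte Carlo value estimate sharpens as playouts accumulate, while a deterministic playout has no such mechanism - is just the standard error of a Bernoulli estimate. A minimal sketch (the function name is ours, not from the thread):

```python
import math

def stderr_winrate(p, n):
    """Standard error of a win-rate estimate from n independent playouts,
    each returning win (1) or loss (0) with win probability p."""
    return math.sqrt(p * (1.0 - p) / n)

# quadrupling the playouts halves the uncertainty of the node's value
```

With p = 0.5, 100 playouts give a standard error of 0.05 and 10,000 give 0.005; a deterministic playout returns the same single sample every visit, so extra visits add no information.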
Re: [computer-go] How to design the stronger playout policy?
Mark Boon wrote:
> On 8-jan-08, at 17:04, Don Dailey wrote:
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>
> I was thinking about this and wanted to make sure I understand what
> you mean by 'cannot be deterministic'.

You could build a playing strategy that would always play the exact same game from a particular position, and that would be deterministic - but then the "Monte Carlo" component is completely missing from the program. You would still get some chaos from the tree itself, since you always expand a node once it is visited. I don't think this is fully understood, so I could be wrong in my statement - but it does seem likely that a certain amount of randomness will improve the results. Attempts to make the play-outs stronger and stronger have failed, perhaps because this makes them less and less random, or perhaps due to some other phenomenon.

- Don

> For example a stone gets put into atari and it finds an escaping move.
> With plain playouts the stone gets captured anyway roughly 50% of the
> time. With heavy playouts it gives a weighting so that the chance of
> escaping becomes larger? Or does it always play the escaping move and
> just the rest of the playout is random?
>
> Mark
Re: [computer-go] How to design the stronger playout policy?
On 8-jan-08, at 17:04, Don Dailey wrote:
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.

I was thinking about this and wanted to make sure I understand what you mean by 'cannot be deterministic'. For example, a stone gets put into atari and the policy finds an escaping move. With plain playouts the stone gets captured anyway roughly 50% of the time. With heavy playouts, does it apply a weighting so that the chance of escaping becomes larger? Or does it always play the escaping move, with just the rest of the playout random?

Mark
Re: [computer-go] How to design the stronger playout policy?
Quoting Don Dailey <[EMAIL PROTECTED]>:
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.

Not all changes to the playouts slow down the program. If the playouts become shorter as a consequence, the program can speed up.

-Magnus
Re: [computer-go] How to design the stronger playout policy?
That would be exciting, seeing your team get involved with this Monte Carlo stuff, especially since you have some previous experience with it.

- Don

David Doshay wrote:
> I have been interested in monte-carlo approaches to Go since running
> my first MC simulations in magnetic phase transitions when I was in
> graduate school in the 1980's. What held me back, even when the latest
> crop of MC programs started winning against older stronger programs
> and my program SlugGo, is that in the physics simulations we know on
> theoretical grounds what the shape of the random distributions is, but
> in Go we do not. I was amazed at how well UCT helps get around the
> problem and still allows the use of nearly flat random distributions
> (flat except for a few hand-tuned rules).
>
> Recent work on the ELO ratings of patterns comes very close to what I
> think is needed to move forward, and we are also working on ways to
> determine biased probability distributions that are appropriate for Go
> move selection.
>
> I have little doubt that such "weighted playout" (not really heavy or
> light) considerations will lead to the next major step in computer Go
> progress. The advantage of such a method is that it intrinsically
> matches the basic premise of MC: the right degree of randomness allows
> you to search the problem space appropriately.
>
> Cheers,
> David
>
> On 8, Jan 2008, at 11:04 AM, Don Dailey wrote:
>
>> I think Dave Hillis coined this term "heavy playouts."
>>
>> In the first programs the play-outs were uniformly random. Any move
>> would get played with equal likelihood, with the exception of
>> eye-filling moves, which don't get played at all, of course.
>>
>> But it was found that the program improves if the play-outs are
>> somewhat "managed". So Dave started calling the original formula
>> "light play-outs" and the slower but better method "heavy play-outs."
>>
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>>
>> - Don
Re: [computer-go] How to design the stronger playout policy?
I have been interested in monte-carlo approaches to Go since running my first MC simulations in magnetic phase transitions when I was in graduate school in the 1980's. What held me back, even when the latest crop of MC programs started winning against older stronger programs and my program SlugGo, is that in the physics simulations we know on theoretical grounds what the shape of the random distributions is, but in Go we do not. I was amazed at how well UCT helps get around the problem and still allows the use of nearly flat random distributions (flat except for a few hand-tuned rules).

Recent work on the ELO ratings of patterns comes very close to what I think is needed to move forward, and we are also working on ways to determine biased probability distributions that are appropriate for Go move selection.

I have little doubt that such "weighted playout" (not really heavy or light) considerations will lead to the next major step in computer Go progress. The advantage of such a method is that it intrinsically matches the basic premise of MC: the right degree of randomness allows you to search the problem space appropriately.

Cheers,
David

On 8, Jan 2008, at 11:04 AM, Don Dailey wrote:
> I think Dave Hillis coined this term "heavy playouts."
>
> In the first programs the play-outs were uniformly random. Any move
> would get played with equal likelihood, with the exception of
> eye-filling moves, which don't get played at all, of course.
>
> But it was found that the program improves if the play-outs are
> somewhat "managed". So Dave started calling the original formula
> "light play-outs" and the slower but better method "heavy play-outs."
>
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.
> - Don
Re: [computer-go] How to design the stronger playout policy?
I think Dave Hillis coined this term "heavy playouts."

In the first programs the play-outs were uniformly random. Any move would get played with equal likelihood, with the exception of eye-filling moves, which don't get played at all, of course.

But it was found that the program improves if the play-outs are somewhat "managed". So Dave started calling the original formula "light play-outs" and the slower but better method "heavy play-outs."

And yes, it slows down the play-outs. Still, the play-outs seem to require a good bit of randomness - certainly they cannot be deterministic and it seems difficult to find the general principles that are important to the play-out policy.

- Don

Mark Boon wrote:
> On 5-jan-08, at 11:48, Gian-Carlo Pascutto wrote:
>>> Would you explain the details of the playout policy?
>>
>> (1) Captures of groups that could not save themselves last move.
>> (2) Save groups in atari due to last move by capturing or extending.
>> (3) Patterns next to last move.
>> (4) Global moves.
>
> Is this what is meant by 'heavy playout'?
>
> I suppose it slows down the playout considerably. Just keeping track
> of actual liberties instead of pseudo-liberties should probably be
> three times slower by itself. But in itself I think the idea is
> interesting. I've always felt programming Go would move towards
> several layers of playing engines, each one a bit more sophisticated,
> built on top of a more primitive one.
>
> Mark
Re: [computer-go] How to design the stronger playout policy?
On 5-jan-08, at 11:48, Gian-Carlo Pascutto wrote:
>> Would you explain the details of the playout policy?
>
> (1) Captures of groups that could not save themselves last move.
> (2) Save groups in atari due to last move by capturing or extending.
> (3) Patterns next to last move.
> (4) Global moves.

Is this what is meant by 'heavy playout'?

I suppose it slows down the playout considerably. Just keeping track of actual liberties instead of pseudo-liberties should probably be three times slower by itself. But in itself I think the idea is interesting. I've always felt programming Go would move towards several layers of playing engines, each one a bit more sophisticated, built on top of a more primitive one.

Mark
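Mark's pseudo-liberty remark refers to a common playout speed trick: instead of maintaining true liberty sets, each empty neighbour is counted once per adjacent stone of the group, so a shared liberty may be counted more than once. A rough sketch under assumed data structures (the board is just a dict of occupied points; the function name is made up):

```python
def pseudo_liberties(board, group_points, size):
    """Count pseudo-liberties of a group on a size x size board.
    board maps (x, y) -> colour for occupied points only; an empty
    neighbour is counted once for EACH adjacent stone of the group,
    which is cheaper to maintain incrementally than true liberties."""
    count = 0
    for (x, y) in group_points:
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in board:
                count += 1
    return count
```

For an L-shaped group of three stones the shared inner point is counted twice, so the pseudo-liberty count (5) exceeds the true liberty count (4) - which is why a pseudo count can only be used as a cheap upper-bound proxy.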
Re: [computer-go] How to design the stronger playout policy?
Don Dailey wrote:
> Lazarus uses a system very similar to the original MoGo policy as
> documented in the paper. However, I did find one significant
> improvement. I used Rémi's ELO system to rate patterns, and I simply
> throw out moves which match the weakest patterns in the play-outs. In
> the tree I also throw out these moves, but this is progressive: those
> moves still get explored, just not near leaf nodes.

It is interesting that many people say a similar thing. I'll try to use the ELO rating of patterns again.

> Programs that play games tend to be highly idiosyncratic. What may
> work today may not work tomorrow, or may work for me and not you. It
> depends on what is already in your program, what isn't, how you
> implemented it, etc. Some improvements work well with others and some
> do not cooperate with other changes. The whole process is a bit of a
> black art, not just the play-outs.

I fully agree.

-- Yamato
Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
>> I finally improved my playouts by using Remi's ELO system to learn a
>> set of "interesting" patterns, and just randomly fiddling with the
>> probabilities (compressing/expanding) until something improved my
>> program in self-play by about +25%. Not a very satisfying method or
>> an exceptional result. There could be some other magic combination
>> that is even better, or maybe not.
>
> I also have implemented Remi's Minorization-Maximization algorithm.
> But I could not find how to use the result of it to improve the
> strength.
>
> Would you explain the details of the playout policy?

(1) Captures of groups that could not save themselves last move.
(2) Save groups in atari due to last move by capturing or extending.
(3) Patterns next to last move.
(4) Global moves.

I quantize the MM pattern scores to 0..255 by multiplying them with a large constant and clipping. This causes the "very good" patterns to have close scores. I then use a threshold so I do not play the very bad patterns at all. The remaining moves are played with the probabilities indicated by the quantized values. I also throw away very bad moves in phase (4) unless there are no alternatives. This gives a small but measurable improvement.

But now I believe all of the above is actually flawed. With this system I will play bad saving moves even if there are great pattern moves. It might be that your ladder detection avoids these problems somewhat. Considering the probabilities of all moves, as Crazy Stone does, avoids this problem. I am now trying to get a similar effect without incrementally updating all urgencies.

> Do you use only 3x3 patterns?

Yes. I have not tried bigger ones. For size = 4 the tables would become 2 x 16M. Might be worth a try.

-- GCP
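GCP's quantization step - multiply MM pattern scores by a large constant, clip to 0..255, drop everything under a threshold, then pick among the surviving moves with the quantized values as weights - might look roughly like this. The scale and threshold constants here are invented for the sketch, not GCP's actual values:

```python
import random

def quantize(scores, scale=40.0, threshold=8):
    """Map raw MM pattern strengths to 0..255 and drop weak patterns.
    Clipping squeezes the 'very good' patterns close together."""
    q = {pat: min(255, int(s * scale)) for pat, s in scores.items()}
    # threshold: never play the very bad patterns at all
    return {pat: v for pat, v in q.items() if v >= threshold}

def sample_move(candidates, q):
    """candidates: list of (move, pattern) pairs. Draw a move with
    probability proportional to its quantized pattern score."""
    weights = [q.get(pat, 0) for _, pat in candidates]
    if sum(weights) == 0:
        return random.choice(candidates)[0]  # no rated pattern survives
    return random.choices([m for m, _ in candidates], weights=weights)[0]
```

The threshold implements "do not play the very bad patterns at all"; the weighted draw implements "probabilities indicated by the quantized values".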
Re: [computer-go] How to design the stronger playout policy?
Lazarus uses a system very similar to the original MoGo policy as documented in the paper. However, I did find one significant improvement. I used Rémi's ELO system to rate patterns, and I simply throw out moves which match the weakest patterns in the play-outs. In the tree I also throw out these moves, but this is progressive: those moves still get explored, just not near leaf nodes.

It seems like I also found a minor improvement in giving captures a higher priority at some point, but I don't remember the details. I have not put a huge amount of energy into anything other than this.

Programs that play games tend to be highly idiosyncratic. What may work today may not work tomorrow, or may work for me and not you. It depends on what is already in your program, what isn't, how you implemented it, etc. Some improvements work well with others and some do not cooperate with other changes. The whole process is a bit of a black art, not just the play-outs.

The very biggest problem of all is how to test a change. It's not so difficult in the early stages, where you are testing major improvements and getting 100 ELO at a time. But when you get to the point where you refine a program, you are talking about small but important improvements. You are looking for 10 or 20 ELO and hoping to put several of them together to get 100 or 200. But testing even 20 ELO points takes thousands of games to get a reliable measure. If we could measure 5 ELO improvements in 5 minutes, you can bet most of us would be able to produce very strong programs.

Some of the top computer chess guys have testing systems that play 50,000 games or more to measure the value of small improvements, such as a weight adjustment. They play those games on several computers (or CPUs) and play relatively fast - I have heard of testing systems that average a game per second per CPU at reasonably strong levels (they are still playing at least master strength.)
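Don's figure - thousands of games to resolve 20 ELO - follows from the Elo win-rate model plus the binomial standard error. A back-of-the-envelope sketch (the confidence level and the formula's simplifications are ours, not Don's):

```python
import math

def winrate(elo_diff):
    # expected score of the stronger side under the Elo model
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Roughly how many games until a z-sigma interval on the observed
    win rate excludes 50% (uses the worst-case per-game sigma of 0.5)."""
    p = winrate(elo_diff)
    return math.ceil((z * 0.5 / (p - 0.5)) ** 2)
```

For a 20 ELO edge (about a 52.9% win rate) this gives on the order of 1,200 games just to reach 95% confidence; detecting it reliably, with decent statistical power, takes several times more - which matches Don's "thousands of games".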
The problem is that if you test too fast, you are not really stressing a Monte Carlo program or testing the sub-systems in the same way they would be tested in real games. Monte Carlo relies on statistics gathering, and that seems to require games that last at least a few seconds. So unless you have a large bank of workstations, it's difficult to get enough games to reliably measure small improvements - since this requires tens of thousands of games.

- Don

Gian-Carlo Pascutto wrote:
> Yamato wrote:
>> I guess the current top programs have a much better playout policy
>> than the classical MoGo-style one.
>>
>> The original policy of MoGo was:
>>
>> (1) If the last move is an atari, plays one saving move randomly.
>> (2) If there are "interesting" moves in the 8 positions around the
>> last move, plays one randomly.
>> (3) If there are moves capturing stones, plays one randomly.
>> (4) Plays one random move on the board.
>>
>> I (and maybe many others) use it with some improvements, however it
>> will not be enough to catch up with the top programs.
>
> What improvements did you try? The obvious ones I know are
> prioritizing saving and capturing moves by the size of the string.
>
> Zen appears quite strong on CGOS. Leela using the above system was
> certainly weaker.
>
>> Then I have tested a lot of changes to the probability distributions,
>> but it was very hard to improve the strength.
>>
>> Any comments?
>
> I had the same problem, i.e. it seems almost impossible to improve the
> MoGo system by having a different pattern set for "interesting" moves,
> or even by varying the probability of "interesting" moves by pattern
> score.
>
> I tried 2 things:
>
> a) I extracted about 5000 positions with a known winner (determined
> by UCT) from CGOS games, and measured the Mean Square Error of the
> result of my playouts against the known result (also described in one
> of the MoGo papers). Then I applied a genetic algorithm to optimize
> the playout patterns.
> This worked, in the sense that the MSE measured over the 5000
> positions dropped. However, it did not produce a stronger program! I
> found that somewhat shocking.
>
> It makes me doubt the value of the MSE measure.
>
> b) I made a simple genetic algorithm that makes a random pool of a few
> hundred playout policies, picks 2 random parents and
> crossovers/mutates them into 2 children, plays a 10-game match between
> the 2 children with simulations = 100, and then keeps the winner.
>
> This did not produce anything interesting either. My best guess is
> that the match results are simply too random.
>
> So I did not find any way to automatically optimize the patterns.
>
> I finally improved my playouts by using Remi's ELO system to learn a
> set of "interesting" patterns, and just randomly fiddling with the
> probabilities (compressing/expanding) until something improved my
> program in self-play by about +25%. Not a very satisfying method or
> an exceptional result. There could be some other
Re: [computer-go] How to design the stronger playout policy?
Gian-Carlo Pascutto wrote:
> What improvements did you try? The obvious ones I know are
> prioritizing saving and capturing moves by the size of the string.
>
> Zen appears quite strong on CGOS. Leela using the above system was
> certainly weaker.

I use a static ladder search in the playouts. For example, if a move that matched a 3x3 pattern is capturable in a ladder, it is not interesting. Of course such a rule makes a program slower, but I believe it is an improvement.

> I finally improved my playouts by using Remi's ELO system to learn a
> set of "interesting" patterns, and just randomly fiddling with the
> probabilities (compressing/expanding) until something improved my
> program in self-play by about +25%. Not a very satisfying method or
> an exceptional result. There could be some other magic combination
> that is even better, or maybe not.

I also have implemented Remi's Minorization-Maximization algorithm. But I could not find how to use the result of it to improve the strength.

Would you explain the details of the playout policy? Do you use only 3x3 patterns?

> What is so frustrating is that the playouts are essentially black
> magic. I know of no way to automatically determine what is good and
> what is not besides playing about 500 games between 2 strategies. The
> results are very often completely counterintuitive. There is no
> systematic way to improve.

Yes. In addition, the big problem is that testing policies is very time consuming. I think at least 1000 games that use 3000 or more playouts per move are needed to judge whether a change is good or bad.

-- Yamato
Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
> I guess the current top programs have a much better playout policy
> than the classical MoGo-style one.
>
> The original policy of MoGo was:
>
> (1) If the last move is an atari, plays one saving move randomly.
> (2) If there are "interesting" moves in the 8 positions around the
> last move, plays one randomly.
> (3) If there are moves capturing stones, plays one randomly.
> (4) Plays one random move on the board.
>
> I (and maybe many others) use it with some improvements, however it
> will not be enough to catch up with the top programs.

What improvements did you try? The obvious ones I know are prioritizing saving and capturing moves by the size of the string.

Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

> Then I have tested a lot of changes to the probability distributions,
> but it was very hard to improve the strength.
>
> Any comments?

I had the same problem, i.e. it seems almost impossible to improve the MoGo system by having a different pattern set for "interesting" moves, or even by varying the probability of "interesting" moves by pattern score.

I tried 2 things:

a) I extracted about 5000 positions with a known winner (determined by UCT) from CGOS games, and measured the Mean Square Error of the result of my playouts against the known result (also described in one of the MoGo papers). Then I applied a genetic algorithm to optimize the playout patterns.

This worked, in the sense that the MSE measured over the 5000 positions dropped. However, it did not produce a stronger program! I found that somewhat shocking.

It makes me doubt the value of the MSE measure.

b) I made a simple genetic algorithm that makes a random pool of a few hundred playout policies, picks 2 random parents and crossovers/mutates them into 2 children, plays a 10-game match between the 2 children with simulations = 100, and then keeps the winner.

This did not produce anything interesting either. My best guess is that the match results are simply too random.
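The MSE measure GCP describes (from the MoGo papers) scores a playout policy by how far its win-rate estimates sit from results already settled by deep UCT search. A minimal sketch - `run_playouts` is a hypothetical stand-in for whatever returns the win fraction over N playouts of a position:

```python
def playout_mse(labelled_positions, run_playouts):
    """labelled_positions: (position, known_result) pairs with the known
    result in {0, 1} (e.g. determined by UCT on CGOS games).
    run_playouts(pos) returns the estimated win fraction from playouts.
    Lower MSE means the playouts agree better with the known winners."""
    errors = [(run_playouts(pos) - result) ** 2
              for pos, result in labelled_positions]
    return sum(errors) / len(errors)
```

As GCP reports, minimizing this proxy did not yield a stronger program, so it is at best an indirect fitness function for a genetic optimizer.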
So I did not find any way to automatically optimize the patterns.

I finally improved my playouts by using Remi's ELO system to learn a set of "interesting" patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I now get some improvement by "merging" (1), (2) and (3) in the MoGo system and using probabilities on that. It makes sense because the playouts won't try hopeless saving moves, for example.

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and what is not besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.

-- GCP
[computer-go] How to design the stronger playout policy?
I guess the current top programs have a much better playout policy than the classical MoGo-style one.

The original policy of MoGo was:

(1) If the last move is an atari, plays one saving move randomly.
(2) If there are "interesting" moves in the 8 positions around the last move, plays one randomly.
(3) If there are moves capturing stones, plays one randomly.
(4) Plays one random move on the board.

I (and maybe many others) use it with some improvements, however it will not be enough to catch up with the top programs. Crazy Stone uses a probability distribution of patterns from the Bradley-Terry model. greenpeep uses similar patterns extracted from offline self-play.

Then I have tested a lot of changes to the probability distributions, but it was very hard to improve the strength.

Any comments?

-- Yamato
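The four MoGo rules above form a priority cascade: each stage proposes candidate moves, and the first non-empty stage supplies the move, chosen uniformly at random within the stage. A schematic sketch - the candidate lists are assumed to be generated elsewhere; only the selection logic is shown:

```python
import random

def mogo_policy_move(saving_moves, pattern_moves, capture_moves, legal_moves):
    """(1) save the stone in atari, (2) 'interesting' pattern moves in
    the 8 positions around the last move, (3) capturing moves,
    (4) any legal move. The first non-empty stage wins, with a
    uniformly random choice inside that stage."""
    for stage in (saving_moves, pattern_moves, capture_moves):
        if stage:
            return random.choice(stage)
    return random.choice(legal_moves)
```

Note how rigid the cascade is: a mediocre stage-(1) saving move always beats a great stage-(2) pattern move, which is exactly the flaw GCP points out later in the thread and what Crazy Stone's all-moves probability distribution avoids.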