RE: [computer-go] Reflections on a disaster

2009-05-21 Thread David Fotland
Yes.  Extra time goes to positions where the top move is not getting most of 
the playouts.
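A hedged sketch of that rule of thumb: treat a root where the best move does not hold a clear majority of the playouts as one that deserves extra time. The function and threshold here are invented for illustration, not Many Faces internals.

```python
def needs_extra_time(root_visits, majority=0.5):
    """True when the most-visited root move does not yet hold a clear
    majority of playouts, i.e. the search is still undecided."""
    total = sum(root_visits)
    if total == 0:
        return True  # no playouts yet: certainly keep thinking
    return max(root_visits) / total < majority

# Undecided root (visits spread out) vs. decided root (one move dominates):
print(needs_extra_time([3000, 2800, 2200, 2000]))  # True: allot extra time
print(needs_extra_time([9000, 400, 300, 300]))     # False: move on
```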

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of terry mcintyre
Sent: Wednesday, May 20, 2009 10:57 PM
To: computer-go
Subject: Re: [computer-go] Reflections on a disaster

 

Do you have any indication, which can be derived from the playouts, that a 
position might deserve an extra allotment of thinking time?

 

Terry McIntyre terrymcint...@yahoo.com

Any system of entrusting the government to judge and correct its own abuses is 
the same as appointing the accused criminal as his own judge and jury: don't 
expect many convictions.

 

-- Allen Thornton, Laws of the Jungle

 

 

  _  

From: David Fotland fotl...@smart-games.com
To: computer-go computer-go@computer-go.org
Sent: Wednesday, May 20, 2009 10:46:45 PM
Subject: RE: [computer-go] Reflections on a disaster

Many Faces’ static move generator suggests F1 as the first move to try. Still, 
it needs about 35K playouts before F1 is preferred. For some unknown reason it 
likes H1 before that. F1 at 35K playouts has a pretty low win rate, about 
35%, because the playouts can’t figure out the semeai. It needs a million 
playouts before it reaches 90% confidence in F1 (about 25 seconds).

 

David

 


Re: [computer-go] Reflections on a disaster

2009-05-21 Thread Darren Cook
 Do you have any indication, which can be derived from the playouts,
 that a position might deserve an extra allotment of thinking time?

I've a half-finished article called Consistent PV Enhancement. This
was inspired by looking at the principal variation information that Many
Faces gives out (check View|Show Lookahead).

E.g. the top move is 55% to white, black's reply is 45%, etc. But if,
nearer the bottom of the principal variation, it has become 49% to white
and 51% to black, then something fishy is going on, and the position
needs more time to investigate.
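That consistency test could be sketched like this (a hedged illustration of the idea, not Darren's actual code; win rates are assumed to be normalized to a single player's point of view along the variation):

```python
def pv_needs_more_time(pv_winrates, swing=0.05):
    """Flag a principal variation whose evaluation drifts as it deepens:
    if any node disagrees with the root estimate by more than `swing`,
    something fishy is going on and the position deserves more time."""
    root = pv_winrates[0]
    return any(abs(w - root) > swing for w in pv_winrates[1:])

print(pv_needs_more_time([0.55, 0.54, 0.52, 0.49]))  # True: 55% drifts to 49%
print(pv_needs_more_time([0.55, 0.56, 0.54, 0.55]))  # False: consistent PV
```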

The article is still unfinished, as I started to feel it was just a minor
tweak to time usage rather than the big jump in strength I'd hoped for.
Often it shows a bad move choice halfway down the principal variation,
but the conclusion at the start of the variation was still correct.

Darren

P.S. This was the case in one position I looked at today: almost the
last move in the P.V. suddenly switched from about 40% on the previous
moves to 85% to black, and with 200,000 playouts, so quite a solid
estimate. I played the P.V. out to that point and it was quite right:
good for black. This was caused by a white mistake 3 moves earlier. Once
I chose a better move there, it went back to being good for white, so the
estimate at the start of the principal variation seemed valid.


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Reflections on a disaster

2009-05-21 Thread David Fotland
The last moves in the PV are usually quite weak.  They don’t get a lot of 
playouts.

 -Original Message-
 From: computer-go-boun...@computer-go.org [mailto:computer-go-
 boun...@computer-go.org] On Behalf Of Darren Cook
 Sent: Wednesday, May 20, 2009 11:39 PM
 To: computer-go
 Subject: Re: [computer-go] Reflections on a disaster
 



Re: [computer-go] Reflections on a disaster

2009-05-21 Thread Magnus Persson
Hi. As usual, Valkyria seems to handle this position well, at the price
of being a super slow program in general.


This is just one example of how it reacts.

After 100 simulations it almost always treats F1 as the best move,
having searched it 30 to 100 times. Perhaps 50-70 times is the most
common result.



Valkyria handles this position well for two reasons. As soon as X
plays F1, the semeai is defended properly, so the win rate for O is
just 5% and then drops down to 0%. The few wins for O come from X
randomly removing one of its own liberties in a dumb way at J5.
Removing the liberty at G5, which makes an empty triangle, is safely
pruned, since it is dominated by G6, which removes a liberty from the
same block of O but gains a liberty. So one of the two game-losing
mistakes is pruned.
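The domination argument can be sketched as a pairwise check on two simple features of a liberty-filling move. The feature encoding below is invented for illustration; Valkyria's actual pruning is surely richer.

```python
def dominated(a, b):
    """Each move is (opponent_liberties_removed, own_liberty_delta).
    Move a is dominated by move b -- and can be safely pruned -- when b
    is at least as good on both counts and strictly better on one."""
    return (b[0] >= a[0] and b[1] >= a[1]) and (b[0] > a[0] or b[1] > a[1])

# G5 fills one of X's own liberties (empty triangle): own delta -1.
# G6 removes the same O liberty but gains a liberty for X: delta +1.
g5, g6 = (1, -1), (1, +1)
print(dominated(g5, g6))  # True: prune G5
print(dominated(g6, g5))  # False: keep G6
```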


The hard part for Valkyria is to search F1 at all, and there is a  
special mechanism (or hack really...) to ensure that.


Valkyria computes AMAF win rates for all moves, including those that
are pruned or illegal in the position. What I noticed is that in cases
of critical semeais, the AMAF values of moves that are, for example,
illegal can get very high, since they only become legal when you win the
semeai and thus win the game.


Therefore Valkyria takes the AMAF values of moves that cannot be
played yet, tries to find approach moves that will enable playing
them, and replaces the AMAF value of the approach move with the AMAF
value of the unplayable move if the latter is higher.


This is a costly computation so it is only done if the position has  
been visited several times.
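A minimal sketch of that AMAF transfer, with invented data structures (the real Valkyria bookkeeping is surely more involved, and it only runs this on well-visited nodes, as noted above):

```python
def lift_amaf(amaf, legal, approach_for):
    """amaf: move -> AMAF win rate (unplayable moves included).
    legal: set of moves playable right now.
    approach_for: unplayable move -> approach move that enables it.
    Returns a copy of `amaf` where each approach move inherits the AMAF
    value of the move it enables, if that value is higher."""
    lifted = dict(amaf)
    for move, value in amaf.items():
        if move in legal:
            continue
        approach = approach_for.get(move)
        if approach in legal and value > lifted.get(approach, 0.0):
            lifted[approach] = value
    return lifted

# "inside" is only playable after the capture that "approach" sets up:
rates = lift_amaf({"inside": 0.9, "approach": 0.4}, {"approach"},
                  {"inside": "approach"})
print(rates["approach"])  # the approach move now looks attractive
```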


Without this AMAF hack, Valkyria sometimes has a problem finding F1. It
also seems to take longer to find F1 even in the cases where it does
find it.


I have not yet tested the effect on the playing strength from this.

Best
Magnus

Quoting Brian Sheppard sheppar...@aol.com:




heuristics often produce the *wrong* move ordering, too. In that case there
is a loss of efficiency.


Yes, this is the hard part. Note, for example, how Valkyria in this
position will prune X at G5, because G6 must always be a better move.
However, X J5 is not pruned, because at the moment I have no simple way
of proving that removing a liberty from the opponent is not a good
move. The easiest way to prune that move would be to read the ladder
it creates. Still, a ladder might be confused with a nakade that
sacrifices stones to kill and capture the surrounding stones, so one
would need to make sure this is not the case. I will not do that in
Valkyria right now; it is just too complicated.


Best
Magnus


Re: [computer-go] Reflections on a disaster

2009-05-21 Thread Michael Williams

Cool idea.

Magnus Persson wrote:
Valkyria computes AMAF win rates for all moves including those that are 
pruned or illegal in the position. What I noticed is that in cases of 
critical semeais the AMAF values of moves that are for example illegal 
can get very high since they only get legal when you win the semeai and 
thus win the game




Re: [computer-go] Reflections on a disaster

2009-05-21 Thread Rémi Coulom

Brian Sheppard wrote:



This position is not that complicated, and many players would make the 
winning move (F1) without


thinking. After all, F1 captures the three-stone O string on the left, 
and saves the three-stone X string below.


Crazy Stone finds F1 immediately (200 playouts or so). That is because 
the playouts of Crazy Stone are intelligent enough to evaluate the 
semeai correctly.


Rémi


RE: [computer-go] Reflections on a disaster

2009-05-20 Thread David Fotland
Many Faces' static move generator suggests F1 as the first move to try.
Still, it needs about 35K playouts before F1 is preferred. For some unknown
reason it likes H1 before that. F1 at 35K playouts has a pretty low win
rate, about 35%, because the playouts can't figure out the semeai. It needs
a million playouts before it reaches 90% confidence in F1 (about 25 seconds).
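What "90% confident" means internally is not stated; purely as an illustration of how such a confidence statement tightens with playouts, here is a one-sided Wilson score lower bound on a win rate (my choice of bound, not Many Faces' actual test):

```python
import math

def wilson_lower(wins, n, z=1.2816):
    """One-sided Wilson score lower bound on a win rate;
    z = 1.2816 is the 90% one-sided normal quantile."""
    if n == 0:
        return 0.0
    p = wins / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Same 55% observed win rate, but the bound tightens as playouts grow:
print(wilson_lower(550, 1_000))
print(wilson_lower(550_000, 1_000_000))
```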

 

David

 

From: computer-go-boun...@computer-go.org
[mailto:computer-go-boun...@computer-go.org] On Behalf Of Brian Sheppard
Sent: Wednesday, May 20, 2009 5:39 PM
To: computer-go@computer-go.org
Subject: [computer-go] Reflections on a disaster

 

The simplest problems give me new appreciation for the difficulties we face
in programming this

maddening game. Here is an example, with X to play:

 

  1 2 3 4 5 6 7 8 9

A - X - - - - X - - 

B - - - - X X - X X 

C X - - - X - X O O 

D X X X O X X O O O 

E O O O X X X X O O 

F - X X O O O O - O 

G - X O O - - - O O 

H O O X X X X O - - 

J - O O X - O O - - 

 

This position is not that complicated, and many players would make the
winning move (F1) without

thinking. After all, F1 captures the three-stone O string on the left, and
saves the three-stone X string below.

 

But a little more thought reveals that F1 is forced. The real problem in
this position is the five-stone X string at bottom, which is locked in a
semeai with the four-stone O string at bottom left. X is winning that
semeai by 3 liberties to 2, but X needs to fill G1 and then J1 to
capture. Unfortunately, if X plays G1 without playing F1 first, then G1
is self-atari and loses.

 

The bottom line is that the only winning sequence starts with F1. Otherwise,
O fills in G6, G5, and J5

before X can fill in F1, G1, and J1.
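The race arithmetic itself is trivial; a one-line sketch of the plain case (no shared liberties, no eyes) shows that it is the order of the fills, not the count, that uniform playouts miss here:

```python
def wins_plain_semeai(my_libs, opp_libs):
    """In a plain semeai with no shared liberties and no eyes, the side
    to move wins the race iff it can fill the opponent's liberties
    before its own run out, i.e. opp_libs <= my_libs."""
    return opp_libs <= my_libs

# X (to move) has 3 liberties against O's 2, so the count says X wins --
print(wins_plain_semeai(3, 2))  # True
# -- but only the order F1, G1, J1 works: G1 first is self-atari, a
# distinction pure liberty counting cannot see.
```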

 

Such a simple situation. Would you figure that a program rated 1995 on CGOS
would have any trouble with it? Well,…

 

What happens here is that Pebbles (as X) initially sees F1 as probably
*losing*. Here are the dynamics:

 

1)  I have measured that encouraging Atari moves in the trials is
self-defeating, so I don't do it.

2)  Pebbles generates ladder plays in the trials, but only adjacent to the
opponent's last play. (This won't help here.)

3)  There is an Atari bonus in the tree search, but the weight is small.

4)  A larger weight is placed on proximity to the opponent's last move.
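Under these biases the tree's move weighting behaves roughly like the sketch below. The numeric weights are made up; only their ordering, proximity above atari, matches points 3 and 4.

```python
def move_weight(gives_atari, adjacent_to_last,
                base=1.0, atari_bonus=0.1, proximity_bonus=0.5):
    """Toy version of the bonuses described above: a small atari bonus
    (point 3) and a larger proximity-to-last-move bonus (point 4)."""
    w = base
    if gives_atari:
        w += atari_bonus
    if adjacent_to_last:
        w += proximity_bonus
    return w

# G1 gives atari but is far from O's last play at C4, while C3 is
# adjacent to it, so C3 is weighted higher and the search chases O
# around the board instead of filling liberties:
print(move_weight(gives_atari=True, adjacent_to_last=False))
print(move_weight(gives_atari=False, adjacent_to_last=True))
```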

 

So here is the dynamic in the first 40 or so trials of F1: O will respond by
running out of Atari at C4.

X will play adjacent to that play, because even though G1 gets the Atari
bonus, playing C3 gets the

larger proximity bonus. O will rarely play J1 or G1, because these moves are
not bonused. Eventually,

O will play G6 or G5 or J5. And then X goes truly wrong because of the
proximity heuristic: X will make

a play "near" O's last play, and this is disastrous because it often fills
in X's own liberty. O then responds

near X's last play, which wins the semeai. So X loses the trial!

 

Of the first 40 trials, X is winning about 35%. Now, the problem is that in
the rest of the variations, X does

great in the early going. This is because O tries to run out of Atari by
playing F1, and then X captures with G1!

 

It takes many thousands of trials to prove that all of X's possible plays
have less than a 50% chance of

winning before attention returns to F1. Then F1 isn't preferred until over
60,000 trials have elapsed.

 

Here are a few reflections on this disaster:

 

1) Start on a positive note: this situation is very bad for the heuristics
encoded in Pebbles, yet UCT

solves the problem anyway. Indeed, UCT provides us with a scalable strategy
for *safely* encoding Go

knowledge into a search engine. UCT will solve the problem even if our
initial impression is wrong.

 

2) It is possible (and tempting) to write code that sees through this sort
of thing. But I have to wonder about

the scalability of that strategy. It takes a lot of time to create the code.
And testing is an issue. Can we apply

machine learning to discover move ordering knowledge? There are methods in
the literature already, but they

don't *scale*. Usually a finite pattern base is involved, or the cost of
pattern matching rises with the size of the

pattern set, or the knowledge gained cannot be proven to rise indefinitely.

 

3) Even if we do discover move ordering knowledge, is that sufficient? I
have doubts. It seems to me that

improving move ordering is a constant speed-up. That is, it doesn't provide
efficiency gains that increase

with increases in computer power. Specifically, the gain is bounded by the
number of trials required for

UCT-RAVE to discover the recommended moves. Granted, this can be a *lot* of
trials. But keep in mind that

heuristics often produce the *wrong* move ordering, too. In that case there
is a loss of efficiency.

 

4) This is just a puny 9x9 board with just two semeais, each of which is
between 2 and 6 moves long.

Things can get a lot more complicated than that, even on the small board