Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
 Being a computer scientist but new to go, I can grasp some of the theory.
 The question I was trying to get across was:
 
 In a game of self play, if both parties are employing only Monte Carlo,
 surely it's not a good conceptual representation of a human, and if the
 reinforcement learning is based on random simulations, wouldn't it be very
 weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

What they do is build a fairly traditional search tree starting from
the current position. They use random playouts as a crude way to evaluate a
position, and based on this evaluation they decide which branch of the tree
to expand.

This is the way I understand the random playouts: if, in a given position,
white is clearly ahead, he will win the game if both players play perfect
moves. He is also likely to win if both players play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both players make equally bad moves.
The results have much more variation, of course. So far this does not sound
like a very good proposal, but things change when you consider that we don't
have perfect oracles, and that good human players are slow to play out a
position and cannot be integrated into a computer program, whereas random
playouts can be done awfully fast - tens of thousands of playouts in a
second. Averaging the results gives a fair indication of who is more likely
to win from that position, which is just what is needed to decide which part
of the search tree to expand.
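
In rough Python, the averaging idea looks something like this (just a
sketch; the Board API here is invented for illustration, not taken from
any real program):

    import random

    def random_playout(board):
        # Play uniformly random legal moves until the game ends, then
        # report whether white won. Board and its methods (copy,
        # legal_moves, play, play_pass, is_over, white_wins) are all
        # hypothetical names.
        b = board.copy()
        while not b.is_over():
            moves = b.legal_moves()
            if moves:
                b.play(random.choice(moves))
            else:
                b.play_pass()
        return b.white_wins()

    def estimate_white_winrate(board, n=10000):
        # Average many noisy one-bit results into an estimate of
        # white's winning probability for this position.
        return sum(random_playout(board) for _ in range(n)) / n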

The 'random' playouts are not totally random: they include a minimum of
tactical rules (do not fill your own eyes, do not pass as long as there are
valid moves). Even this little will produce a few blind spots - moves that
the playouts cannot see - and systematically wrong results. Adding more
go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.
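
The "do not fill own eyes" rule, for instance, usually comes down to a
one-point-eye test like this (again a sketch with an invented board API;
real programs also check diagonal points to avoid filling false eyes):

    def is_own_single_point_eye(board, point, color):
        # Crude test used in simple playout policies: the point is
        # empty and every neighbouring point holds a stone of our own
        # colour. neighbours() and stone_at() are hypothetical helpers.
        return (board.stone_at(point) is None and
                all(board.stone_at(p) == color
                    for p in board.neighbours(point)))

A playout policy then simply never picks a move for which this returns
True.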

Hope that explains something of the mystery


Regards

   Heikki

-- 
Heikki Levanto   In Murphy We Trust   heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread D Gilder
On Sunday 16 November 2008, Heikki Levanto wrote:
 [...]
 Averaging the results gives a fair indication of who is more likely to
 win from that position, just what is needed to decide which part of the
 search tree to expand.

Do you know what use (if any) is made of the standard deviation of the 
results?






[computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Claus Reinke
 In a game of self play, if both parties are employing only monte
 carlo, ... random simulations ... wouldn't it be very weak ...
 ... and some playing around I am clearly mistaken because it works
 quite well.
Yes, it doesn't make sense, but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than
might be expected, and they are fairly simple, which is extremely
motivating for potential Go programmers!-) Instead of having to learn
all of Go and a lot of AI before being able to start building anything,
one can start quickly, and then improve together with the bot. And
the current top programs claim plain Monte-Carlo bots as their remote
ancestors, so it isn't a dead end, either (though complete rewrites may
be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a
human player progressing through the ranks:

1. Only knowing the rules makes for very inefficient/slow/buggy play.

2. Playing lots of games quickly has often been recommended to get a
better feeling for the game; personally, I don't like fast games(*), but
at least the worst inadequacies tend to show up even if we only play
equally clueless opponents (probably because weaknesses aren't
evenly distributed, so that peaks in understanding in one player
happen to match weaknesses in understanding in the other).

3. Human players learn over time, until all players in a group have
reached an equal level (or have gone as far as they can, with the
effort they can afford to put into the game).

4. Now further progress depends on innovation in the existing pool
of players, on joining a different player pool, or on bringing in
other players, preferably players whose strengths expose
weaknesses in the current pool (similar effects can be achieved
by reading books or observing games).

If we allow the learning outcome in 2/3 to be represented as patterns
(wait a minute, I've fallen into that trap before, so I shouldn't do this,
I usually do this, but that would be faster - I wonder whether it'll work),
then random playouts can emulate a weak form of this process. Instead
of learning from lots of games over time, plain Monte-Carlo bots
extrapolate from lots of dumb games, every time they consider their
next move.

Random games are really too dumb to be considered games, but bots can
play lots of them in a short time, so they can at least see trends (a
mixture of on-the-spot learning and very bad reading ahead). If we
assume that the quantity compensates somewhat for the quality, a
Monte-Carlo bot is a bit like a beginner after lots of games, one who
always forgets the lessons learned when the session ends.

They may have an idea of who is ahead and what is alive, and be able
to organise their games based on those ideas, but a stronger player (who
remembers his lessons) may be able to rip their ideas apart ("you might
both think that group is alive, but what do you do if I play here? and who
is ahead if you lose that group?"). And then there is the scenario of the
strong player who visits a club and humiliates the local champion ("you
might think that playing there kills that group, but what if they reply
here?").

Adding patterns or other biases to make playouts heavier (both slower
and more relevant for evaluation) is similar to improving reading skills,
but with differences: bots already read to the end, so eliminating useless
moves doesn't improve the playout depth, it improves at best the density
of information to be gained from playouts (so you don't have to play as
many of them, or can get more out of those you do play); but playouts
are also still used for on-the-spot-learning, and biases have limited scope
for success, so one has to be careful not to ruin this aspect. Using a tree
on top of the playouts is not just a way to represent the knowledge
extracted from the playouts, but also helps to direct playout effort towards
relevant situations, and gives a place where traditional Go knowledge
can be built in without affecting the playouts directly (so they become
more of an evaluation function than a candidate move generator).

One problem is that the top programs tend to be fairly close in strength.
There are variations in method, so there is some further progress, but
how much of the human going-through-the-ranks evolution has been
captured in current top programs, I don't know. I suspect that it is less
than their ranks would indicate. There is certainly reading going on,
discussion, and inviting stronger players or different bots.

Oh, and if any of this seems to make sense, remember that I just made
it all up!-) You might want to consult these slides for overviews from
folks who have been in the business:


"Old-fashioned Computer Go vs Monte-Carlo Go", Bruno Bouzy. Invited
tutorial, IEEE 2007 Symposium on Computational Intelligence in Games
(CIG 07), Hawaï, USA, 1 April 2007.


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Hideki Kato
Hello Heikki,

Heikki Levanto: [EMAIL PROTECTED]:
 [...]

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does just that, and it seems
successful.

Hideki

--
[EMAIL PROTECTED] (Kato)


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sun, Nov 16, 2008 at 11:46:28AM +, D Gilder wrote:
  [...] Averaging the results gives a fair indication of who is more
  likely to win from that position, just what is needed to decide which
  part of the search tree to expand.
 
 Do you know what use (if any) is made of the standard deviation of the 
 results?

Now statistics isn't my strong point, but one of the first and most
successful systems was called UCT, for Upper Confidence bounds applied to
Trees. It calculates a sort of expected error for each move and adds that
to the winning ratio, then expands the branch that has the highest value.
If that expansion is a win, the error margin gets smaller, but the average
result gets higher. Perhaps some other branch will be tried next, but this
one will still be a good candidate. If the result is a loss, the average
drops, and so does the error, so this move becomes much less likely to be
expanded.
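
For the record, the usual formula for this is UCB1: the win rate plus an
exploration term that plays the role of the expected error. A minimal
sketch (the constant c is a tuning parameter; values around 1.4 are
common):

    import math

    def ucb1(wins, visits, parent_visits, c=1.4):
        # Win rate plus an uncertainty bonus; the bonus shrinks as the
        # move is sampled more often, just as described above.
        if visits == 0:
            return float("inf")   # always try unvisited moves first
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

    # Expand the child with the highest UCB1 value, e.g.:
    # best = max(children, key=lambda ch: ucb1(ch.wins, ch.visits, node.visits))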

I am sure someone who understands statistics will soon jump in and correct my
explanation :-)

  - Heikki

-- 
Heikki Levanto   In Murphy We Trust   heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson

Quoting Hideki Kato [EMAIL PROTECTED]:


Heikki Levanto: [EMAIL PROTECTED]:

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.


I believe CrazyStone's use of patterns does so and it seems
successful.


With Valkyria I try to follow three principles in heavy playouts.


1) In contact fights there are a lot of shapes that are played most of
the time. Thus Valkyria checks, for each move played, whether there is an
obvious local response to it. If so, it plays it deterministically. In many
situations there are two or more such candidates, and then it plays one
of those moves.


2) In many positions the last move played does not trigger any obvious
response; then a random move is chosen uniformly.


3) There are moves that are inferior 100% of the time, both locally and
globally. These moves are pruned if they are selected, and a new random
move is chosen, as long as there are moves left to try.


I have hundreds of hand-coded patterns for both 1 and 3. It would be too
time-consuming to test these patterns individually, so I use my knowledge
and intuition (European 2 Dan) to simply decide which patterns to include.
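
Put together, the playout policy is roughly the following (a Python
sketch of the description above, not Valkyria's actual code;
local_responses() and always_inferior() stand in for the hand-coded
pattern machinery):

    import random

    def heavy_playout_move(board, last_move):
        # 1) Obvious local responses to the last move; if several
        #    shapes match, play one of the candidates.
        candidates = local_responses(board, last_move)
        if candidates:
            return random.choice(candidates)
        # 2) No obvious response: choose a random move uniformly,
        # 3) rejecting moves that are inferior 100% of the time, as
        #    long as there are moves left to try.
        moves = list(board.legal_moves())
        random.shuffle(moves)
        for move in moves:
            if not always_inferior(board, move):
                return move
        return None   # nothing acceptable remains: pass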


So Valkyria has a lot of go knowledge, but mostly the kind of knowledge
that all go players have up to some strength, perhaps 8-10 kyu. It
has no knowledge about global matters. The beauty of MC-evaluation is
that globally strong moves are most of the time evaluated better than
globally weak moves. Heavy playouts remove noise from MC-evaluation
and make it more sensitive to the true value of moves. There are still
biases with all heavy playouts, but they are overcome with MC Tree
Search (MCTS), which corrects mistakes in the evaluation recursively.


Here is my latest scaling experiment on 9x9 for Valkyria.

Valkyria plays 1150 random games per second on my 4-year-old laptop.

This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
datapoints are based on 500 games. "N sims" means Valkyria plays N
heavy playouts per move played. Win rates are in %.


N sims   WinRate   Elo (rel Gnu)
    47      7.4       1361
    94     22         1580
   188     37         1708
   375     53         1821
   750     69.9       1946
  1500     81.2       2054
  3000     88         2146
  6000     92.6       2239
 12000     94         2278
 24000     97.2       2416
 48000     97.4       2429
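
(The Elo column follows from the win rate by the standard logistic
model, relative to gnugo's assumed 1800; for example:)

    import math

    def elo_vs_gnugo(winrate, gnugo_elo=1800):
        # Standard Elo conversion of a score against a fixed opponent.
        return gnugo_elo + 400 * math.log10(winrate / (1 - winrate))

    print(round(elo_vs_gnugo(0.53)))   # -> 1821, matching the 375-sims row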

The heavy playouts of Valkyria need just 375 random games per move to
match gnugo, using only 0.3 seconds per move. And even using only 47
simulations per move it can still win.


So obviously the heavy playout code of Valkyria on its own is much weaker
(Elo 1361) than gnugo and most human opponents, but compared with CGOS, a
lot of programs with no knowledge are at about the same level, although
they use 2000 simulations or more.


Searching efficiently using MCTS with AMAF, it apparently can be made
arbitrarily strong.


Hope this explains how both the nature of the playouts and the MCTS
contribute to the playing strength of a program.


Should one go heavy or light? I do not know. I feel that Valkyria is a
little bit too slow on equivalent hardware against most top programs.
On the other hand, I think it could be tweaked and improved upon.
Perhaps it can even be made faster by removing code that does not
improve playing strength. And there is probably still room for adding
code that improves strength without a noticeable slowdown.


I just know that it is a lot of hard work doing it the way I did it.

Best
Magnus


Re: [computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Thomas Wolf
On Sun, 16 Nov 2008, Claus Reinke wrote:

 ...
 better feeling for the game; personally, I don't like fast games(*), but
 ...

But there is this saying:

Play quick, lose quick, learn quick!   :)

Thomas


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Don Dailey
The random playouts, or even heavy playouts, are not intended to emulate
a human player. Heikki is exactly right.

A playout is a crude measurement of how good the position is. The moves
in a random playout are horrible, and so are the moves in a heavy playout.
In fact, improving them arbitrarily will probably hurt the playouts.
Random playouts work reasonably well because, to a certain extent, bad
moves cancel each other out. Chrilly called this "mutual stupidity",
I think.

For example, a group of stones may be weak and ultimately lost, but it
is still possible to defend that group if the attacker isn't diligent.
What happens in the playouts is that the attacker is NOT diligent, but
neither is the defender! So the result comes out correct anyway.

Of course this is not reliable, but it's amazingly good at getting a
reasonable picture of what is weak, what is strong, and who owns what.

You can improve that by adding some knowledge to the playouts, but you
must do this with great care. In my example above, let's say you add a
piece of knowledge that causes the defender to succeed. You can argue
that the playouts play better go now, but the conclusion that you come
to for the group we are talking about is now wrong.

- Don




RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson
Yes, Valkyria does a lot of ladder reading as well. Actually, "pattern
matching" in the case of Valkyria is a little unclear: it is a decision
tree whose leaves are often procedure calls that look at a larger portion
of the board. The ladder code is called for various reasons in the tree.


Is 3.7.10 level 10 weaker than the default settings? I will run a test  
using 5000 playouts and 375 playouts to see if it makes a difference.


Also, have you tried this version on CGOS 9x9?

This version (http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html)
used a machine similar to yours and reached 3000 playouts per second
using two threads. Your code is then about 8 times faster and should be
much stronger.



-Magnus



Quoting David Fotland [EMAIL PROTECTED]:


I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts.  I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.

Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar.  I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.




RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread David Fotland
I think I added a small capture bias, but it didn't make much difference.
Sorry, I forgot that it isn't quite pure random: before the uniform random
selection, if there is an enemy one-liberty group on the board, I capture
it with some small probability.

A pattern includes don't-cares and is matched in any orientation. The 3x3
patterns are only matched adjacent to the last move (8 local points).
David

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:computer-go-
 [EMAIL PROTECTED] On Behalf Of Jason House
 Sent: Sunday, November 16, 2008 5:06 PM
 To: computer-go
 Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
 
 On Nov 16, 2008, at 11:18 AM, David Fotland [EMAIL PROTECTED]
 games.com wrote:
 
  I thought Valkyria does local search (ladders) during the playouts.
 
  Many Faces is lighter on the playouts.  I have 17 local 3x3
  patterns, then
  go to uniform random without filling eyes.
 
 
 No capture bias in the playouts? I thought that was a big strength
 boost.
 
  Out of curiosity, how do you count your patterns? For example, is it
  still one pattern if it includes a don't-care? How about rotations/
  reflections of the same basic pattern?
 
 
 
 

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon
Some months ago I did several experiments with using tactics and
patterns in playouts. Generally I found a big boost in strength from
tactics. I also found a boost in strength from patterns, but with
severely diminishing returns after a certain number, even becoming
detrimental with large numbers of patterns (1,000s to 10,000s). Since I
was using a generalized pattern-matcher, using it slowed things down
considerably. Although the program played a lot better with the same
number of playouts, when I compared MC playouts with patterns to MC
playouts without patterns using the same amount of CPU time, the gain
was not so obvious. Since most of the gain in strength came from just a
few patterns, I concluded, just as David did, that it was probably
better to use only a handful of hard-coded patterns during playouts.


I only recently started to do real experiments with hard-coded patterns,
and so far my results are rather inconclusive. I found that when mixing
different things it's not always clear what contributes to any increased
strength observed, so I'm still in the process of trying to dissect what
is actually contributing where. I found, for example, that a lot of the
increased level of play using patterns does not come from using them
during playouts, but from the effect they have on move-exploration. I
don't know if this is due to my particular way of implementing MC
playouts in combination with UCT search, but moves matching a pattern
(usually) automatically come first in the tree-expansion as well.
Generally, so far I'm observing that most of the increase in level comes
from the selection during exploration, and only in small part from the
selection during simulation.
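
One common way to get that effect is to seed new tree nodes with
pattern-based priors, so that matching moves are explored first,
whatever the playouts then say. A sketch (the virtual-count mechanism is
a guess at a typical implementation, not necessarily Mark's):

    def expand_node(node, board, patterns, prior=10):
        # Give pattern-matching moves "virtual" wins and visits so the
        # tree search tries them first; real playout results then
        # gradually confirm or wash out the prior.
        for move in board.legal_moves():              # hypothetical API
            child = node.add_child(move)
            if any(pat.matches(board, move) for pat in patterns):
                child.visits = prior
                child.wins = prior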


For example, in one particular experiment using just 5 patterns, I saw
a win-rate of 65% against the same program not using patterns (with the
same number of playouts). But when not using the patterns during
exploration, the win-rate dropped to just 55%.


I still have a lot of testing to do and it's too early to draw any  
hard conclusions. But I think it's worthwhile trying to distinguish  
where the strength is actually gained. Better yet, finding out  
exactly 'why' it gained strength, because with MC playouts I often  
find results during testing highly counter-intuitive, occasionally to  
the point of being (seemingly) nonsensical.


I also think what Don was proposing with his reference bot could be
interesting: trying to make it play around Elo 1700 on CGOS using just
5,000 (light) playouts. I don't know if it's possible, but I think it's
a fruitful exercise. At a time when most people are looking at using
more and more hardware to increase playing strength, knowing what plays
best at the other end of the spectrum is valuable as well. By that I
mean finding what plays best using severely constrained resources.


Mark



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
You say: "...I'm observing that most of the increase in level comes
from the selection during exploration and only in small part from the
selection during simulation." Could you elaborate at all? This is very
interesting. That almost suggests it might be fruitful to use the
patterns in the tree, but keep lighter playouts.
- George



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon


On 17-nov-08, at 02:42, George Dahl wrote:


 You say: "...I'm observing that most of the increase in level comes
 from the selection during exploration and only in small part from the
 selection during simulation." Could you elaborate at all? This is very
 interesting. That almost suggests it might be fruitful to use the
 patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some  
more testing to make a hard case for that.


Mark



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
I look forward to hearing more!  Happy testing.
- George



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Michael Williams

It seems move selection in the playouts should be very random at first and more 
deterministic toward the end of the playout.  Has anyone tried that?
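
One way to try it would be softmax move selection over some cheap
heuristic score, with a temperature that cools as the playout gets
longer (entirely hypothetical, just to make the idea concrete; scores
are assumed to be of order 1):

    import math
    import random

    def pick_move(scored_moves, ply, horizon=100):
        # scored_moves: list of (move, heuristic_score) pairs.
        # High temperature early in the playout gives a near-uniform
        # choice; low temperature near the end gives a near-greedy one.
        t = max(0.05, 1.0 - ply / horizon)
        weights = [math.exp(score / t) for _, score in scored_moves]
        moves = [move for move, _ in scored_moves]
        return random.choices(moves, weights=weights)[0]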

