Re: Re[4]: [computer-go] UCT vs MC

2007-02-22 Thread Sylvain Gelly

One more question. Line 23 states: for i:=node.size()-2 to 0 do. The leaf node 
should be stored in node[node.size()-1], so why do we start at node.size()-2? 
Is it not necessary to update the value of the leaf node?


i := node.size()-1 would be better, you're right :-).

Experiments made by people on this list (Don, if I remember correctly)
showed that you don't even have to create the leaf if the parent node
has fewer than a few simulations. A threshold of 100 simulations already
seemed safe, and much faster.
So the modification to the algorithm is to stop the descent when the
number of simulations of the node is below the threshold, rather than
ending on an unseen node.
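The modified descent can be sketched like this (a minimal sketch; `Node`, `descend`, and the threshold value are illustrative names of mine, not MoGo's actual code — the selection policy is passed in as `pick_move`):

```python
EXPAND_THRESHOLD = 100  # the thread suggests ~100 parent visits is already safe

class Node:
    def __init__(self):
        self.children = {}  # move -> Node
        self.visits = 0
        self.wins = 0       # wins from the point of view of the player to move

def descend(root, pick_move):
    """Walk down the tree, but only create (and enter) children of nodes
    that have already been visited at least EXPAND_THRESHOLD times; the
    descent therefore ends on a lightly visited node instead of an unseen one."""
    path = [root]
    node = root
    while node.visits >= EXPAND_THRESHOLD:
        move = pick_move(node)                        # e.g. UCB1 selection
        node = node.children.setdefault(move, Node())
        path.append(node)
    return path  # run one random simulation from path[-1], then back it up

def backup(path, leaf_result):
    """leaf_result is 0 or 1 from the point of view of the player at the leaf;
    the value flips at every level up the tree (value := 1 - value)."""
    value = leaf_result
    for node in reversed(path):
        node.visits += 1
        node.wins += value
        value = 1 - value
```

`backup` also shows the value := 1-value rule discussed elsewhere in this thread.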

Sylvain



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re[2]: [computer-go] UCT vs MC

2007-02-21 Thread Dmitry Kamenetsky
Thank you for your answer. However, I am even more confused now. I understand
that the - is for negamax, but I don't understand why it became 1-. I am trying
to implement your algorithm and I just want to know what lines 7, 16 and 26
should be?


-Original Message-
From: Sylvain Gelly [EMAIL PROTECTED]
To: Dmitry Kamenetsky [EMAIL PROTECTED]
Date: Wed, 21 Feb 2007 11:03:08 +0100
Subject: Re: [computer-go] UCT vs MC

 
 Hello Dmitry,
 
 
  Your code says that the value is backed up by sum and negation (line 26,
   value := -value).  But I don't see any negative values in your sample
  tree,
   or values greater than one.  How do you actually back up values to the
   root?
  Sorry, it is value := 1-value. Thank you for pointing out the mistake.
 
  I am confused about value. What is it actually storing? I thought
  node[i].value stores the number of wins (for Black) for node i. Then
  why are some of the values in Figure 1 not integers?
 
  If line 26 is now value := 1-value, then should some of the other lines
  also change? For example should line 7 be updateValue(node,
  1-node[i].value), and line 16 be else v[i]:= (1-node.childNode
  [i].value)/node.childNode[i].nb+sqrt(...)?
 
 
 You're right, there was some confusion :-).
 In fact it is very simple. The minus is there because it is negamax and
 not minimax, so that you can always take the max of the value (but the
 value is negated every 2 levels). The value stored then corresponds to
 the value of the player to play in the node.
 node[i].value indeed keeps the number of wins for the player to play in
 node i; the 1- does not exist.
 Figure 1 is an example of UCT in the general case, where the reward is
 not always in [0,1], and the values displayed in the nodes are the
 averages. That explains the non-integer values and the values not in
 [0,1].
 
 
 
  Can you also update all the changes in your report? Thank you.
 
 
 I'll try to find some time to do that. Can't say it will be soon, though.
 
 Regards,
 Sylvain
 
 


Re: Re[2]: [computer-go] UCT vs MC

2007-02-21 Thread Sylvain Gelly


Thank you for your answer. However, I am even more confused now. I
understand that - is for negamax, but I don't understand why it became
1-. I am trying to implement your algorithm and I just want to know what
lines 7, 16 and 26 should be?



It became a 1- because I made a mistake while answering. The 1- would
have been there only to keep values always between 0 and 1 (instead of
[0,1] if Black or [-1,0] if White), IF value were the average win and
not the total win. So my fault, sorry :-/.

Does that make things clearer?

Sylvain





Re: Re[2]: [computer-go] UCT vs MC

2007-02-21 Thread Don Dailey
On Wed, 2007-02-21 at 16:56 +0300, Dmitry Kamenetsky wrote:
 Thank you for your answer. However, I am even more confused now. I
 understand that - is for negamax, but I don't understand why it
 became 1-. I am trying to implement your algorithm and I just want
 to know what lines 7, 16 and 26 should be?

I'm not sure this is what you are looking for, but in negamax, scores
can be negative or positive.  The scores are always adjusted so that
you can view positive numbers as good and negative as bad from the
point of view you are referencing.  So to get the score from the other
point of view you simply negate it.

But in UCT, we don't deal with negative numbers.  A score is between
0 and 1,  so 0.001 is almost losing and 0.999 is almost winning for
example.

To change 0.99 to the other player's point of view in this system, where
scores must be between 0 and 1, you negate it and add 1.  So 0.99
becomes:  1 - 0.99 = 0.01
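The two conventions are easy to confuse; a tiny sketch (function names are mine, purely illustrative) showing that negating on a [-1, 1] scale and computing 1 - p on a [0, 1] scale are the same operation:

```python
def flip(p):
    """Opponent's point of view on the [0, 1] win-rate scale: 1 - p."""
    return 1.0 - p

def to_signed(p):
    """Map [0, 1] onto the negamax-style [-1, 1] scale."""
    return 2.0 * p - 1.0

def from_signed(s):
    """Map [-1, 1] back onto [0, 1]."""
    return (s + 1.0) / 2.0

# Negating on the signed scale is exactly 1 - p on the [0, 1] scale:
for p in (0.01, 0.5, 0.99):
    assert abs(from_signed(-to_signed(p)) - flip(p)) < 1e-12
```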

I hope that is what you are asking about and  that this explains it.

- Don
 



[computer-go] UCT vs MC

2007-02-20 Thread Dmitry Kamenetsky
Hi Sylvain,

 Your code says that the value is backed up by sum and negation (line 26,
 value := -value).  But I don't see any negative values in your sample tree,
 or values greater than one.  How do you actually back up values to the
 root?
Sorry, it is value := 1-value. Thank you for pointing out the mistake.

I am confused about value. What is it actually storing? I thought node[i].value
stores the number of wins (for Black) for node i. Then why are some of the
values in Figure 1 not integers?

If line 26 is now value := 1-value, then should some of the other lines also 
change? For example should line 7 be updateValue(node, 1-node[i].value), and 
line 16 be else v[i]:= 
(1-node.childNode[i].value)/node.childNode[i].nb+sqrt(...)?

Can you also update all the changes in your report? Thank you.


Re: [computer-go] UCT vs MC

2007-01-02 Thread Magnus Persson

Quoting [EMAIL PROTECTED]:


I'm curious about the full-width depth and the principal variation depth to
compare UCT with alpha-beta.
The comparison is not so easy to do I think, because using MC as an
evaluation function for alpha-beta, you have to do several simulations for
one evaluation and take the average. So the question is how many simulations
to do (trade-off between false alpha-beta cuts and depth)? The right number
should be the one which makes the stronger player. I did not make such
experiments. Perhaps someone did?


My old program Viking5 used alpha-beta search with Monte Carlo evaluation.
Valkyria uses UCT and similar code for Monte Carlo evaluation. One cannot
compare these programs directly since they do not share code, but I would
guess that Valkyria searches the principal variation to a 20-100% deeper
depth than Viking5 would. The cost is that Valkyria might sometimes get
stuck on the second-best move. In the opening the difference is probably
small because UCT searches quite uniformly, but if there is fighting where
critical stones are very unstable the search can get very deep if there are
many forcing moves. I know at least one 9x9 position where Valkyria can
search forced sequences to 15 ply, but normally it might get 4-5 meaningful
nodes deep where Viking5 would get 3 ply.

It is also tricky to compare the principal variation of UCT with alpha-beta,
because near the leaf the number of visits to the nodes is less than what
could be considered a sound MC evaluation. When I print out principal
variations I stop as soon as the number of visits for a node is less than
1000. This means that the moves in the pruned principal variation are of
high quality.
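That cutoff rule can be sketched as follows (a sketch under my own naming; the node type with a `visits` count and a `children` dict is an assumption, not Valkyria's actual data structure):

```python
class Node:
    def __init__(self, visits=0):
        self.visits = visits
        self.children = {}  # move -> Node

def principal_variation(node, cutoff=1000):
    """Follow the most-visited child at each level, stopping as soon as
    the best child has fewer visits than the cutoff, so every reported
    move rests on a sound number of MC evaluations."""
    pv = []
    while node.children:
        move, child = max(node.children.items(), key=lambda kv: kv[1].visits)
        if child.visits < cutoff:
            break
        pv.append(move)
        node = child
    return pv
```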

For alpha-beta a similar effect can be seen by choosing different numbers of
simulations for the eval. Normally a few hundred simulations are necessary,
but one could make a 1-simulation eval that spits out very deep evals that
are unfortunately almost random.

-Magnus


RE: [computer-go] UCT vs MC

2007-01-01 Thread David Fotland
Thanks.  So it seems that doing as many random games as possible is not the
ideal approach.

In UCT, I suppose the equivalent of the principal variation would be the
path from the root that always visits the child with the highest number of
simulations.  When you make a move with 70,000 simulations, how deep is
this UCT principal variation?

I expect that after this many simulations the UCT tree will include every
position in some number of ply from the root.  Deeper in the tree it will
not visit every child.  Typically, how many ply in the UCT tree is full
width?

I'm curious about the full-width depth and the principal variation depth to
compare UCT with alpha-beta.

It's great to see a new approach to computer go that works so well.  Thanks
for sharing your work.

David

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 [EMAIL PROTECTED]
 Sent: Monday, January 01, 2007 3:56 AM
 To: computer-go
 Subject: Re: [computer-go] UCT vs MC
 
 
 Hello and happy new year!
 
 
  I have some questions about your paper...
 Whoa, that's a lot of questions :). I'll try to answer them well.
 
 
  I thought that the Monte Carlo evaluation of a position is done by 
  making many random games from that position, and averaging the 
  win/loss result. So the evaluation of a position would be a number 
  between 0 and 1.  I thought several thousand random games would be 
  used for one evaluation.
 No, we consider one evaluation as one simulation, so the evaluation
 function is a Bernoulli random variable. Now you are mainly interested
 in its expectation, which is why you can think of making a lot of
 simulations and averaging, trying to approximate the expectation by the
 empirical average.
 
  In your paper, you say that each UCT leaf node is evaluated by
  exactly one random game (simulation), with a result of 0 or 1.  Is
  this true?
 Yes, it is true. We tried having more than one simulation per leaf
 node, and: with the same number of nodes, the improvement is very
 small; with the same number of total simulations, the level is much
 weaker.
 
 This seems counter-intuitive, but in fact it is not. At each
 simulation we add a node. So a node that is often visited will have a
 lot of descendants, and so will average a lot of simulations. The key
 idea of UCT is that the value in a node is the average of its
 children's values weighted by the frequency of visits to its children.
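The weighted-average identity is easy to verify numerically (the counts below are made up for illustration, and the per-level negamax flip is ignored for clarity):

```python
# (wins, visits) for three hypothetical children of one node
children = [(30, 40), (5, 25), (10, 15)]

total_wins = sum(w for w, n in children)
total_visits = sum(n for w, n in children)
parent_average = total_wins / total_visits

# The same number, written as the children's averages weighted by
# their visit frequencies -- the identity behind one-playout leaves:
# frequently visited nodes accumulate many descendants, hence many
# simulations, so one playout per new leaf is enough.
weighted = sum((n / total_visits) * (w / n) for w, n in children)

assert abs(parent_average - weighted) < 1e-12
```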
 
  I think you say MoGo does 70,000 random games per move.  Does this
  mean that the UCT tree for a move has 70,000 nodes?  When you say
  70,000 games per move, does that mean total game moves made, or games
  per node evaluation?
 That means per MoGo move. So yes, the UCT tree for a move has 70,000
 nodes; it is the total number of simulations.
 I use the same count for the CGOS versions of MoGo.  MoGo_xxx_10k uses
 10,000 simulations for each move, or if you prefer, 10,000 nodes in
 its tree.  That means that if the CGOS game finished after 40 MoGo
 moves, then MoGo has computed 400,000 random simulations for the
 complete game.  (There is also a 5,000 simulations per move version
 on CGOS.)
 
 
  How many simulations (random games with patterns) does Mogo do per 
  second?
 On a P4 3.4Ghz, 4500 in 9x9, 1000 in 19x19.
 
  How do you back up values in the UCT tree?  There are values in the 
  example tree, but I can't see how they are calculated.
 
 As in the UCT algorithm: for each node from the root to the leaf of
 the current sequence, you simply add the 0/1 result to a variable, and
 1 to the count of the number of simulations.
 
  Your code says that the value is backed up by sum and negation (line
  26, value := -value).  But I don't see any negative values in your
  sample tree, or values greater than one.  How do you actually back up
  values to the root?
 Sorry, it is value := 1-value. Thank you for pointing out the mistake.
 
 
  On page 5 you say that UCB1-TUNED performs better and you use that
  formula.  In the code for the algorithm, you use UCB (line 16).  Which
  is correct?

 Since the beginning we used UCB1-TUNED and it performed better. Now,
 with all the other improvements and with fine parameter tuning, the
 difference is very small. UCB1-TUNED has the advantage that it does not
 need parameter tuning to perform well in Go (the famous sqrt(1/10)
 constant Remi Coulom posted on this list).
 
  In your paper you show win rates against GnuGo of about 50%,
  depending on the parameters.  The current MoGo beats GnuGo over 90%.
  What changed?  Are you doing more simulations, or do you have more go
  knowledge in your patterns?

 The results near 50% were with the uniform random simulations. The 80%
 is with the improved simulations. In the current MoGo there are new
 improvements not yet published. Currently, against GnuGo 3.6 level 8
 with 70,000 simulations, the result is 92.5%.
 MoGo which plays in tournaments makes more than

Re: [computer-go] UCT vs MC

2007-01-01 Thread Don Dailey
I seem to remember someone in this group a couple of years ago or so
saying that there won't be a 1 dan 9x9 player anytime soon.  I don't
remember the exact quote or who said it.  I'm looking through the
archives but I can't find it.  I would not name the person even when
I do, but it gives me a strong feeling of déjà vu.

Chrilly probably remembers when the strongest chess computers were
about beginner strength despite furious attempts to make them play
strongly.

- Don






On Mon, 2007-01-01 at 01:33 +0100, John Tromp wrote:
 
 On 1/1/07, David Fotland [EMAIL PROTECTED] wrote:
 In your paper you show win rates against GnuGo of about 50%,
 depending on
 the parameters.  The current Mogo beats GnuGo over 90%.  What
 changed?  Are
 you doing more simulations, or do you have more go knowledge
 in your 
 patterns?  Does Mogo have an opening book?
 
 I spent most of yesterday on KGS playtesting MoGo on 9x9 with 30 min
 total thinking time.
 The experience was quite unlike any other program I've played on 9x9
 in the past. 
 
 As I wrote to Sylvain in a private email:
  I had a lot of fun playing MoGo today. In the first game, it played
 some nonsensical moves 
 and I got a totally won position, but MoGo turned out to be very
 inventive and led me into
 a trap:(
 That was not the last game I was to lose to MoGo. I found it much more
 challenging than
 any other program I played. It is quite resourceful. And in one game
 you'll see it play 
 a beautiful tesuji. This really makes me feel like it's only a matter
 of time till MC programs
 can challenge professionals on 9x9.
 
 I enclose 2 of the games I played. In the first, MoGo is quite
 enterprising in the opening, 
 with moves like e6. It would be very hard for an evaluation function
 to appreciate the
 potential w has for territory after black c8. But MoGo correctly
 assesses that w will
 control the right half of the board. Furthermore, it very nicely
 punishes black's mistake
 of playing f5 prematurely with a beautiful tesuji at d3.
 
 In the other game, Mogo plays a different atari on the 6th move,
 leading to a very
 different game. It shows good timing in playing b7 when the right
 group can fend 
 for itself and plays a nice probe at e3 to determine its followup.
 Apparently it sees
 that g2 is sente on the d2 group, preventing black from a killing
 attempt at h5.
 
 It makes a mistake at move 30 with f7 though. Playing
 a4 c4 a2 b3 a3 d4 a6 b3 e8 instead would have given it a win.
 Later testing with MoGo showed that it indeed was unlucky to choose
 f7,
 and prefers e8 with a bit more search.
 
 I feel that the shodan level go 9x9 programs have arrived... 
 
 regards,
 -John
 
 
 
 
 
 
 


Re: [computer-go] UCT vs MC

2007-01-01 Thread Don Dailey
On Mon, 2007-01-01 at 20:10 +0100, [EMAIL PROTECTED] wrote:
 
  I'm curious about the full width depth and the principal variation
 depth to
  compare UCT wilth alpha-beta.
 The comparison is not so easy to do I think, because using MC as an
 evaluation function for alpha-beta, you have to do several simulations
 for one evaluation and take the average. So the question is how many
 simulations to do (trade-off between false alpha-beta cuts and depth)?
 The right number should be the one which makes the stronger player. I
 did not make such experiments. Perhaps someone did?

I did something similar.  But it takes a huge amount of horsepower to
zero in on the right value.

What seems to be the case is that you need fewer simulations as you
search deeper.  Of course it's always better to do more simulations,
but the question is the trade-off: whether it's better to go 1 ply
deeper doing fewer, or 1 ply shallower doing more.

Unfortunately, getting the right answer requires a lot of work.  There
are other variables too, such as how much cheating you should do.  You
can seriously reduce the number of simulations you do at end nodes by
stopping early when it appears unlikely your score will fall within
the alpha/beta window.  You can do this by asking the question: what
would be my score if the next N games were all wins, or all losses?
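That question translates directly into a stopping test (a sketch under my own naming, not Don's actual code): after `done` of `total` planned playouts, check whether even an all-wins or all-losses finish could bring the final win rate back inside the window.

```python
def can_stop_early(wins, done, total, alpha, beta):
    """True if the remaining playouts cannot move the final win rate
    inside the (alpha, beta) window, so the node's alpha-beta cut
    decision is already settled and simulation can stop."""
    remaining = total - done
    best_case = (wins + remaining) / total  # every remaining game is a win
    worst_case = wins / total               # every remaining game is a loss
    return best_case <= alpha or worst_case >= beta
```

For example, with 0 wins after 50 of 100 playouts and alpha = 0.6, even 50 straight wins only reaches 0.5, so the node is a guaranteed fail-low and the remaining playouts can be skipped.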

I also discovered that it is not efficient to do too few simulations.
If you are not doing enough, doubling the number of simulations only
increases the search effort slightly, or in some cases even improves
it.  This is almost certainly because of move-ordering issues; it is
very difficult to get good move ordering with a fuzzy evaluator.

I have a theory on how to make straight alpha beta work with monte
carlo evaluations at end nodes.   You want to use monte carlo
as an evaluation function, but you want it applied to quiet positions.
I think you need to take your end node positions before you apply monte
carlo as an evaluation and do something similar to the following:

   1. Identify clearly dead and live groups.

   2. Create a proxy position that is similar to the position you want
  to evaluate, but has been fixed-up to be less confusing to the
  monte carlo simulations.

  This could involve placing stones so that living groups are not
  touched in the simulation.  It might involve artificially killing
  dead groups so that monte carlo is not distracted killing them.

  A veto list involves identifying moves that cannot possibly help
  and telling the monte carlo simulation about these moves.  Even if
  you only know that the move can't possibly help in the next 4 or 5
  moves, it might be a major boost to the quality of the simulation.

  It might involve move suggestions, veto tables, etc. to help the
  monte carlo search.  The idea is to do anything that makes the
  final position more quiet and the monte carlo search more relevant,
  and to do it safely - only what you can be sure of.

   3. Your monte carlo search should have some intelligence built
  in, like MoGo does; it's not 100 percent random.

This is just an idea that hasn't been tried or tested as far as I know,
something to be taken with a grain of salt! 
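The veto-list part of the idea is the simplest to sketch (a hypothetical helper of my own, not code from any engine): the playout's move picker just filters out statically vetoed moves before choosing at random.

```python
import random

def playout_move(legal_moves, vetoed, rng=random):
    """Pick a playout move uniformly among moves that static analysis
    has not ruled out (eye-filling moves, moves that cannot possibly
    help, etc.); return None (a pass) if nothing useful is left."""
    candidates = [m for m in legal_moves if m not in vetoed]
    if not candidates:
        return None  # treat as a pass
    return rng.choice(candidates)
```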

- Don



