Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-07 Thread Erik van der Werf
Hmm, sounds like I should experiment some more. Thanks!

Erik


On Thu, Feb 7, 2008 at 1:11 AM, Hideki Kato <[EMAIL PROTECTED]> wrote:
> Hi Erik,
>
>  My program is very based on MoGo's report and the paper.  Yes, I
>  used FPU of 1.15.
>
>  -Hideki
>
>  Erik van der Werf: <[EMAIL PROTECTED]>:
>
>
> >Hi Hideki,
>  >
>  >Your results look similar to those of Mogo as reported in their icml
>  >paper. When you ran this experiment, did you use anything like FPU or
>  >progressive widening, or did you use Levente's original design which
>  >always selects unvisited moves first?
>  >
>  >Regards,
>  >Erik
>  >
>  >
>  >On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote:
>  >> I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
>  >>  komi 7.5, 3000 playouts/move, 2000 games match:
>  >>
>  >>  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
>  >>  With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)
>  >>
>  >>  Though this includes some other improvements, most come from RAVE.
>  >>  Unlike MoGo, my best 'K' was 1000.
>  >>
>  >>  Following is my implementation of RAVE for GGMC v2r6.
>  >>  1) Each playout returns the score and all moves with colors played.
>  >>  2) While back-propagating the value (degitized score), computes the
>  >>  mean and the variance according to UCB1 and do the same for RAVE
>  >>  seperatelly.  For RAVE, the values of all (legal) moves, except played
>  >>  one, in a node are updated.
>  >>  3) In the computation of values for RAVE, the point is that there
>  >>  appeares three colors (as someone, I remember GCP, mentioned before).
>  >>  If the players' colors aren't the same then skip.  Count the value as
>  >>  is or negate (1 - score, for me), depending on the color of the player
>  >>  at the position and the color for the score.
>  >>  4) Before back-propagating the value of each playout, I setup a color
>  >>  table for all intersections of the board for speed-up, in fact
>  >>  (initialized with EMPTY). That is, fill the board (table[move] =
>  >>  color) by tracing the moves and the colors returned by the playout
>  >>  forward (from leaf node to end of the game). Then, by tracing the
>  >>  path from root to the leaf node, clear the table[move] (table[move] =
>  >>  EMPTY), in order to avoid duplicate counting with UCB1.
>  >>  5) While descending the tree, merge the values come from UCB1 and
>  >>  RAVE with 'K' according to the formula in the paper.
>  >>
>  >>  #Though I'm writing this by reading my source code, this description
>  >>  may include some errors.
>  >>
>  >>  Hope this helps,
>  >>
>  >> Hideki
>  >>
>  >>  Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>  >>
>  >> >> I also implemented RAVE in Mango. There was a few points of 
> improvements
>  >>  >> (around 60 Elo points with gnugo as reference), but as much as in the
>  >>  >> paper of Gelly and Silver :( (around 250 Elo points if I remember 
> well)
>  >>  >>
>  >>  >> It might be that the effect of RAVE depends a lot on the simulation
>  >>  >> strategy. Indeed, sometimes my RAVE was playing very good moves but 
> also
>  >>  >> very bad ones.
>  >>  >
>  >>  >I don't think the simulation strategy is the key.
>  >>  >
>  >>  >I suspect the improvement is largest when you don't do progressive 
> widening.
>  >>  >
>  >>  >Nevertheless it would be quite interesting to see the implementation
>  >>  >details of ggmc's RAVE. RAVE performance is quite dependent on exact
>  >>  >implementation and parameters.
>  >>  --
>  >>
>  >> [EMAIL PROTECTED] (Kato)
>  >>  ___
>  >>
>  >>
>  >> computer-go mailing list
>  >>  computer-go@computer-go.org
>  >>  http://www.computer-go.org/mailman/listinfo/computer-go/
>  >>
>  >___
>  >computer-go mailing list
>  >computer-go@computer-go.org
>  >http://www.computer-go.org/mailman/listinfo/computer-go/
>  --
>  [EMAIL PROTECTED] (Kato)
>  ___
>  computer-go mailing list
>  computer-go@computer-go.org
>  http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato
Hi Erik,

My program is very based on MoGo's report and the paper.  Yes, I 
used FPU of 1.15.

-Hideki

Erik van der Werf: <[EMAIL PROTECTED]>:
>Hi Hideki,
>
>Your results look similar to those of Mogo as reported in their icml
>paper. When you ran this experiment, did you use anything like FPU or
>progressive widening, or did you use Levente's original design which
>always selects unvisited moves first?
>
>Regards,
>Erik
>
>
>On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote:
>> I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
>>  komi 7.5, 3000 playouts/move, 2000 games match:
>>
>>  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
>>  With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)
>>
>>  Though this includes some other improvements, most come from RAVE.
>>  Unlike MoGo, my best 'K' was 1000.
>>
>>  Following is my implementation of RAVE for GGMC v2r6.
>>  1) Each playout returns the score and all moves with colors played.
>>  2) While back-propagating the value (degitized score), computes the
>>  mean and the variance according to UCB1 and do the same for RAVE
>>  seperatelly.  For RAVE, the values of all (legal) moves, except played
>>  one, in a node are updated.
>>  3) In the computation of values for RAVE, the point is that there
>>  appeares three colors (as someone, I remember GCP, mentioned before).
>>  If the players' colors aren't the same then skip.  Count the value as
>>  is or negate (1 - score, for me), depending on the color of the player
>>  at the position and the color for the score.
>>  4) Before back-propagating the value of each playout, I setup a color
>>  table for all intersections of the board for speed-up, in fact
>>  (initialized with EMPTY). That is, fill the board (table[move] =
>>  color) by tracing the moves and the colors returned by the playout
>>  forward (from leaf node to end of the game). Then, by tracing the
>>  path from root to the leaf node, clear the table[move] (table[move] =
>>  EMPTY), in order to avoid duplicate counting with UCB1.
>>  5) While descending the tree, merge the values come from UCB1 and
>>  RAVE with 'K' according to the formula in the paper.
>>
>>  #Though I'm writing this by reading my source code, this description
>>  may include some errors.
>>
>>  Hope this helps,
>>
>> Hideki
>>
>>  Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>>
>> >> I also implemented RAVE in Mango. There was a few points of improvements
>>  >> (around 60 Elo points with gnugo as reference), but as much as in the
>>  >> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>>  >>
>>  >> It might be that the effect of RAVE depends a lot on the simulation
>>  >> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>>  >> very bad ones.
>>  >
>>  >I don't think the simulation strategy is the key.
>>  >
>>  >I suspect the improvement is largest when you don't do progressive 
>> widening.
>>  >
>>  >Nevertheless it would be quite interesting to see the implementation
>>  >details of ggmc's RAVE. RAVE performance is quite dependent on exact
>>  >implementation and parameters.
>>  --
>>
>> [EMAIL PROTECTED] (Kato)
>>  ___
>>
>>
>> computer-go mailing list
>>  computer-go@computer-go.org
>>  http://www.computer-go.org/mailman/listinfo/computer-go/
>>
>___
>computer-go mailing list
>computer-go@computer-go.org
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Sylvain Gelly
Hi Erik,

> In the ICML version of UCT without RAVE, you did not use your First
> Play Urgency, right?
>
> I think that using FPU has an effect similar to what others reported
> with their progressive widening. From what I've seen it looks like
> plain UCT, without FPU or progressive widening, has more to gain from
> RAVE.
>
> Am I wrong to assume that the strongest version of Mogo before you
> introduced RAVE used something like FPU or progressive widening?

You are right, but the effect of terms which were before is by far
negligeable. It may definitely be possible that stronger use of
knowledge like the progressive widening make the effect of RAVE much
less interesting. It was not our case at that time. I am also not sure
that it is the main reason of failure for people who tried and report
a small improvement. Of course, there can be so many reasons that it
is difficult to find out without looking at the implementation in
details :/.

Best,
Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf
Hi Hideki,

Your results look similar to those of Mogo as reported in their icml
paper. When you ran this experiment, did you use anything like FPU or
progressive widening, or did you use Levente's original design which
always selects unvisited moves first?

Regards,
Erik


On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote:
> I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9,
>  komi 7.5, 3000 playouts/move, 2000 games match:
>
>  Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
>  With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)
>
>  Though this includes some other improvements, most come from RAVE.
>  Unlike MoGo, my best 'K' was 1000.
>
>  Following is my implementation of RAVE for GGMC v2r6.
>  1) Each playout returns the score and all moves with colors played.
>  2) While back-propagating the value (degitized score), computes the
>  mean and the variance according to UCB1 and do the same for RAVE
>  seperatelly.  For RAVE, the values of all (legal) moves, except played
>  one, in a node are updated.
>  3) In the computation of values for RAVE, the point is that there
>  appeares three colors (as someone, I remember GCP, mentioned before).
>  If the players' colors aren't the same then skip.  Count the value as
>  is or negate (1 - score, for me), depending on the color of the player
>  at the position and the color for the score.
>  4) Before back-propagating the value of each playout, I setup a color
>  table for all intersections of the board for speed-up, in fact
>  (initialized with EMPTY). That is, fill the board (table[move] =
>  color) by tracing the moves and the colors returned by the playout
>  forward (from leaf node to end of the game). Then, by tracing the
>  path from root to the leaf node, clear the table[move] (table[move] =
>  EMPTY), in order to avoid duplicate counting with UCB1.
>  5) While descending the tree, merge the values come from UCB1 and
>  RAVE with 'K' according to the formula in the paper.
>
>  #Though I'm writing this by reading my source code, this description
>  may include some errors.
>
>  Hope this helps,
>
> Hideki
>
>  Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>
> >> I also implemented RAVE in Mango. There was a few points of improvements
>  >> (around 60 Elo points with gnugo as reference), but as much as in the
>  >> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>  >>
>  >> It might be that the effect of RAVE depends a lot on the simulation
>  >> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>  >> very bad ones.
>  >
>  >I don't think the simulation strategy is the key.
>  >
>  >I suspect the improvement is largest when you don't do progressive widening.
>  >
>  >Nevertheless it would be quite interesting to see the implementation
>  >details of ggmc's RAVE. RAVE performance is quite dependent on exact
>  >implementation and parameters.
>  --
>
> [EMAIL PROTECTED] (Kato)
>  ___
>
>
> computer-go mailing list
>  computer-go@computer-go.org
>  http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato

Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>Hideki Kato wrote:
>
>> 4) Before back-propagating the value of each playout, I setup a color 
>> table for all intersections of the board for speed-up, in fact 
>> (initialized with EMPTY). That is, fill the board (table[move] = 
>> color) by tracing the moves and the colors returned by the playout 
>> forward (from leaf node to end of the game). Then, by tracing the 
>> path from root to the leaf node, clear the table[move] (table[move] = 
>> EMPTY), in order to avoid duplicate counting with UCB1.
>
>I don't understand this. What and how would you be double counting?

I mean the values of such moves are updated in both UCB and RAVE.  
That is, the moves in the path are updated by UCB and all moves in 
the nodes in the path are updated by RAVE.  As UCB values and RAVE 
values will be averaged later, perhaps I thought not updating the 
values of such moves in RAVE would be _natural_.

As this code was written last Oct. and I've been working on other 
staffs, I'm not sure I remember the idea correctly.  But I believe 
this improved some.

-Hideki
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Jason House
On Feb 6, 2008 11:39 AM, Gian-Carlo Pascutto <[EMAIL PROTECTED]> wrote:

> Hideki Kato wrote:
>
> > 4) Before back-propagating the value of each playout, I setup a color
> > table for all intersections of the board for speed-up, in fact
> > (initialized with EMPTY). That is, fill the board (table[move] =
> > color) by tracing the moves and the colors returned by the playout
> > forward (from leaf node to end of the game). Then, by tracing the
> > path from root to the leaf node, clear the table[move] (table[move] =
> > EMPTY), in order to avoid duplicate counting with UCB1.
>
> I don't understand this. What and how would you be double counting?


Let's a set of stones played in the playout get captured and then the void
gets filled again.  The key is to count only the first move at a set board
position.  If you don't, you can end up counting a particular move more than
once.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Gian-Carlo Pascutto

Hideki Kato wrote:

4) Before back-propagating the value of each playout, I setup a color 
table for all intersections of the board for speed-up, in fact 
(initialized with EMPTY). That is, fill the board (table[move] = 
color) by tracing the moves and the colors returned by the playout 
forward (from leaf node to end of the game). Then, by tracing the 
path from root to the leaf node, clear the table[move] (table[move] = 
EMPTY), in order to avoid duplicate counting with UCB1.


I don't understand this. What and how would you be double counting?

--
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf
Hi Sylvain,

On Wed, Feb 6, 2008 at 4:41 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote:
>  Do you want to come? ;)

Well, I have a nice job, but one can always try to make me an offer I
can't refuse ;-)

> I just can't do too many things at the same  time.

Sounds familiar...

>  > Well, since you say the improvement is marginal on 9x9 then I think we
>  > are actually in agreement. I also get an improvement, but it's just
>  > not that much. When I wrote 'spectacular' I meant the reported jump
>  > from 25% to over 55% winrate against gnugo.
>
>  I am sorry, I was unclear. When I said marginal in 9x9, I was talking
>  of the differences between the current MoGo (well, at least when I
>  left it 6 months ago :)) and the algorithm described in the ICML
>  paper. The change is only on how to balance the two values you get
>  (UCT and Rave).
>  Rave in 9x9, as described in the paper, gives a big jump in
>  performance, and the numbers reported in the paper are accurate: those
>  are computed with many thousands games against gnugo and I carefully
>  did the experiments multiple times.
>  In addition, Hideki for example report the same order of improvements.
>  I have to point out that it is really easy to make a mistake in the
>  updates making Rave much less interesting. I am definitely not saying
>  that you or anyone else made a mistake, but it can just happen,
>  sometimes :).

In the ICML version of UCT without RAVE, you did not use your First
Play Urgency, right?

I think that using FPU has an effect similar to what others reported
with their progressive widening. From what I've seen it looks like
plain UCT, without FPU or progressive widening, has more to gain from
RAVE.

Am I wrong to assume that the strongest version of Mogo before you
introduced RAVE used something like FPU or progressive widening?

Best,
Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Sylvain Gelly
Hi Erik,

> Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-)
I like it very much thanks. Do you want to come? ;). I miss computer
go and all of you, but I read the list (not all) time to time, so I
have some remembers :). I just can't do too many things at the same
time.

> Well, since you say the improvement is marginal on 9x9 then I think we
> are actually in agreement. I also get an improvement, but it's just
> not that much. When I wrote 'spectacular' I meant the reported jump
> from 25% to over 55% winrate against gnugo.

I am sorry, I was unclear. When I said marginal in 9x9, I was talking
of the differences between the current MoGo (well, at least when I
left it 6 months ago :)) and the algorithm described in the ICML
paper. The change is only on how to balance the two values you get
(UCT and Rave).
Rave in 9x9, as described in the paper, gives a big jump in
performance, and the numbers reported in the paper are accurate: those
are computed with many thousands games against gnugo and I carefully
did the experiments multiple times.
In addition, Hideki for example report the same order of improvements.
I have to point out that it is really easy to make a mistake in the
updates making Rave much less interesting. I am definitely not saying
that you or anyone else made a mistake, but it can just happen,
sometimes :).

> >  That could be an explanation, but there are two points:
> >  - the prior you put on top of Rave often avoid to first sample 1-1,
> >  and even when you do, you very often loose just 1 playout because of
> >  the UCT value you get right away.
>
> Yes, using more prior knowledge will probably reduce the problem.

You don't even need "more", already save atari, atari, hane and cut
gives you enough in many cases to avoid 1-1. Of course more good
knowledge is better :). My point is just that you don't need so much
to avoid to try first 1-1. And even without prior knowledge that is
not terrible. If you see at the numbers in the paper, adding this very
simple prior knowledge does not improve so much.


> >  - I never observed a big discrepancy between the number of Rave
> >  samples for each move.
>
> I guess this is because your playout policy is more uniform than mine?
> The problem tends to disappear with uniform random playouts.
> My program has some hard-reject patterns to discard moves that are
> strictly inferior to adjacent alternatives, so in some situations I
> can easily get a large difference between the number of Rave samples
> for each move.

It is definitely a sensible hypothesis.

Best,
Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato
I found some data.  GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, 
komi 7.5, 3000 playouts/move, 2000 games match:

Without RAVE:   winning rate was 23.1 +- 0.9% (-209 +- 9 ELO)
With RAVE:  winning rate was 65.3 +- 1.1% (+110 +- 8 ELO)

Though this includes some other improvements, most come from RAVE.  
Unlike MoGo, my best 'K' was 1000.

Following is my implementation of RAVE for GGMC v2r6.
1) Each playout returns the score and all moves with colors played.
2) While back-propagating the value (degitized score), computes the 
mean and the variance according to UCB1 and do the same for RAVE 
seperatelly.  For RAVE, the values of all (legal) moves, except played 
one, in a node are updated.
3) In the computation of values for RAVE, the point is that there 
appeares three colors (as someone, I remember GCP, mentioned before).  
If the players' colors aren't the same then skip.  Count the value as 
is or negate (1 - score, for me), depending on the color of the player 
at the position and the color for the score.
4) Before back-propagating the value of each playout, I setup a color 
table for all intersections of the board for speed-up, in fact 
(initialized with EMPTY). That is, fill the board (table[move] = 
color) by tracing the moves and the colors returned by the playout 
forward (from leaf node to end of the game). Then, by tracing the 
path from root to the leaf node, clear the table[move] (table[move] = 
EMPTY), in order to avoid duplicate counting with UCB1.
5) While descending the tree, merge the values come from UCB1 and 
RAVE with 'K' according to the formula in the paper.

#Though I'm writing this by reading my source code, this description 
may include some errors.

Hope this helps,
Hideki

Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>> I also implemented RAVE in Mango. There was a few points of improvements
>> (around 60 Elo points with gnugo as reference), but as much as in the
>> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>>
>> It might be that the effect of RAVE depends a lot on the simulation
>> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>> very bad ones.
>
>I don't think the simulation strategy is the key.
>
>I suspect the improvement is largest when you don't do progressive widening.
>
>Nevertheless it would be quite interesting to see the implementation
>details of ggmc's RAVE. RAVE performance is quite dependent on exact
>implementation and parameters.
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf
Hi Sylvain,

Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-)


On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote:
>  > (1) They compared Rave to plain UCT. If they would have compared it to
>  > a more sophisticated implementation (like the best Mogo before Rave)
>  > they probably could not have shown a spectacular improvement.
>
>  The best Mogo before Rave was very close to plain UCT with the
>  sequence-like simulations. And indeed we exactly compared the best
>  Mogo before and after Rave. There is a table (I don't remember which
>  number), which show the incremental improvements from plain UCT, to
>  Rave, passing by plain UCT with sequence-like simulations. All
>  experiments have been done with MoGo's code, all other parts of the
>  code staying constant. There were not "secret part" of MoGo disabled
>  to make the improvement of Rave more interesting.
>
>  One discrepancy between our results and the one some of you observe,
>  as Gian-Carlo stated, is likely to come from the parameters and detail
>  of implementation. We heavily tuned those parameters and details
>  against gnugo, and that makes quite a big difference. I chatted more
>  closely with some of you about details and it did make a difference.
>  Maybe some of you can share what made a change, if you want.
>
>  Note as well that the current implementation of MoGo (not the one at
>  the time of the ICML paper) use a different tradeoff between UCT and
>  Rave value, thanks to an idea of David Silver, which brought
>  improvements in 19x19 (where the Rave values are the most useful),
>  while it was marginal (still better) in 9x9. But anyway we here are
>  talking about 9x9, so it can't explain what you are talking about.

Well, since you say the improvement is marginal on 9x9 then I think we
are actually in agreement. I also get an improvement, but it's just
not that much. When I wrote 'spectacular' I meant the reported jump
from 25% to over 55% winrate against gnugo. I only got such an
improvement when I first dumbed down my implementation to a plain UCT
without a good move-ordering (and always expanding all unvisited nodes
first).


>  > (2) () Depending on the playout
>
> > policy, adding an upper confidence bound to the rave values can push
>  > some terrible bad moves up (like playing on 1-1). The reason seems to
>  > be that such moves are normally sampled very infrequently (so the UCB
>  > will be higher), and when they are selected (...)
>
>  That could be an explanation, but there are two points:
>  - the prior you put on top of Rave often avoid to first sample 1-1,
>  and even when you do, you very often loose just 1 playout because of
>  the UCT value you get right away.

Yes, using more prior knowledge will probably reduce the problem.


>  - I never observed a big discrepancy between the number of Rave
>  samples for each move.

I guess this is because your playout policy is more uniform than mine?
The problem tends to disappear with uniform random playouts.
My program has some hard-reject patterns to discard moves that are
strictly inferior to adjacent alternatives, so in some situations I
can easily get a large difference between the number of Rave samples
for each move.

Best,
Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Sylvain Gelly
Hi Erik,

> (1) They compared Rave to plain UCT. If they would have compared it to
> a more sophisticated implementation (like the best Mogo before Rave)
> they probably could not have shown a spectacular improvement.

The best Mogo before Rave was very close to plain UCT with the
sequence-like simulations. And indeed we exactly compared the best
Mogo before and after Rave. There is a table (I don't remember which
number), which show the incremental improvements from plain UCT, to
Rave, passing by plain UCT with sequence-like simulations. All
experiments have been done with MoGo's code, all other parts of the
code staying constant. There were not "secret part" of MoGo disabled
to make the improvement of Rave more interesting.

One discrepancy between our results and the one some of you observe,
as Gian-Carlo stated, is likely to come from the parameters and detail
of implementation. We heavily tuned those parameters and details
against gnugo, and that makes quite a big difference. I chatted more
closely with some of you about details and it did make a difference.
Maybe some of you can share what made a change, if you want.

Note as well that the current implementation of MoGo (not the one at
the time of the ICML paper) use a different tradeoff between UCT and
Rave value, thanks to an idea of David Silver, which brought
improvements in 19x19 (where the Rave values are the most useful),
while it was marginal (still better) in 9x9. But anyway we here are
talking about 9x9, so it can't explain what you are talking about.

> (2) () Depending on the playout
> policy, adding an upper confidence bound to the rave values can push
> some terrible bad moves up (like playing on 1-1). The reason seems to
> be that such moves are normally sampled very infrequently (so the UCB
> will be higher), and when they are selected (...)

That could be an explanation, but there are two points:
- the prior you put on top of Rave often avoid to first sample 1-1,
and even when you do, you very often loose just 1 playout because of
the UCT value you get right away.
- I never observed a big discrepancy between the number of Rave
samples for each move.

Best,
Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Erik van der Werf
Hi Guillaume,

I think we talked about this before, but others may be interested as
well. In my opinion the ICML paper on Rave has several weaknesses.
It's been a while since I read the paper, but here are some I
remember:

(1) They compared Rave to plain UCT. If they would have compared it to
a more sophisticated implementation (like the best Mogo before Rave)
they probably could not have shown a spectacular improvement.

(2) The paper suggest that you should add an upper confidence bound to
the rave values. Although they didn't actually make this explicit,
because IIRC the paper failed to provide an actual value for the
constant c, I guess most people naturally assume it to be positive.
However, Rave is already a greedy heuristic, so if anything you should
probably subtract the confidence bound. Depending on the playout
policy, adding an upper confidence bound to the rave values can push
some terrible bad moves up (like playing on 1-1). The reason seems to
be that such moves are normally sampled very infrequently (so the UCB
will be higher), and when they are selected (e.g. for a big capture
deep in the playout) they correlate more with winning (so the value
will also be higher) without generalizing to earlier positions.

Best,
Erik



On Wed, Feb 6, 2008 at 10:47 AM, Chaslot G (MICC)
<[EMAIL PROTECTED]> wrote:
> I also implemented RAVE in Mango. There was a few points of improvements 
> (around 60 Elo points with gnugo as reference), but as much as in the paper 
> of Gelly and Silver :( (around 250 Elo points if I remember well)
>
>  It might be that the effect of RAVE depends a lot on the simulation 
> strategy. Indeed, sometimes my RAVE was playing very good moves but also very 
> bad ones.
>
>  Guillaume
>
>  -Original Message-
>  From: [EMAIL PROTECTED] on behalf of Magnus Persson
>  Sent: Wed 06/02/2008 00:42
>  To: computer-go@computer-go.org
>  Subject: Re: [computer-go] More UCT / Monte-Carlo questions
>
>  Quoting Gunnar Farnebäck <[EMAIL PROTECTED]>:
>
>  > I have never managed to implement RAVE successfully. It made my
>  > program significantly slower but no stronger even at a fixed number of
>  > simulations.
>
>  I get a small effect from RAVE, My rationalisation is that if the
>  program is rich with other features to improve performance RAVE may
>  not add that much.
>
>  -Magnus
>
>
>  ___
>  computer-go mailing list
>  computer-go@computer-go.org
>  http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
> ___
>  computer-go mailing list
>  computer-go@computer-go.org
>  http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Hideki Kato
You are right.  I don't do progressive widening.

-Hideki

Gian-Carlo Pascutto: <[EMAIL PROTECTED]>:
>> I also implemented RAVE in Mango. There was a few points of improvements
>> (around 60 Elo points with gnugo as reference), but as much as in the
>> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>>
>> It might be that the effect of RAVE depends a lot on the simulation
>> strategy. Indeed, sometimes my RAVE was playing very good moves but also
>> very bad ones.
>
>I don't think the simulation strategy is the key.
>
>I suspect the improvement is largest when you don't do progressive widening.
>
>Nevertheless it would be quite interesting to see the implementation
>details of ggmc's RAVE. RAVE performance is quite dependent on exact
>implementation and parameters.
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Gian-Carlo Pascutto
> I also implemented RAVE in Mango. There was a few points of improvements
> (around 60 Elo points with gnugo as reference), but as much as in the
> paper of Gelly and Silver :( (around 250 Elo points if I remember well)
>
> It might be that the effect of RAVE depends a lot on the simulation
> strategy. Indeed, sometimes my RAVE was playing very good moves but also
> very bad ones.

I don't think the simulation strategy is the key.

I suspect the improvement is largest when you don't do progressive widening.

Nevertheless it would be quite interesting to see the implementation
details of ggmc's RAVE. RAVE performance is quite dependent on exact
implementation and parameters.

-- 
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)

2008-02-06 Thread Chaslot G (MICC)
I also implemented RAVE in Mango. There was a few points of improvements 
(around 60 Elo points with gnugo as reference), but as much as in the paper of 
Gelly and Silver :( (around 250 Elo points if I remember well)

It might be that the effect of RAVE depends a lot on the simulation strategy. 
Indeed, sometimes my RAVE was playing very good moves but also very bad ones.

Guillaume

-Original Message-
From: [EMAIL PROTECTED] on behalf of Magnus Persson
Sent: Wed 06/02/2008 00:42
To: computer-go@computer-go.org
Subject: Re: [computer-go] More UCT / Monte-Carlo questions
 
Quoting Gunnar Farnebäck <[EMAIL PROTECTED]>:

> I have never managed to implement RAVE successfully. It made my
> program significantly slower but no stronger even at a fixed number of
> simulations.

I get a small effect from RAVE, My rationalisation is that if the  
program is rich with other features to improve performance RAVE may  
not add that much.

-Magnus


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

<>___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/