Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hmm, sounds like I should experiment some more. Thanks! Erik On Thu, Feb 7, 2008 at 1:11 AM, Hideki Kato <[EMAIL PROTECTED]> wrote: > Hi Erik, > > My program is very based on MoGo's report and the paper. Yes, I > used FPU of 1.15. > > -Hideki > > Erik van der Werf: <[EMAIL PROTECTED]>: > > > >Hi Hideki, > > > >Your results look similar to those of Mogo as reported in their icml > >paper. When you ran this experiment, did you use anything like FPU or > >progressive widening, or did you use Levente's original design which > >always selects unvisited moves first? > > > >Regards, > >Erik > > > > > >On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote: > >> I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, > >> komi 7.5, 3000 playouts/move, 2000 games match: > >> > >> Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) > >> With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) > >> > >> Though this includes some other improvements, most come from RAVE. > >> Unlike MoGo, my best 'K' was 1000. > >> > >> Following is my implementation of RAVE for GGMC v2r6. > >> 1) Each playout returns the score and all moves with colors played. > >> 2) While back-propagating the value (degitized score), computes the > >> mean and the variance according to UCB1 and do the same for RAVE > >> seperatelly. For RAVE, the values of all (legal) moves, except played > >> one, in a node are updated. > >> 3) In the computation of values for RAVE, the point is that there > >> appeares three colors (as someone, I remember GCP, mentioned before). > >> If the players' colors aren't the same then skip. Count the value as > >> is or negate (1 - score, for me), depending on the color of the player > >> at the position and the color for the score. > >> 4) Before back-propagating the value of each playout, I setup a color > >> table for all intersections of the board for speed-up, in fact > >> (initialized with EMPTY). That is, fill the board (table[move] = > >> color) by tracing the moves and the colors returned by the playout > >> forward (from leaf node to end of the game). Then, by tracing the > >> path from root to the leaf node, clear the table[move] (table[move] = > >> EMPTY), in order to avoid duplicate counting with UCB1. > >> 5) While descending the tree, merge the values come from UCB1 and > >> RAVE with 'K' according to the formula in the paper. > >> > >> #Though I'm writing this by reading my source code, this description > >> may include some errors. > >> > >> Hope this helps, > >> > >> Hideki > >> > >> Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: > >> > >> >> I also implemented RAVE in Mango. There was a few points of > improvements > >> >> (around 60 Elo points with gnugo as reference), but as much as in the > >> >> paper of Gelly and Silver :( (around 250 Elo points if I remember > well) > >> >> > >> >> It might be that the effect of RAVE depends a lot on the simulation > >> >> strategy. Indeed, sometimes my RAVE was playing very good moves but > also > >> >> very bad ones. > >> > > >> >I don't think the simulation strategy is the key. > >> > > >> >I suspect the improvement is largest when you don't do progressive > widening. > >> > > >> >Nevertheless it would be quite interesting to see the implementation > >> >details of ggmc's RAVE. RAVE performance is quite dependent on exact > >> >implementation and parameters. > >> -- > >> > >> [EMAIL PROTECTED] (Kato) > >> ___ > >> > >> > >> computer-go mailing list > >> computer-go@computer-go.org > >> http://www.computer-go.org/mailman/listinfo/computer-go/ > >> > >___ > >computer-go mailing list > >computer-go@computer-go.org > >http://www.computer-go.org/mailman/listinfo/computer-go/ > -- > [EMAIL PROTECTED] (Kato) > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, My program is very based on MoGo's report and the paper. Yes, I used FPU of 1.15. -Hideki Erik van der Werf: <[EMAIL PROTECTED]>: >Hi Hideki, > >Your results look similar to those of Mogo as reported in their icml >paper. When you ran this experiment, did you use anything like FPU or >progressive widening, or did you use Levente's original design which >always selects unvisited moves first? > >Regards, >Erik > > >On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote: >> I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, >> komi 7.5, 3000 playouts/move, 2000 games match: >> >> Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) >> With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) >> >> Though this includes some other improvements, most come from RAVE. >> Unlike MoGo, my best 'K' was 1000. >> >> Following is my implementation of RAVE for GGMC v2r6. >> 1) Each playout returns the score and all moves with colors played. >> 2) While back-propagating the value (degitized score), computes the >> mean and the variance according to UCB1 and do the same for RAVE >> seperatelly. For RAVE, the values of all (legal) moves, except played >> one, in a node are updated. >> 3) In the computation of values for RAVE, the point is that there >> appeares three colors (as someone, I remember GCP, mentioned before). >> If the players' colors aren't the same then skip. Count the value as >> is or negate (1 - score, for me), depending on the color of the player >> at the position and the color for the score. >> 4) Before back-propagating the value of each playout, I setup a color >> table for all intersections of the board for speed-up, in fact >> (initialized with EMPTY). That is, fill the board (table[move] = >> color) by tracing the moves and the colors returned by the playout >> forward (from leaf node to end of the game). Then, by tracing the >> path from root to the leaf node, clear the table[move] (table[move] = >> EMPTY), in order to avoid duplicate counting with UCB1. >> 5) While descending the tree, merge the values come from UCB1 and >> RAVE with 'K' according to the formula in the paper. >> >> #Though I'm writing this by reading my source code, this description >> may include some errors. >> >> Hope this helps, >> >> Hideki >> >> Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: >> >> >> I also implemented RAVE in Mango. There was a few points of improvements >> >> (around 60 Elo points with gnugo as reference), but as much as in the >> >> paper of Gelly and Silver :( (around 250 Elo points if I remember well) >> >> >> >> It might be that the effect of RAVE depends a lot on the simulation >> >> strategy. Indeed, sometimes my RAVE was playing very good moves but also >> >> very bad ones. >> > >> >I don't think the simulation strategy is the key. >> > >> >I suspect the improvement is largest when you don't do progressive >> widening. >> > >> >Nevertheless it would be quite interesting to see the implementation >> >details of ggmc's RAVE. RAVE performance is quite dependent on exact >> >implementation and parameters. >> -- >> >> [EMAIL PROTECTED] (Kato) >> ___ >> >> >> computer-go mailing list >> computer-go@computer-go.org >> http://www.computer-go.org/mailman/listinfo/computer-go/ >> >___ >computer-go mailing list >computer-go@computer-go.org >http://www.computer-go.org/mailman/listinfo/computer-go/ -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, > In the ICML version of UCT without RAVE, you did not use your First > Play Urgency, right? > > I think that using FPU has an effect similar to what others reported > with their progressive widening. From what I've seen it looks like > plain UCT, without FPU or progressive widening, has more to gain from > RAVE. > > Am I wrong to assume that the strongest version of Mogo before you > introduced RAVE used something like FPU or progressive widening? You are right, but the effect of terms which were before is by far negligeable. It may definitely be possible that stronger use of knowledge like the progressive widening make the effect of RAVE much less interesting. It was not our case at that time. I am also not sure that it is the main reason of failure for people who tried and report a small improvement. Of course, there can be so many reasons that it is difficult to find out without looking at the implementation in details :/. Best, Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Hideki, Your results look similar to those of Mogo as reported in their icml paper. When you ran this experiment, did you use anything like FPU or progressive widening, or did you use Levente's original design which always selects unvisited moves first? Regards, Erik On Wed, Feb 6, 2008 at 3:42 PM, Hideki Kato <[EMAIL PROTECTED]> wrote: > I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, > komi 7.5, 3000 playouts/move, 2000 games match: > > Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) > With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) > > Though this includes some other improvements, most come from RAVE. > Unlike MoGo, my best 'K' was 1000. > > Following is my implementation of RAVE for GGMC v2r6. > 1) Each playout returns the score and all moves with colors played. > 2) While back-propagating the value (degitized score), computes the > mean and the variance according to UCB1 and do the same for RAVE > seperatelly. For RAVE, the values of all (legal) moves, except played > one, in a node are updated. > 3) In the computation of values for RAVE, the point is that there > appeares three colors (as someone, I remember GCP, mentioned before). > If the players' colors aren't the same then skip. Count the value as > is or negate (1 - score, for me), depending on the color of the player > at the position and the color for the score. > 4) Before back-propagating the value of each playout, I setup a color > table for all intersections of the board for speed-up, in fact > (initialized with EMPTY). That is, fill the board (table[move] = > color) by tracing the moves and the colors returned by the playout > forward (from leaf node to end of the game). Then, by tracing the > path from root to the leaf node, clear the table[move] (table[move] = > EMPTY), in order to avoid duplicate counting with UCB1. > 5) While descending the tree, merge the values come from UCB1 and > RAVE with 'K' according to the formula in the paper. > > #Though I'm writing this by reading my source code, this description > may include some errors. > > Hope this helps, > > Hideki > > Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: > > >> I also implemented RAVE in Mango. There was a few points of improvements > >> (around 60 Elo points with gnugo as reference), but as much as in the > >> paper of Gelly and Silver :( (around 250 Elo points if I remember well) > >> > >> It might be that the effect of RAVE depends a lot on the simulation > >> strategy. Indeed, sometimes my RAVE was playing very good moves but also > >> very bad ones. > > > >I don't think the simulation strategy is the key. > > > >I suspect the improvement is largest when you don't do progressive widening. > > > >Nevertheless it would be quite interesting to see the implementation > >details of ggmc's RAVE. RAVE performance is quite dependent on exact > >implementation and parameters. > -- > > [EMAIL PROTECTED] (Kato) > ___ > > > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: >Hideki Kato wrote: > >> 4) Before back-propagating the value of each playout, I setup a color >> table for all intersections of the board for speed-up, in fact >> (initialized with EMPTY). That is, fill the board (table[move] = >> color) by tracing the moves and the colors returned by the playout >> forward (from leaf node to end of the game). Then, by tracing the >> path from root to the leaf node, clear the table[move] (table[move] = >> EMPTY), in order to avoid duplicate counting with UCB1. > >I don't understand this. What and how would you be double counting? I mean the values of such moves are updated in both UCB and RAVE. That is, the moves in the path are updated by UCB and all moves in the nodes in the path are updated by RAVE. As UCB values and RAVE values will be averaged later, perhaps I thought not updating the values of such moves in RAVE would be _natural_. As this code was written last Oct. and I've been working on other staffs, I'm not sure I remember the idea correctly. But I believe this improved some. -Hideki -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
On Feb 6, 2008 11:39 AM, Gian-Carlo Pascutto <[EMAIL PROTECTED]> wrote: > Hideki Kato wrote: > > > 4) Before back-propagating the value of each playout, I setup a color > > table for all intersections of the board for speed-up, in fact > > (initialized with EMPTY). That is, fill the board (table[move] = > > color) by tracing the moves and the colors returned by the playout > > forward (from leaf node to end of the game). Then, by tracing the > > path from root to the leaf node, clear the table[move] (table[move] = > > EMPTY), in order to avoid duplicate counting with UCB1. > > I don't understand this. What and how would you be double counting? Let's a set of stones played in the playout get captured and then the void gets filled again. The key is to count only the first move at a set board position. If you don't, you can end up counting a particular move more than once. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hideki Kato wrote: 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. I don't understand this. What and how would you be double counting? -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Sylvain, On Wed, Feb 6, 2008 at 4:41 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote: > Do you want to come? ;) Well, I have a nice job, but one can always try to make me an offer I can't refuse ;-) > I just can't do too many things at the same time. Sounds familiar... > > Well, since you say the improvement is marginal on 9x9 then I think we > > are actually in agreement. I also get an improvement, but it's just > > not that much. When I wrote 'spectacular' I meant the reported jump > > from 25% to over 55% winrate against gnugo. > > I am sorry, I was unclear. When I said marginal in 9x9, I was talking > of the differences between the current MoGo (well, at least when I > left it 6 months ago :)) and the algorithm described in the ICML > paper. The change is only on how to balance the two values you get > (UCT and Rave). > Rave in 9x9, as described in the paper, gives a big jump in > performance, and the numbers reported in the paper are accurate: those > are computed with many thousands games against gnugo and I carefully > did the experiments multiple times. > In addition, Hideki for example report the same order of improvements. > I have to point out that it is really easy to make a mistake in the > updates making Rave much less interesting. I am definitely not saying > that you or anyone else made a mistake, but it can just happen, > sometimes :). In the ICML version of UCT without RAVE, you did not use your First Play Urgency, right? I think that using FPU has an effect similar to what others reported with their progressive widening. From what I've seen it looks like plain UCT, without FPU or progressive widening, has more to gain from RAVE. Am I wrong to assume that the strongest version of Mogo before you introduced RAVE used something like FPU or progressive widening? Best, Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, > Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-) I like it very much thanks. Do you want to come? ;). I miss computer go and all of you, but I read the list (not all) time to time, so I have some remembers :). I just can't do too many things at the same time. > Well, since you say the improvement is marginal on 9x9 then I think we > are actually in agreement. I also get an improvement, but it's just > not that much. When I wrote 'spectacular' I meant the reported jump > from 25% to over 55% winrate against gnugo. I am sorry, I was unclear. When I said marginal in 9x9, I was talking of the differences between the current MoGo (well, at least when I left it 6 months ago :)) and the algorithm described in the ICML paper. The change is only on how to balance the two values you get (UCT and Rave). Rave in 9x9, as described in the paper, gives a big jump in performance, and the numbers reported in the paper are accurate: those are computed with many thousands games against gnugo and I carefully did the experiments multiple times. In addition, Hideki for example report the same order of improvements. I have to point out that it is really easy to make a mistake in the updates making Rave much less interesting. I am definitely not saying that you or anyone else made a mistake, but it can just happen, sometimes :). > > That could be an explanation, but there are two points: > > - the prior you put on top of Rave often avoid to first sample 1-1, > > and even when you do, you very often loose just 1 playout because of > > the UCT value you get right away. > > Yes, using more prior knowledge will probably reduce the problem. You don't even need "more", already save atari, atari, hane and cut gives you enough in many cases to avoid 1-1. Of course more good knowledge is better :). My point is just that you don't need so much to avoid to try first 1-1. And even without prior knowledge that is not terrible. If you see at the numbers in the paper, adding this very simple prior knowledge does not improve so much. > > - I never observed a big discrepancy between the number of Rave > > samples for each move. > > I guess this is because your playout policy is more uniform than mine? > The problem tends to disappear with uniform random playouts. > My program has some hard-reject patterns to discard moves that are > strictly inferior to adjacent alternatives, so in some situations I > can easily get a large difference between the number of Rave samples > for each move. It is definitely a sensible hypothesis. Best, Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
I found some data. GGMC Go v2r6, against GNU Go 3.7.10 level 10, 9x9, komi 7.5, 3000 playouts/move, 2000 games match: Without RAVE: winning rate was 23.1 +- 0.9% (-209 +- 9 ELO) With RAVE: winning rate was 65.3 +- 1.1% (+110 +- 8 ELO) Though this includes some other improvements, most come from RAVE. Unlike MoGo, my best 'K' was 1000. Following is my implementation of RAVE for GGMC v2r6. 1) Each playout returns the score and all moves with colors played. 2) While back-propagating the value (degitized score), computes the mean and the variance according to UCB1 and do the same for RAVE seperatelly. For RAVE, the values of all (legal) moves, except played one, in a node are updated. 3) In the computation of values for RAVE, the point is that there appeares three colors (as someone, I remember GCP, mentioned before). If the players' colors aren't the same then skip. Count the value as is or negate (1 - score, for me), depending on the color of the player at the position and the color for the score. 4) Before back-propagating the value of each playout, I setup a color table for all intersections of the board for speed-up, in fact (initialized with EMPTY). That is, fill the board (table[move] = color) by tracing the moves and the colors returned by the playout forward (from leaf node to end of the game). Then, by tracing the path from root to the leaf node, clear the table[move] (table[move] = EMPTY), in order to avoid duplicate counting with UCB1. 5) While descending the tree, merge the values come from UCB1 and RAVE with 'K' according to the formula in the paper. #Though I'm writing this by reading my source code, this description may include some errors. Hope this helps, Hideki Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: >> I also implemented RAVE in Mango. There was a few points of improvements >> (around 60 Elo points with gnugo as reference), but as much as in the >> paper of Gelly and Silver :( (around 250 Elo points if I remember well) >> >> It might be that the effect of RAVE depends a lot on the simulation >> strategy. Indeed, sometimes my RAVE was playing very good moves but also >> very bad ones. > >I don't think the simulation strategy is the key. > >I suspect the improvement is largest when you don't do progressive widening. > >Nevertheless it would be quite interesting to see the implementation >details of ggmc's RAVE. RAVE performance is quite dependent on exact >implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Sylvain, Thanks for your reply! How do you like your new job? Do you miss CompGo? ;-) On Wed, Feb 6, 2008 at 2:20 PM, Sylvain Gelly <[EMAIL PROTECTED]> wrote: > > (1) They compared Rave to plain UCT. If they would have compared it to > > a more sophisticated implementation (like the best Mogo before Rave) > > they probably could not have shown a spectacular improvement. > > The best Mogo before Rave was very close to plain UCT with the > sequence-like simulations. And indeed we exactly compared the best > Mogo before and after Rave. There is a table (I don't remember which > number), which show the incremental improvements from plain UCT, to > Rave, passing by plain UCT with sequence-like simulations. All > experiments have been done with MoGo's code, all other parts of the > code staying constant. There were not "secret part" of MoGo disabled > to make the improvement of Rave more interesting. > > One discrepancy between our results and the one some of you observe, > as Gian-Carlo stated, is likely to come from the parameters and detail > of implementation. We heavily tuned those parameters and details > against gnugo, and that makes quite a big difference. I chatted more > closely with some of you about details and it did make a difference. > Maybe some of you can share what made a change, if you want. > > Note as well that the current implementation of MoGo (not the one at > the time of the ICML paper) use a different tradeoff between UCT and > Rave value, thanks to an idea of David Silver, which brought > improvements in 19x19 (where the Rave values are the most useful), > while it was marginal (still better) in 9x9. But anyway we here are > talking about 9x9, so it can't explain what you are talking about. Well, since you say the improvement is marginal on 9x9 then I think we are actually in agreement. I also get an improvement, but it's just not that much. When I wrote 'spectacular' I meant the reported jump from 25% to over 55% winrate against gnugo. I only got such an improvement when I first dumbed down my implementation to a plain UCT without a good move-ordering (and always expanding all unvisited nodes first). > > (2) () Depending on the playout > > > policy, adding an upper confidence bound to the rave values can push > > some terrible bad moves up (like playing on 1-1). The reason seems to > > be that such moves are normally sampled very infrequently (so the UCB > > will be higher), and when they are selected (...) > > That could be an explanation, but there are two points: > - the prior you put on top of Rave often avoid to first sample 1-1, > and even when you do, you very often loose just 1 playout because of > the UCT value you get right away. Yes, using more prior knowledge will probably reduce the problem. > - I never observed a big discrepancy between the number of Rave > samples for each move. I guess this is because your playout policy is more uniform than mine? The problem tends to disappear with uniform random playouts. My program has some hard-reject patterns to discard moves that are strictly inferior to adjacent alternatives, so in some situations I can easily get a large difference between the number of Rave samples for each move. Best, Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Erik, > (1) They compared Rave to plain UCT. If they would have compared it to > a more sophisticated implementation (like the best Mogo before Rave) > they probably could not have shown a spectacular improvement. The best Mogo before Rave was very close to plain UCT with the sequence-like simulations. And indeed we exactly compared the best Mogo before and after Rave. There is a table (I don't remember which number), which show the incremental improvements from plain UCT, to Rave, passing by plain UCT with sequence-like simulations. All experiments have been done with MoGo's code, all other parts of the code staying constant. There were not "secret part" of MoGo disabled to make the improvement of Rave more interesting. One discrepancy between our results and the one some of you observe, as Gian-Carlo stated, is likely to come from the parameters and detail of implementation. We heavily tuned those parameters and details against gnugo, and that makes quite a big difference. I chatted more closely with some of you about details and it did make a difference. Maybe some of you can share what made a change, if you want. Note as well that the current implementation of MoGo (not the one at the time of the ICML paper) use a different tradeoff between UCT and Rave value, thanks to an idea of David Silver, which brought improvements in 19x19 (where the Rave values are the most useful), while it was marginal (still better) in 9x9. But anyway we here are talking about 9x9, so it can't explain what you are talking about. > (2) () Depending on the playout > policy, adding an upper confidence bound to the rave values can push > some terrible bad moves up (like playing on 1-1). The reason seems to > be that such moves are normally sampled very infrequently (so the UCB > will be higher), and when they are selected (...) That could be an explanation, but there are two points: - the prior you put on top of Rave often avoid to first sample 1-1, and even when you do, you very often loose just 1 playout because of the UCT value you get right away. - I never observed a big discrepancy between the number of Rave samples for each move. Best, Sylvain ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
Hi Guillaume, I think we talked about this before, but others may be interested as well. In my opinion the ICML paper on Rave has several weaknesses. It's been a while since I read the paper, but here are some I remember: (1) They compared Rave to plain UCT. If they would have compared it to a more sophisticated implementation (like the best Mogo before Rave) they probably could not have shown a spectacular improvement. (2) The paper suggest that you should add an upper confidence bound to the rave values. Although they didn't actually make this explicit, because IIRC the paper failed to provide an actual value for the constant c, I guess most people naturally assume it to be positive. However, Rave is already a greedy heuristic, so if anything you should probably subtract the confidence bound. Depending on the playout policy, adding an upper confidence bound to the rave values can push some terrible bad moves up (like playing on 1-1). The reason seems to be that such moves are normally sampled very infrequently (so the UCB will be higher), and when they are selected (e.g. for a big capture deep in the playout) they correlate more with winning (so the value will also be higher) without generalizing to earlier positions. Best, Erik On Wed, Feb 6, 2008 at 10:47 AM, Chaslot G (MICC) <[EMAIL PROTECTED]> wrote: > I also implemented RAVE in Mango. There was a few points of improvements > (around 60 Elo points with gnugo as reference), but as much as in the paper > of Gelly and Silver :( (around 250 Elo points if I remember well) > > It might be that the effect of RAVE depends a lot on the simulation > strategy. Indeed, sometimes my RAVE was playing very good moves but also very > bad ones. > > Guillaume > > -Original Message- > From: [EMAIL PROTECTED] on behalf of Magnus Persson > Sent: Wed 06/02/2008 00:42 > To: computer-go@computer-go.org > Subject: Re: [computer-go] More UCT / Monte-Carlo questions > > Quoting Gunnar Farnebäck <[EMAIL PROTECTED]>: > > > I have never managed to implement RAVE successfully. It made my > > program significantly slower but no stronger even at a fixed number of > > simulations. > > I get a small effect from RAVE, My rationalisation is that if the > program is rich with other features to improve performance RAVE may > not add that much. > > -Magnus > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
You are right. I don't do progressive widening. -Hideki Gian-Carlo Pascutto: <[EMAIL PROTECTED]>: >> I also implemented RAVE in Mango. There was a few points of improvements >> (around 60 Elo points with gnugo as reference), but as much as in the >> paper of Gelly and Silver :( (around 250 Elo points if I remember well) >> >> It might be that the effect of RAVE depends a lot on the simulation >> strategy. Indeed, sometimes my RAVE was playing very good moves but also >> very bad ones. > >I don't think the simulation strategy is the key. > >I suspect the improvement is largest when you don't do progressive widening. > >Nevertheless it would be quite interesting to see the implementation >details of ggmc's RAVE. RAVE performance is quite dependent on exact >implementation and parameters. -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
> I also implemented RAVE in Mango. There was a few points of improvements > (around 60 Elo points with gnugo as reference), but as much as in the > paper of Gelly and Silver :( (around 250 Elo points if I remember well) > > It might be that the effect of RAVE depends a lot on the simulation > strategy. Indeed, sometimes my RAVE was playing very good moves but also > very bad ones. I don't think the simulation strategy is the key. I suspect the improvement is largest when you don't do progressive widening. Nevertheless it would be quite interesting to see the implementation details of ggmc's RAVE. RAVE performance is quite dependent on exact implementation and parameters. -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] More UCT / Monte-Carlo questions (Effect of rave)
I also implemented RAVE in Mango. There was a few points of improvements (around 60 Elo points with gnugo as reference), but as much as in the paper of Gelly and Silver :( (around 250 Elo points if I remember well) It might be that the effect of RAVE depends a lot on the simulation strategy. Indeed, sometimes my RAVE was playing very good moves but also very bad ones. Guillaume -Original Message- From: [EMAIL PROTECTED] on behalf of Magnus Persson Sent: Wed 06/02/2008 00:42 To: computer-go@computer-go.org Subject: Re: [computer-go] More UCT / Monte-Carlo questions Quoting Gunnar Farnebäck <[EMAIL PROTECTED]>: > I have never managed to implement RAVE successfully. It made my > program significantly slower but no stronger even at a fixed number of > simulations. I get a small effect from RAVE, My rationalisation is that if the program is rich with other features to improve performance RAVE may not add that much. -Magnus ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ <>___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/