Re: [computer-go] Monte-Carlo Go simulation
On 9 Feb 2007, at 4:40 AM, Sylvain Gelly wrote:

>> Alain's point, that knowledge can both help narrow the search to "good" moves and at the same time steer you away from the best move, is absolutely true in SlugGo's case.
>
> I completely agree with that. However, can we agree that we want a better player on the whole, and not only a better player in some particular positions? So perhaps being far from the best move while always playing good moves is already good, no?
>
> Sylvain

Absolutely. I notice that when SlugGo makes moves that a professional said "look quite playable", a huge mistake is going to happen very soon. Making the best move from the point of view of a really strong player is also NOT what I want SlugGo to do. SlugGo has no concept of what the implications are or what the required follow-up moves will be. It is clearly better for SlugGo to make many 90% moves in a row than to have it try to make any 100% moves. There is a much better chance of it finding the correct follow-up to slightly lower-quality moves.

Cheers,
David

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte-Carlo Go simulation
> Alain's point, that knowledge can both help narrow the search to "good" moves and at the same time steer you away from the best move, is absolutely true in SlugGo's case.

I completely agree with that. However, can we agree that we want a better player on the whole, and not only a better player in some particular positions? So perhaps being far from the best move while always playing good moves is already good, no?

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
>> I think we have to start by defining what the bias is. For me the bias is the difference between the expected value of the outcomes of the playouts by the simulation player and the "real minimax value". In this definition the uniform random simulation player is VERY biased, and GNU Go much less so.
>
> OK, but I used "bias" in the common sense, to mean that the "strong simulator" has preferences for some moves and doesn't consider them all equally, or worse, doesn't consider some moves at all.

OK, you are talking about "bias" on the moves; I was talking about bias on the Monte-Carlo simulation outcomes (the difference between the expectation of the random variable and the real value you want to estimate). So you are really talking about the difference from the uniform distribution on moves. I think what we care about is the MC outcomes. The particular moves played by the simulation player do not matter.

> So it will miss some good points because of its knowledge, whereas the random player will find the move.

But we don't care about the random player "finding" the move or not. If the random player plays the good move with probability 1/100, and also does not find the good answers afterwards, it is not clear how that changes the expectation of the outcomes.

>>> Even if it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.
>>
>> Hum, are you sure?
>
> I'm 100% sure of this :-)

May I be 99% sure that you should not be 100% sure of this? ;-) I think that without empirical evidence, we can't be 100% sure...

>> I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player).
>
> Even with randomization, GNU Go considers only a few dozen possible moves, and makes systematic errors.

You can be epsilon-greedy if you want to avoid systematic errors.
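The epsilon-greedy idea mentioned above can be sketched in a few lines. This is a minimal illustration, not any engine's actual code: `knowledge_move` stands in for whatever move a GNU-Go-like policy would prefer, and the move names are placeholders.

```python
import random

def epsilon_greedy_move(legal_moves, knowledge_move, epsilon=0.1):
    # With probability epsilon, play a uniformly random legal move;
    # otherwise play the move the knowledge-based policy prefers.
    # Every legal move then keeps probability >= epsilon / len(legal_moves),
    # so the playout policy cannot systematically exclude any move.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return knowledge_move

moves = ["A1", "B2", "C3", "D4"]
assert epsilon_greedy_move(moves, "B2", epsilon=0.0) == "B2"  # pure knowledge
assert epsilon_greedy_move(moves, "B2", epsilon=1.0) in moves  # pure random
```

Even a small epsilon removes the systematic errors in the limit, at the cost of occasionally playing moves the knowledge would have vetoed.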
> Some time ago Rémi Coulom asked for "positions illustrating computer stupidity" (2006-11-22) http://computer-go.org/pipermail/computer-go/2006-November/007107.html and GNU Go provided some nice examples where its (wrong/misunderstood) knowledge induces a failure in play.

I bet we can find many more positions where uniform random play would give the wrong answer with high probability, can't we? Furthermore, we are not talking about having a perfect player. There will always be particular positions where the computer fails, just as I am sure we can find positions where humans fail.

>> I understand all these counter-examples, I just think that it is more complicated than that.
>
> I fully agree.

Good, then we only have to find out how to do better ;-).

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
On Thursday 8 February 2007 at 22:09, Sylvain Gelly wrote:

>> It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a "better" simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position.
>
> I think we have to start by defining what the bias is. For me the bias is the difference between the expected value of the outcomes of the playouts by the simulation player and the "real minimax value". In this definition the uniform random simulation player is VERY biased, and GNU Go much less so.

OK, but I used "bias" in the common sense, to mean that the "strong simulator" has preferences for some moves and doesn't consider them all equally, or worse, doesn't consider some moves at all. So it will miss some good points because of its knowledge, whereas the random player will find the move.

>> A trivial example is GNU Go: its analysis is "sometimes" wrong.
>
> Of course; if not, computer go would be solved :-).
>
>> Even if it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.
>
> Hum, are you sure?

I'm 100% sure of this :-)

> I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player).

Even with randomization, GNU Go considers only a few dozen possible moves, and makes systematic errors. Some time ago Rémi Coulom asked for "positions illustrating computer stupidity" (2006-11-22) http://computer-go.org/pipermail/computer-go/2006-November/007107.html and GNU Go provided some nice examples where its (wrong/misunderstood) knowledge induces a failure in play. One very impressive one was GNU Go 3.6 not invading where it is obviously possible to invade (Steven Clark, 2006-11-27) http://computer-go.org/pipermail/computer-go/2006-November/007184.html

> But a weaker player than GnuGo can make an even better simulation player.

Yes.
>> David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine much, which is bound by the underlying weaknesses of GNU Go.
>
> Yes, this is a similar non-trivial result. I think there are more existing experimental and theoretical analyses of this, though. Perhaps such an analysis already exists for MC as well; it is just that I don't know of it.
>
>> Or maybe I just understood nothing of what you explained ;)
>
> It was not really an "explanation", just thoughts. I don't have the solution; I just think that it is an interesting question and that it should be discussed. Maybe new ideas could come from a solid explanation of this phenomenon.
>
> I understand all these counter-examples, I just think that it is more complicated than that.
>
> Sylvain

I fully agree.

Alain
Re: [computer-go] Monte-Carlo Go simulation
I think that the bias Alain meant is in the choice of moves that controls the branching factor. If I understand correctly, this can happen in two different places in MoGo: once in the branching below a node in the UCT tree, and either the same way or differently in the random playouts. In some ways this is like SlugGo, where branching at the first level (our move) may or may not be determined in the same way as at the next level (their guessed reply). If SlugGo is set for multiple levels of branching in the lookahead, we do the same thing but from the other perspective. For deeper linear lookahead, however, things are different, just as in your random playouts.

Alain's point, that knowledge can both help narrow the search to "good" moves and at the same time steer you away from the best move, is absolutely true in SlugGo's case. This is the primary reason we have always wanted to have multiple Go engines making move suggestions, not just multiple instantiations of the same engine as we have now. But we could get up and running faster with one engine, so that is where we are now. Hopefully not for much longer...

Cheers,
David

On 8 Feb 2007, at 2:09 PM, Sylvain Gelly wrote:

>> It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a "better" simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position.
>
> I think we have to start by defining what the bias is. For me the bias is the difference between the expected value of the outcomes of the playouts by the simulation player and the "real minimax value". In this definition the uniform random simulation player is VERY biased, and GNU Go much less so.
>
>> A trivial example is GNU Go: its analysis is "sometimes" wrong.
>
> Of course; if not, computer go would be solved :-).
>
>> Even if it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.
>
> Hum, are you sure?
> I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player). But a weaker player than GnuGo can make an even better simulation player.
>
>> David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine much, which is bound by the underlying weaknesses of GNU Go.
>
> Yes, this is a similar non-trivial result. I think there are more existing experimental and theoretical analyses of this, though. Perhaps such an analysis already exists for MC as well; it is just that I don't know of it.
>
>> Or maybe I just understood nothing of what you explained ;)
>
> It was not really an "explanation", just thoughts. I don't have the solution; I just think that it is an interesting question and that it should be discussed. Maybe new ideas could come from a solid explanation of this phenomenon.
>
> I understand all these counter-examples, I just think that it is more complicated than that.
>
> Sylvain
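The two places David distinguishes, move choice inside the UCT tree versus move choice inside the random playouts, can be sketched as follows. This is a minimal illustration with a hypothetical `Node` class, not MoGo's or SlugGo's actual code.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = []  # expanded child nodes
        self.wins = 0       # playout wins observed through this node
        self.visits = 0     # playouts run through this node

def uct_select(node, c=1.4):
    # Inside the tree: pick the child maximizing the UCB1 value, which
    # balances the observed win rate against an exploration bonus, so
    # every expanded move keeps being revisited.
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def playout_move(legal_moves):
    # Beyond the tree: the simulation policy picks moves with no
    # exploration correction; here, uniformly at random.
    return random.choice(legal_moves)
```

The knowledge "bias" under discussion can enter either function independently: as a prior or pruning rule in `uct_select`, or as a move preference in `playout_move`.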
Re: [computer-go] Monte-Carlo Go simulation
> It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a "better" simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position.

I think we have to start by defining what the bias is. For me the bias is the difference between the expected value of the outcomes of the playouts by the simulation player and the "real minimax value". In this definition the uniform random simulation player is VERY biased, and GNU Go much less so.

> A trivial example is GNU Go: its analysis is "sometimes" wrong.

Of course; if not, computer go would be solved :-).

> Even if it is obviously much stronger than a random player, it would give wrong results if used as a simulation player.

Hum, are you sure? I think that GNU Go with randomisation (and much faster, of course) would make a very good simulation player (much better than any existing simulation player). But a weaker player than GnuGo can make an even better simulation player.

> David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine much, which is bound by the underlying weaknesses of GNU Go.

Yes, this is a similar non-trivial result. I think there are more existing experimental and theoretical analyses of this, though. Perhaps such an analysis already exists for MC as well; it is just that I don't know of it.

> Or maybe I just understood nothing of what you explained ;)

It was not really an "explanation", just thoughts. I don't have the solution; I just think that it is an interesting question and that it should be discussed. Maybe new ideas could come from a solid explanation of this phenomenon.

I understand all these counter-examples, I just think that it is more complicated than that.

Sylvain
Re: [computer-go] Monte-Carlo Go simulation
On Thursday 8 February 2007 at 20:12, Sylvain Gelly wrote:

>> One simple explanation could be that a random player shamelessly tries "all" moves (very bad ones but also very nice tesuji), whereas the "stronger" player is restricted by its knowledge and will always miss some kinds of moves.
>
> Here we are not speaking about the pruning in the tree, but about the simulation player. The tree must explore every move, to avoid missing important ones. However, we totally don't care whether all possible games can be played by the simulation player. What we care about is the expectation of the wins by self-play. If the simulation player sometimes plays meaningful sequences, but with a very small probability, then it has very little influence on the expectation.

It seems I was ambiguous: I was speaking of the simulation player too. What I meant is that a random simulation player is not biased, whereas a "better" simulation player is biased by its knowledge, and thus can give a wrong evaluation of a position. A trivial example is GNU Go: its analysis is "sometimes" wrong. Even if it is obviously much stronger than a random player, it would give wrong results if used as a simulation player. David Doshay's experiments with SlugGo showed that searching very deep/wide does not improve the strength of the engine much, which is bound by the underlying weaknesses of GNU Go.

Or maybe I just understood nothing of what you explained ;)

Alain
Re: [computer-go] Monte-Carlo Go simulation
> One simple explanation could be that a random player shamelessly tries "all" moves (very bad ones but also very nice tesuji), whereas the "stronger" player is restricted by its knowledge and will always miss some kinds of moves.

Here we are not speaking about the pruning in the tree, but about the simulation player. The tree must explore every move, to avoid missing important ones. However, we totally don't care whether all possible games can be played by the simulation player. What we care about is the expectation of the wins by self-play. If the simulation player sometimes plays meaningful sequences, but with a very small probability, then it has very little influence on the expectation.

It seems to me that the explanation may be more complicated.

Sylvain
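Sylvain's point, that only the expectation of the playout outcomes matters, can be illustrated with a toy estimator. The playout here is a stand-in coin flip with an assumed 70% win rate, not a real Go playout.

```python
import random

def estimate_value(playout, n=10_000):
    # Monte-Carlo estimate of the expected playout outcome: the mean of
    # n independent playouts, each returning 1 for a win and 0 for a loss.
    # The "bias" under discussion is the gap between this expectation and
    # the true minimax value of the position.
    return sum(playout() for _ in range(n)) / n

# Toy stand-in: a position whose true minimax value is a win (1), played
# out by a policy that only wins 70% of its simulations.
random.seed(0)
toy_playout = lambda: 1 if random.random() < 0.7 else 0
estimate = estimate_value(toy_playout)
assert 0.65 < estimate < 0.75  # converges to 0.7, i.e. a bias of 0.3
```

Whether a rare good move is in the policy's support barely moves this average, which is why a move played with probability 1/100 changes the estimate so little.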
Re: [computer-go] Monte-Carlo Go simulation
On Thursday 8 February 2007 at 17:06, Sylvain Gelly wrote:

> Hello,
>
>> Is there any known function (from theory or tests) of how much an increase in the strength of the simulation policy increases the strength of the MC/UCT program as a whole?
>
> I think that is a very interesting question. In our work on MoGo we found that there could be a decrease in the strength of the MC/UCT program while using a stronger simulation policy. That is why in MoGo it is more the "sequence idea" than the "strength idea". Our best simulation policy is quite weak compared to others we tested.
>
> But we have further experiments, in a work with David Silver from the University of Alberta. We found out that the relation "strong simulation policy" <=> "strong MC program" is wrong at a much larger scale. So the "intransitivity" is true even with much, much stronger simulation policies.

One simple explanation could be that a random player shamelessly tries "all" moves (very bad ones but also very nice tesuji), whereas the "stronger" player is restricted by its knowledge and will always miss some kinds of moves. Something similar was reported by David Doshay with SlugGo, which is limited by the underlying GNU Go no matter how deep/wide it searches.

Alain