Re: Re[4]: [computer-go] UCT vs MC
One more question. Line 23 states: for i:=node.size()-2 to 0 do. The leaf node should be stored in node[node.size()-1], so why do we start at node.size()-2? Is it not necessary to update the value of the leaf node?

i := node.size()-1 would be better, you're right :-). Experiments made by people on this list (Don, if I remember correctly) showed that you don't even have to create the leaf if the parent node has fewer than some threshold of simulations. 100 simulations already seemed to be safe, and much faster. So the modification to the algorithm is to stop the descent when the number of simulations of the node is below the threshold, rather than always ending on an unseen node.

Sylvain

-Original Message-
From: Don Dailey [EMAIL PROTECTED]
To: Dmitry Kamenetsky [EMAIL PROTECTED], computer-go computer-go@computer-go.org
Date: Wed, 21 Feb 2007 12:54:43 -0500
Subject: Re: Re[2]: [computer-go] UCT vs MC

On Wed, 2007-02-21 at 16:56 +0300, Dmitry Kamenetsky wrote:
> Thank you for your answer. However, I am even more confused now. I understand that - is for negamax, but I don't understand why it became 1-. I am trying to implement your algorithm and I just want to know what lines 7, 16 and 26 should be?

I'm not sure this is what you are looking for, but in negamax, scores can be negative or positive. The scores are always adjusted so that you can view positive numbers as good and negative as bad from the point of view you are referencing. So to get the score from the other point of view you simply negate it.

But in UCT, we don't deal with negative numbers. A score is between 0 and 1, so 0.001 is almost losing and 0.999 is almost winning, for example. To change 0.99 to the other player's point of view in this system, where scores must be between 0 and 1, you must negate it and add 1. So 0.99 becomes: 1 - 0.99 = 0.01.

I hope that is what you are asking about and that this explains it.

- Don
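To make the two points above concrete, here is a minimal sketch in Python of a backup loop that includes the leaf node, and of an expansion rule that waits until the node has been sampled a threshold number of times. All names (Node, backup, expand_if_ready, the 100-simulation constant) and the point of view of the playout result are illustrative assumptions, not MoGo's actual code.

    class Node:
        def __init__(self, move=None):
            self.move = move
            self.visits = 0        # node[i].nb in the pseudocode
            self.wins = 0.0        # node[i].value in the pseudocode (total wins)
            self.children = None   # None means "not expanded yet"

    EXPAND_THRESHOLD = 100         # "100 simulations seemed to be already safe"

    def backup(path, result):
        # path[0] is the root and path[-1] is the leaf of the descent; result is
        # the 0/1 playout outcome, taken here from the point of view of the player
        # who made the move leading to the leaf (one possible convention).
        value = result
        for node in reversed(path):   # starts at node.size()-1, i.e. includes the leaf
            node.visits += 1
            node.wins += value
            value = 1 - value         # negamax-style flip, keeping values in [0,1]

    def expand_if_ready(node, legal_moves):
        # Only create children once the node has been sampled enough; until then
        # the descent stops here and a playout is run from this position.
        if node.children is None and node.visits >= EXPAND_THRESHOLD:
            node.children = [Node(m) for m in legal_moves]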
Re[2]: [computer-go] UCT vs MC
Thank you for your answer. However, I am even more confused now. I understand that - is for negamax, but I don't understand why it became 1-. I am trying to implement your algorithm and I just want to know what lines 7, 16 and 26 should be?

-Original Message-
From: Sylvain Gelly [EMAIL PROTECTED]
To: Dmitry Kamenetsky [EMAIL PROTECTED]
Date: Wed, 21 Feb 2007 11:03:08 +0100
Subject: Re: [computer-go] UCT vs MC

Hello Dmitry,

Your code says that the value is backed up by sum and negation (line 26, value := -value). But I don't see any negative values in your sample tree, or values greater than one. How do you actually back up values to the root?

Sorry, it is value := 1-value. Thank you for pointing out the mistake.

I am confused about value. What is it actually storing? I thought node[i].value stores the number of wins (for Black) for node i. Then why are some of the values in Figure 1 not integers? If line 26 is now value := 1-value, then should some of the other lines also change? For example, should line 7 be updateValue(node, 1-node[i].value), and line 16 be else v[i] := (1-node.childNode[i].value)/node.childNode[i].nb + sqrt(...)?

You're right, there was some confusion :-). In fact it is very simple. The - is there because it is negamax and not minimax, so that you can always take the max of the value (but the value is negated every 2 levels). The value stored then corresponds to the value of the player to play in the node. It seems that node[i].value indeed keeps the number of wins for the player to play in node i; the 1- does not exist. Figure 1 is an example of UCT in the general case, where the reward is not always in [0,1], and the values displayed in the nodes are the averages. That explains the non-integer values and the values not in [0,1].

Can you also update all the changes in your report? Thank you.

I'll try to find some time to do that. I can't say it will be soon, though.

Regards,
Sylvain
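To illustrate the negamax bookkeeping being discussed, here is a sketch of the child-selection step (the role of line 16), under one common convention: each node's statistics are stored from the point of view of the player who made the move into it, so the player to move at the parent simply maximises the child's mean plus an exploration term. The field names (matching the earlier Node sketch) and the exploration constant are assumptions for illustration, not MoGo's identifiers, and this is only one possible reading of the pseudocode under discussion.

    import math

    def select_child(node, c=0.5):
        # c is an assumed exploration constant; the thread later mentions
        # sqrt(1/10) as a value tuned for Go when plain UCB is used.
        best, best_score = None, float('-inf')
        for child in node.children:
            if child.visits == 0:
                return child                    # try unvisited children first
            mean = child.wins / child.visits    # in [0,1], from the point of view of
                                                # the player who moves into this child
            explore = c * math.sqrt(math.log(node.visits) / child.visits)
            if mean + explore > best_score:
                best, best_score = child, mean + explore
        return best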
Re: Re[2]: [computer-go] UCT vs MC
Thank you for your answer. However, I am even more confused now. I understand that - is for negamax, but I don't understand why it became 1-. I am trying to implement your algorithm and I just want to know what lines 7, 16 and 26 should be?

It became 1- because I made a mistake while answering. The 1 would have been there only to keep values always between 0 and 1 (instead of [0,1] if Black or [-1,0] if White), IF value were the average win and not the total wins. So my fault, sorry :-/. Does that make things clearer?

Sylvain

-Original Message-
From: Sylvain Gelly [EMAIL PROTECTED]
To: Dmitry Kamenetsky [EMAIL PROTECTED]
Date: Wed, 21 Feb 2007 11:03:08 +0100
Subject: Re: [computer-go] UCT vs MC

Hello Dmitry,

Your code says that the value is backed up by sum and negation (line 26, value := -value). But I don't see any negative values in your sample tree, or values greater than one. How do you actually back up values to the root?

Sorry, it is value := 1-value. Thank you for pointing out the mistake.

I am confused about value. What is it actually storing? I thought node[i].value stores the number of wins (for Black) for node i. Then why are some of the values in Figure 1 not integers? If line 26 is now value := 1-value, then should some of the other lines also change? For example, should line 7 be updateValue(node, 1-node[i].value), and line 16 be else v[i] := (1-node.childNode[i].value)/node.childNode[i].nb + sqrt(...)?

You're right, there was some confusion :-). In fact it is very simple. The - is there because it is negamax and not minimax, so that you can always take the max of the value (but the value is negated every 2 levels). The value stored then corresponds to the value of the player to play in the node. It seems that node[i].value indeed keeps the number of wins for the player to play in node i; the 1- does not exist. Figure 1 is an example of UCT in the general case, where the reward is not always in [0,1], and the values displayed in the nodes are the averages. That explains the non-integer values and the values not in [0,1].

Can you also update all the changes in your report? Thank you.

I'll try to find some time to do that. I can't say it will be soon, though.

Regards,
Sylvain
Re: Re[2]: [computer-go] UCT vs MC
On Wed, 2007-02-21 at 16:56 +0300, Dmitry Kamenetsky wrote:
> Thank you for your answer. However, I am even more confused now. I understand that - is for negamax, but I don't understand why it became 1-. I am trying to implement your algorithm and I just want to know what lines 7, 16 and 26 should be?

I'm not sure this is what you are looking for, but in negamax, scores can be negative or positive. The scores are always adjusted so that you can view positive numbers as good and negative as bad from the point of view you are referencing. So to get the score from the other point of view you simply negate it.

But in UCT, we don't deal with negative numbers. A score is between 0 and 1, so 0.001 is almost losing and 0.999 is almost winning, for example. To change 0.99 to the other player's point of view in this system, where scores must be between 0 and 1, you must negate it and add 1. So 0.99 becomes: 1 - 0.99 = 0.01.

I hope that is what you are asking about and that this explains it.

- Don
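A tiny illustration of the point above, with hypothetical helper names (not taken from any particular program): negamax flips the sign of a score, while UCT with scores in [0,1] flips around 1/2 instead, and both flips undo themselves when applied twice.

    def negamax_flip(score):
        return -score          # e.g. +0.7 for me is -0.7 for my opponent

    def uct_flip(win_prob):
        return 1.0 - win_prob  # e.g. 0.99 for me is 0.01 for my opponent

    # Flipping twice recovers the original value, just like -(-x) = x.
    assert negamax_flip(negamax_flip(0.7)) == 0.7
    assert uct_flip(uct_flip(0.99)) == 0.99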
[computer-go] UCT vs MC
Hi Sylvain,

Your code says that the value is backed up by sum and negation (line 26, value := -value). But I don't see any negative values in your sample tree, or values greater than one. How do you actually back up values to the root?

Sorry, it is value := 1-value. Thank you for pointing out the mistake.

I am confused about value. What is it actually storing? I thought node[i].value stores the number of wins (for Black) for node i. Then why are some of the values in Figure 1 not integers? If line 26 is now value := 1-value, then should some of the other lines also change? For example, should line 7 be updateValue(node, 1-node[i].value), and line 16 be else v[i] := (1-node.childNode[i].value)/node.childNode[i].nb + sqrt(...)?

Can you also update all the changes in your report? Thank you.
Re: [computer-go] UCT vs MC
Quoting [EMAIL PROTECTED]:

> I'm curious about the full width depth and the principal variation depth to compare UCT with alpha-beta.
>
> The comparison is not so easy to do, I think, because using MC as an evaluation function for alpha-beta, you have to do several simulations for one evaluation and take the average. So the question is how many simulations to do (a tradeoff between false alpha-beta cuts and depth)? The right number should be the one which makes the stronger player. I did not make such experiments. Perhaps someone did?

My old program Viking5 used alpha-beta search with Monte Carlo evaluation. Valkyria uses UCT and similar code for the Monte Carlo evaluation. One cannot compare these programs directly since they do not share code. But I would guess that Valkyria searches the principal variation 20-100% deeper than Viking5 would. The cost is that Valkyria might sometimes get stuck on the second-best move.

In the opening the difference is probably small because UCT searches quite uniformly, but if there is fighting where critical stones are very unstable, the search can get very deep if there are many forcing moves. I know at least one 9x9 position where Valkyria can search forced sequences to 15 ply, but normally it might get 4-5 meaningful nodes deep where Viking5 would get 3 ply.

It is also tricky to compare the principal variation of UCT with alpha-beta, because near the leaves the number of visits to the nodes is less than what could be considered a sound MC evaluation. When I print out principal variations I stop as soon as the number of visits for a node is less than 1000. This means that the moves in the pruned principal variation are of high quality. For alpha-beta a similar effect can be seen by choosing different numbers of simulations for the eval. Normally a few hundred simulations are necessary, but one could make a 1-simulation eval that spits out very deep evaluations which are unfortunately almost random.

-Magnus
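Here is a rough sketch of reading a principal variation off a UCT tree the way Magnus describes: follow the most-visited child at each level and stop once a node has fewer than some threshold of visits (he uses 1000). The Node fields (children, visits, move) are the same illustrative assumptions used in the earlier sketch, not Valkyria's actual code.

    PV_VISIT_THRESHOLD = 1000  # Magnus's cutoff for "sound" statistics

    def principal_variation(root, threshold=PV_VISIT_THRESHOLD):
        pv, node = [], root
        while node.children:
            best = max(node.children, key=lambda c: c.visits)
            if best.visits < threshold:
                break            # below this the averages are too noisy to trust
            pv.append(best.move)
            node = best
        return pv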
RE: [computer-go] UCT vs MC
Thanks. So it seems that doing as many random games as possible is not the ideal approach.

In UCT, I suppose the equivalent of the principal variation would be the path from the root that always visits the child with the highest number of simulations. When you make a move with 70,000 simulations, how deep is this UCT principal variation? I expect that after this many simulations the UCT tree will include every position in some number of ply from the root. Deeper in the tree it will not visit every child. Typically, how many ply in the UCT tree is full width? I'm curious about the full width depth and the principal variation depth to compare UCT with alpha-beta.

It's great to see a new approach to computer go that works so well. Thanks for sharing your work.

David

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, January 01, 2007 3:56 AM
To: computer-go
Subject: Re: [computer-go] UCT vs MC

Hello and happy new year!

I have some questions about your paper...

Whouah, that's a lot of questions :). I'll try to answer well.

I thought that the Monte Carlo evaluation of a position is done by making many random games from that position, and averaging the win/loss result. So the evaluation of a position would be a number between 0 and 1. I thought several thousand random games would be used for one evaluation.

No, we consider one evaluation as one simulation, so the evaluation function is a Bernoulli random variable. You are mainly interested in its expectation, which is why you can think of making a lot of simulations and averaging, approximating the expectation by the empirical average.

In your paper, you say that each UCT leaf node is evaluated by exactly one random game (simulation), with a result of 0 or 1. Is this true?

Yes, it is true. We tried having more than one simulation per leaf node, and:
- with the same number of nodes, the improvement is very small;
- with the same number of total simulations, the level is much weaker.
This seems counter-intuitive, but in fact it is not. At each simulation we add a node, so a node often visited will have a lot of descendants and will therefore average a lot of simulations. The key idea of UCT is that the value in a node is the average of its children's values weighted by the frequency of visits to its children.

I think you say MoGo does 70,000 random games per move. Does this mean that the UCT tree for a move has 70,000 nodes? When you say 70,000 games per move, does that mean total games per move, or games per node evaluation?

That means per MoGo move. So yes, the UCT tree for a move has 70,000 nodes; it is the total number of simulations. I use the same count for the CGOS versions of MoGo. MoGo_xxx_10k uses 10,000 simulations for each move, or if you prefer, 10,000 nodes in its tree. That means that if the CGOS game finished after 40 MoGo moves, then MoGo has computed 400,000 random simulations for the complete game. (There is also a 5,000 simulations per move version on CGOS.)

How many simulations (random games with patterns) does MoGo do per second?

On a P4 3.4 GHz, 4,500 in 9x9 and 1,000 in 19x19.

How do you back up values in the UCT tree? There are values in the example tree, but I can't see how they are calculated.

As in the UCT algorithm. For each node from the root to the leaf of the current sequence, you simply add the 0/1 result to a variable, and add 1 to the count of the number of simulations.

Your code says that the value is backed up by sum and negation (line 26, value := -value). But I don't see any negative values in your sample tree, or values greater than one. How do you actually back up values to the root?

Sorry, it is value := 1-value. Thank you for pointing out the mistake.

On page 5 you say that UCB1-TUNED performs better and you use that formula. In the code for the algorithm, you use UCB (line 16). Which is correct?

Since the beginning we used UCB1-TUNED, and it performed better. Now, with all the other improvements and with fine parameter tuning, the difference is very small. UCB1-TUNED has the advantage that it does not need parameter tuning to perform well in Go (the famous sqrt(1/10) constant Remi Coulom posted on this list).

In your paper you show win rates against GnuGo of about 50%, depending on the parameters. The current MoGo beats GnuGo over 90%. What changed? Are you doing more simulations, or do you have more go knowledge in your patterns?

The results near 50% were with the uniform random simulations. The 80% is with the improved simulations. In the current MoGo there are new improvements not yet published. Currently, against GnuGo 3.6 level 8 with 70,000 simulations, the result is 92.5%. The MoGo which plays in tournaments makes more than
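For readers implementing the selection rule Sylvain mentions above, here is a sketch of UCB1-TUNED (Auer et al.) specialised to 0/1 rewards, where the empirical variance reduces to mean*(1-mean). The field names are the same illustrative assumptions as in the earlier sketches, not MoGo's actual identifiers.

    import math

    def ucb1_tuned_child(node):
        log_n = math.log(node.visits)

        def score(child):
            if child.visits == 0:
                return float('inf')        # unvisited children are tried first
            mean = child.wins / child.visits
            # Variance upper bound from UCB1-TUNED; for 0/1 rewards the
            # empirical variance is mean*(1-mean).
            var_bound = mean * (1 - mean) + math.sqrt(2 * log_n / child.visits)
            return mean + math.sqrt((log_n / child.visits) * min(0.25, var_bound))

        return max(node.children, key=score)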
Re: [computer-go] UCT vs MC
I seem to remember someone on this group a couple of years ago or so saying that there wouldn't be a 1-dan 9x9 player anytime soon. I don't remember the exact quote or who said it. I'm looking through the archives but I can't find it. I would not name the person even when I do, but it gives me a strong feeling of deja vu. Chrilly probably remembers when the strongest chess computers were about beginner strength despite furious attempts to make them play strongly.

- Don

On Mon, 2007-01-01 at 01:33 +0100, John Tromp wrote:
> On 1/1/07, David Fotland [EMAIL PROTECTED] wrote:
> > In your paper you show win rates against GnuGo of about 50%, depending on the parameters. The current MoGo beats GnuGo over 90%. What changed? Are you doing more simulations, or do you have more go knowledge in your patterns? Does MoGo have an opening book?
>
> I spent most of yesterday on KGS playtesting MoGo on 9x9 with 30 minutes total thinking time. The experience was quite unlike any other program I've played on 9x9 in the past. As I wrote to Sylvain in a private email:
>
> I had a lot of fun playing MoGo today. In the first game, it played some nonsensical moves and I got a totally won position, but MoGo turned out to be very inventive and led me into a trap :( That was not the last game I was to lose to MoGo. I found it much more challenging than any other program I played. It is quite resourceful. And in one game you'll see it play a beautiful tesuji. This really makes me feel like it's only a matter of time till MC programs can challenge professionals on 9x9.
>
> I enclose two of the games I played. In the first, MoGo is quite enterprising in the opening, with moves like e6. It would be very hard for an evaluation function to appreciate the potential White has for territory after Black c8. But MoGo correctly assesses that White will control the right half of the board. Furthermore, it very nicely punishes Black's mistake of playing f5 prematurely with a beautiful tesuji at d3.
>
> In the other game, MoGo plays a different atari on the 6th move, leading to a very different game. It shows good timing in playing b7 when the right group can fend for itself, and plays a nice probe at e3 to determine its follow-up. Apparently it sees that g2 is sente on the d2 group, preventing Black from a killing attempt at h5. It makes a mistake at move 30 with f7, though. Playing a4 c4 a2 b3 a3 d4 a6 b3 e8 instead would have given it a win. Later testing with MoGo showed that it indeed was unlucky to choose f7, and prefers e8 with a bit more search.
>
> I feel that shodan-level 9x9 go programs have arrived...
>
> regards,
> -John
Re: [computer-go] UCT vs MC
On Mon, 2007-01-01 at 20:10 +0100, [EMAIL PROTECTED] wrote:
> I'm curious about the full width depth and the principal variation depth to compare UCT with alpha-beta.
>
> The comparison is not so easy to do, I think, because using MC as an evaluation function for alpha-beta, you have to do several simulations for one evaluation and take the average. So the question is how many simulations to do (a tradeoff between false alpha-beta cuts and depth)? The right number should be the one which makes the stronger player. I did not make such experiments. Perhaps someone did?

I did something similar. But it takes a huge amount of horsepower to zero in on the right value. What seems to be the case is that you need fewer simulations as you search deeper. Of course it's always better to do more simulations, but the question is the trade-off: is it better to go one ply deeper doing less, or one ply less doing more? Unfortunately, getting the right answer requires a lot of work.

There are other variables too, such as how much cheating you should do. You can seriously reduce the number of simulations you do at end nodes by stopping early when it appears unlikely your score will fall within the alpha/beta window. You can do this by asking the question: what would my score be if the next N games were all wins, or all losses?

I also discovered that it is not efficient to do too few simulations. If you are not doing enough, doubling the number of simulations only increases the search effort slightly, or in some cases even improves it. This is almost certainly because of move-ordering issues; it is very difficult to get good move ordering with a fuzzy evaluator.

I have a theory on how to make straight alpha-beta work with Monte Carlo evaluations at end nodes. You want to use Monte Carlo as an evaluation function, but you want it applied to quiet positions. I think you need to take your end-node positions before you apply Monte Carlo as an evaluation and do something similar to the following:

1. Identify clearly dead and live groups.

2. Create a proxy position that is similar to the position you want to evaluate, but has been fixed up to be less confusing to the Monte Carlo simulations. This could involve placing stones so that living groups are not touched in the simulation. It might involve artificially killing dead groups so that Monte Carlo is not distracted killing them. A veto list involves identifying moves that cannot possibly help and telling the Monte Carlo simulation about these moves. Even if you only know the move can't possibly help in the next 4 or 5 moves, it might be a major boost to simulation quality. It might involve move suggestions, veto tables, etc. to help the Monte Carlo search. The idea is to do anything that makes the final position quieter and the Monte Carlo search more relevant, and to do it safely: only what you can be sure of.

3. Your Monte Carlo search should have some intelligence built in, like MoGo does; it's not 100 percent random.

This is just an idea that hasn't been tried or tested as far as I know, something to be taken with a grain of salt!

- Don
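Here is a rough sketch of the early-stopping idea Don describes, with hypothetical names (not his actual code): abort the Monte Carlo evaluation of a leaf once even the most optimistic, or most pessimistic, outcome of the remaining playouts could not bring the final score back inside the alpha/beta window. Alpha and beta are assumed to be expressed on the same [0,1] win-rate scale as the evaluation.

    def mc_evaluate(position, run_playout, total_playouts, alpha, beta):
        # Estimate the win rate of 'position' in [0,1] with up to
        # 'total_playouts' playouts, cutting off early relative to the
        # alpha/beta window.
        wins = 0
        for done in range(1, total_playouts + 1):
            wins += run_playout(position)            # each playout returns 0 or 1
            remaining = total_playouts - done
            best_case = (wins + remaining) / total_playouts   # every remaining playout won
            worst_case = wins / total_playouts                # every remaining playout lost
            if best_case <= alpha:
                return best_case    # cannot rise above alpha: fail low, stop early
            if worst_case >= beta:
                return worst_case   # cannot fall below beta: fail high, stop early
        return wins / total_playouts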