Re: [computer-go] Amsterdam 2007 paper
Hi Rémi, Thank you for this paper. I found the work very interesting and well written; the paper is clear and pleasant to read. As two things are modified at the same time (simulation policy and tree search), I wonder what the contribution of each part is to the overall improvement. For example, I wonder what the exact improvement of the new simulation policy is on its own (without modifying UCT) over the one of MoGo. I guess you already have those results, but don't have enough room to put them in this paper. For example, if I remember correctly, plain UCT with MoGo's simulation policy at 70k per move was 83% against GNU Go 3.6. What is the result with your simulation policy? 85%? 90%? 95%? More? That would help to know where this approach is more useful: simulation, tree, or both. I mean, it is possible that combining the two improvements makes a stronger player than taking the sum of each improvement. If so, that would mean that some win-win effect arises, and the type of tree search has to be related to the type of simulation. Again, very interesting work. Sylvain

2007/5/17, Rémi Coulom [EMAIL PROTECTED]: Hi, I first thought I would keep my ideas secret until the Amsterdam tournament, but now that I have submitted my paper, I cannot wait to share it. So, here it is: http://remi.coulom.free.fr/Amsterdam2007/ Comments and questions are very welcome. Rémi

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Amsterdam 2007 paper
Thanks for sharing this early! Everything in this paper makes perfect sense and is well written. Reading it is like finding the last pieces of a difficult puzzle. I do, however, have some problems with the math of section 2.5, A Minorization-Maximization Algorithm. I have a feeling that a lot of definitions are left out here. What are A, B, M and N? I think I can guess what N is. Somehow I have a feeling that you are using some special notation for summation here, but even so I have no clue what is summed. Also, for Wi you use {} and || which might mean something special, but since I am already quite confused it is hard to figure out. Later on, the section Derivation of the Minorization-Maximization Formula seemed pretty clear to me, although that was supposed to be more difficult. Best, Magnus

Quoting Rémi Coulom [EMAIL PROTECTED]: Hi, I first thought I would keep my ideas secret until the Amsterdam tournament, but now that I have submitted my paper, I cannot wait to share it. So, here it is: http://remi.coulom.free.fr/Amsterdam2007/ Comments and questions are very welcome.
Re: [computer-go] producing a good probability distribution over legal moves
I think there is something to this; it seems like it should be possible to use a database of randomly selected positions from games, along with the best known followup, as a faster way of testing a program's strength than playing full games. Such a database would be valuable for all sorts of approaches. The question is, how do you know whether the program has discovered a better move than the best known move, or just chose another move that's just as good? Probably it would have to be generalized into keeping a score for each followup, and again the question becomes how much to trust it and how to update it. Perhaps the scores could be kept from running a large number of simulations, and we could test how rapidly the program converges on them using fewer resources. A weakness of this approach is that sometimes the best move depends on how you plan to follow it up; a program that plays the theoretically best move but doesn't know how to follow it up is weaker than a program that plays safer moves. - Brian

On 5/16/07, George Dahl [EMAIL PROTECTED] wrote: I find Monte-Carlo Go a fascinating avenue of research, but what pains me is that a huge number of simulations are performed each game and at the end of the game the results are thrown out. So what I was thinking is that perhaps the knowledge generated by the simulations could be collapsed in some way. Suppose that epsilon-greedy versions of a reasonably strong MC Go program were to play a very large number of games against themselves. By epsilon-greedy versions I mean that with probability epsilon a random move is played, and with probability 1-epsilon the move the MC player would normally play is played. Each position in these games would be stored along with the Monte Carlo/UCT evaluation for that position's desirability. This would produce an arbitrarily large database of position/score pairs.
At this point a general function approximator / learning algorithm (such as a neural network) could be trained to map positions to scores. If this were successful, it would produce something that could map positions to scores very quickly (even a large neural net evaluation would be much faster than doing a large number of MC playouts). Obviously the scores would not be perfect, since the Monte Carlo program did not play anywhere near perfect Go. But this static evaluator could then be plugged back into the Monte Carlo player and used to bias the random playouts. Wouldn't it be useful to be able to quickly estimate the MC score without doing any playouts? Clearly this idea could be extended recursively with a lot of offline training. What makes this formulation more valuable is that, given enough time and effort, someone familiar with machine learning should be able to produce a learning architecture that can actually learn the MC scores. It would be a straightforward, if potentially quite difficult, supervised learning task with effectively unlimited data, since more could be generated at will. Such a learning architecture could be used in the manner I described above, or thrown at the more general reinforcement learning problem. Does anyone have any thoughts on this idea? Does anyone know of it being tried before? - George
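George's data-generation step can be written down in a few lines. This is only an illustration of the epsilon-greedy idea under assumed interfaces - `legal_moves`, `mc_move`, and the `mc_score` evaluator are hypothetical stand-ins, not part of any actual program mentioned in this thread:

```python
import random

def epsilon_greedy_move(legal_moves, mc_move, epsilon):
    """With probability epsilon play a uniformly random legal move,
    otherwise play the move the MC player would normally play."""
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return mc_move

def collect_pairs(games, mc_score):
    """Store every position of the self-play games together with its
    Monte Carlo/UCT evaluation, yielding (position, score) pairs."""
    return [(pos, mc_score(pos)) for game in games for pos in game]
```

A function approximator would then be trained by regression on the output of `collect_pairs`, and the resulting fast evaluator plugged back in to bias the playouts.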
Re: [computer-go] producing a good probability distribution over legal moves
On 17 May 2007, at 8:17 AM, Brian Slesinsky wrote: A weakness of this approach is that sometimes the best move depends on how you plan to follow it up; a program that plays the theoretically best move but doesn't know how to follow it up is weaker than a program that plays safer moves. I have often said that when SlugGo makes what looks to me like a really good move, my joyful surprise quickly turns to worry about what is about to happen. It is exactly this inability to make the proper continuing followup moves that is the problem. Cheers, David
Re: [computer-go] producing a good probability distribution over legal moves
What you would have after your training/evaluator phase is heuristic knowledge of possibly better Monte Carlo trees to consider. This will definitely cut down on the search space, but could also exclude a strong search path. I have been thinking along these same lines for some time. The problem then lies in deciding which trees are worth looking at initially. What about a database of professional games? Take the winning games as examples of strong searches that ended in a win. The problem is even more complex, because where in the winning tree do you tell Monte Carlo to start searching? Will you assign a higher probability to each move in those games (defining a known, probabilistically stronger predicted result)? That is one approach. The other approach is the purely simulated one, where you run simulations and gradually allow your probability function to evolve based on the results. Although this is a purer approach, I think the aforementioned strategy may yield some useful experimentation. The strong point is that it takes advantage of the human brain's innate pattern recognition and calculation skills. Since we have recorded games, we have plenty of examples of this thought process. For a 9-dan's winning game, those trees surely are worth investigating...

On 5/16/07, George Dahl [EMAIL PROTECTED] wrote: I find Monte-Carlo Go a fascinating avenue of research, but what pains me is that a huge number of simulations are performed each game and at the end of the game the results are thrown out. So what I was thinking is that perhaps the knowledge generated by the simulations could be collapsed in some way. Suppose that epsilon-greedy versions of a reasonably strong MC Go program were to play a very large number of games against themselves. By epsilon-greedy versions I mean that with probability epsilon a random move is played, and with probability 1-epsilon the move the MC player would normally play is played.
[...] - George
[computer-go] Professional Go player at the Computer Olympiad
Dear all, We are proud to announce that Guo Juan, 5-dan professional from China, will be present at the Computer Olympiad in Amsterdam on Sunday the 17th of June. She will play against the winner of the Olympiad and comment on the most interesting games. We will broadcast as many Go games and comments as possible on KGS. The website of the Olympiad can be found here: http://amsterdam2007.icga.org/ Best regards, and hoping to meet some of you there, Guillaume
Re: [computer-go] Amsterdam 2007 paper
This is very interesting. I've only skimmed so far and don't understand everything yet, but I can make some comments on readability. Table 1 takes some study to understand. If I understand it correctly, the Feature column might be more clearly labeled Feature of Candidate Move. (For example, Pass means that the candidate move is a pass.) The Level column might be more clearly labeled as the Alternatives for that feature. It was unclear at first what distance to the move before means. It sounds like this is distance to the move before the previous move, or maybe distance from one's own previous move. - Brian

On 5/16/07, Rémi Coulom [EMAIL PROTECTED] wrote: Hi, I first thought I would keep my ideas secret until the Amsterdam tournament, but now that I have submitted my paper, I cannot wait to share it. So, here it is: http://remi.coulom.free.fr/Amsterdam2007/ Comments and questions are very welcome. Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
It seems that e-mail at my university does not work any more. I have received none of the replies to my message of yesterday, but I could read them in the web archives of the list. So I have subscribed from another address, and will answer the questions I have read on the web. In section 2.3, you say: Also, the generalization to teams assumes that the strength of a team is the sum (in terms of Elo ratings) of the strengths of its members. But it seems from the equation just above that section that you mean the product and not the sum. Or did I misunderstand something?
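The sum and the product are in fact consistent: Elo ratings are a logarithmic rescaling of the Bradley-Terry strengths gamma that appear in the equations, so multiplying the members' gammas is the same as adding their Elo ratings. A quick check of this identity, using the conventional 400*log10 Elo scale (an assumption for illustration; the paper's exact constant does not matter for the argument):

```python
import math

def elo(gamma):
    # Elo rating corresponding to a Bradley-Terry strength gamma
    return 400.0 * math.log10(gamma)

members = [1.7, 2.4, 0.9]            # gammas of the team members
team_gamma = math.prod(members)      # team strength: the product
# ... whose Elo is exactly the sum of the members' Elo ratings:
assert abs(elo(team_gamma) - sum(elo(g) for g in members)) < 1e-9
```

So "product of gammas" in the formula and "sum of Elo ratings" in the text describe the same assumption.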
[computer-go] JCIS extended abstract
Dear all, Following the example of Rémi, I would like to share with you a paper that I wrote which describes some important elements of my Go program Mango. I submitted this paper recently to the JCIS workshop 2007. Because it is an extended abstract, a lot of details are missing. I am now working on the full paper, which will contain much more information. The extended abstract can be found here: www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf Cheers, Guillaume PS: To answer Sylvain's question: Modifying the probability distribution of moves in the tree was more effective in Mango than the improvement made to the simulation strategy. However, Mango's simulation strategy is not as good as CrazyStone's or MoGo's. So I guess Mango won against MoGo in the last KGS tournament because it had a better way of using domain knowledge in the tree.
Re: [computer-go] Amsterdam 2007 paper
Thanks for the great paper. And thanks for sharing it before it's published. Now I know what directions to take my engine in next. Time for Team MoGo to share some more secrets :)
Re: [computer-go] Re: Amsterdam 2007 paper
Álvaro Begué wrote: There are many things in the paper that we had never thought of, like considering the distance to the penultimate move. That feature improved the effectiveness of progressive widening a lot. When I had only the distance to the previous move, and the opponent made a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to move the focus of local search away from the danger. Considering the distance to the penultimate move fixes a lot of the problems of local search. Maybe I should try to consider distances to even older moves. Also the approach you devised with John seems to be very similar to what I do, indeed. I also worked on the idea of adjusting the random policy online to adapt to the current position, with a method that looks like incremental Elo rating algorithms. I am deeply convinced that this can bring huge improvements. If the random policy only uses general patterns, whatever tree search we do at the root won't be able to make a strong Go player. We have to find a way to influence random simulations, not only inside the tree near the root, but also far from the root. Once we know how to do this effectively, we'll have very strong programs. Rémi
Re: [computer-go] producing a good probability distribution over legal moves
On Thu, 2007-05-17 at 12:17 -0400, George Dahl wrote: Imagine if you had a Monte Carlo program that took almost no time to run. You would use it to do heavy playouts for another Monte Carlo program to make it even stronger. I tried something like this as a test with simple Monte Carlo. I called it recursive Monte Carlo. This was not UCT; it was just the simple statistical Monte Carlo where you choose the single move that had the most success in totally random play-outs. This started when I noticed that even running 1 or 2 simulations gave far better play than random. So my idea was that instead of choosing moves randomly in the play-outs, I would choose each move in the play-out by doing another set of play-outs - applying the same simulation idea recursively. I didn't expect it to be fast enough to be practical, but I thought that at least I could see if the idea was feasible. If I'm doing 1000 play-outs, the version that does the sub-play-outs should be far stronger than the one that doesn't, even if it's a lot slower. Instead, it was much slower and played worse. I didn't pursue this any farther, but I suspect that it should have worked. I think the problem is that I wasn't getting fair representation of each move. The sub-play-outs were causing some moves not to get sampled much, for instance. I am guessing that if I used such a technique for the play-out portion of the Monte Carlo search, it would be incredibly strong for a given level if I only counted the number of play-outs at the outer level. - Don
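Don's recursive idea can be sketched compactly. Here it is on a toy game (a Nim variant standing in for Go, since a full board implementation would drown the idea): a depth-0 playout picks moves uniformly at random, and a depth-k playout picks each move by running a small batch of depth-(k-1) playouts per candidate. The game class and parameters are invented for illustration only.

```python
import random

class Nim:
    """Toy stand-in for Go: remove 1 or 2 stones; taking the last wins."""
    def __init__(self, stones, to_move=1):
        self.stones, self.to_move = stones, to_move
    def terminal(self):
        return self.stones == 0
    def legal_moves(self):
        return [1, 2] if self.stones >= 2 else [1]
    def play(self, take):
        return Nim(self.stones - take, 3 - self.to_move)
    def winner(self):
        return 3 - self.to_move   # the side that took the last stone

def playout(state, depth, n_sub=3):
    """depth 0: uniformly random playout. depth >= 1: choose each move
    by running n_sub depth-1 playouts per candidate and keeping the one
    with the most wins (Don's recursive Monte Carlo)."""
    while not state.terminal():
        moves = state.legal_moves()
        if depth == 0:
            move = random.choice(moves)
        else:
            me = state.to_move
            move = max(moves, key=lambda m: sum(
                playout(state.play(m), depth - 1, n_sub) == me
                for _ in range(n_sub)))
        state = state.play(move)
    return state.winner()
```

Don's point about unfair sampling corresponds to n_sub being far too small to rank the candidate moves reliably, which is where the cost explodes.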
Re: [computer-go] Re: Amsterdam 2007 paper
Yes, now I understand. I think what made it hard for me conceptually was that P(Rj) can be rewritten in n different ways, one for each feature ci, 1 <= i <= n. I had this problem with your example too. I first thought that the lines with the factors were arbitrary, but when I realized that each line goes with one way of rewriting P(Rj), it all became clear.

Quoting Rémi Coulom [EMAIL PROTECTED]: to Magnus: If you consider the example of section 2.2: 1,2,3 wins against 4,2 and 1,5,6,7. The probability is P = c1c2c3 / (c1c2c3 + c4c2 + c1c5c6c7). For this example: N1j = c2c3, B1j = 0, M1j = c2c3 + c5c6c7, A1j = c4c2; N2j = c1c3, B2j = 0; N3j = c1c2, B3j = 0; N4j = 0, B4j = c1c2c3. I will add this example to the paper if it makes things clearer. Regarding the definition of Wi, |{Nij | Nij != 0}| means the number of elements of the set of all the Nij that satisfy Nij != 0. It is incorrect. It should be |{j | Nij != 0}|. I'll fix it in the final version. As I wrote in the next sentence, it is equal to the number of wins of i.
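For readers following along, the decomposition in this example is easy to verify numerically: for each i, the numerator of P factors as ci*Nij + Bij and the denominator as ci*Mij + Aij. A throwaway check with random strengths (the M and A values for i = 4 are not given in the message; they are inferred here the same way, so treat them as an illustration):

```python
import random

c = {k: random.uniform(0.5, 2.0) for k in range(1, 8)}   # strengths c1..c7

P = c[1]*c[2]*c[3] / (c[1]*c[2]*c[3] + c[4]*c[2] + c[1]*c[5]*c[6]*c[7])

# i = 1: c1 appears in the winning team 1,2,3 and in the losing team 1,5,6,7
N1, B1 = c[2]*c[3], 0.0
M1, A1 = c[2]*c[3] + c[5]*c[6]*c[7], c[4]*c[2]
assert abs(P - (c[1]*N1 + B1) / (c[1]*M1 + A1)) < 1e-12

# i = 4: c4 does not appear in the winning team, so N4j = 0
N4, B4 = 0.0, c[1]*c[2]*c[3]
M4, A4 = c[2], c[1]*c[2]*c[3] + c[1]*c[5]*c[6]*c[7]
assert abs(P - (c[4]*N4 + B4) / (c[4]*M4 + A4)) < 1e-12
```

Each line of the table corresponds to factoring one ci out of the same probability, which is exactly the point Magnus makes above.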
Re: [computer-go] Amsterdam 2007 paper
I first thought I would keep my ideas secret until the Amsterdam tournament, but now that I have submitted my paper, I cannot wait to share it. So, here it is: http://remi.coulom.free.fr/Amsterdam2007/ Comments and questions are very welcome. I'd like to propose a potential direction of further research. In your paper, you acknowledge that the strong assumption that each feature's Elo can be added to form the feature team's Elo may not be correct all the time. The Stern/Herbrich/Graepel method did not need to make this assumption, because for them a feature team was its own first-class feature (leading to exponential growth of the number of features). You could evaluate the degree to which each feature violates the additive-Elo assumption by distributing that feature over all the other features and retesting the prediction rate. For example, instead of having features {Pass, Capture, Extension}, you would evaluate the Pass feature's additive-Elo assumption by testing with features {Capture, Extension, Pass-Capture, Pass-Extension}. This obviously leads to more first-class features, but you can test each one at a time to see if it is worth it. Or at least to validate that the additive-Elo assumption is okay in most cases.
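The distribution step described here is mechanical; a sketch of it, using only the feature names from the example above (real features would of course be pattern and tactical descriptors):

```python
def distribute_feature(features, f):
    """Replace feature f by its combinations with every other feature,
    so that f's contribution no longer has to be additive in Elo terms."""
    rest = [g for g in features if g != f]
    return rest + [f + "-" + g for g in rest]
```

For instance, distribute_feature(["Pass", "Capture", "Extension"], "Pass") yields ["Capture", "Extension", "Pass-Capture", "Pass-Extension"]; comparing prediction rates before and after measures how badly Pass violates additivity.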
Re: [computer-go] Amsterdam 2007 paper
Very interesting paper: many ideas for me to study. And thanks for giving us an early look at it. I'll make one suggestion: in the final version, it deserves some better salesmanship in the results section. I've read through too many papers, only to reach the results section at the end and see something like although our program was unable to defeat Wally on a 9x9 board, the authors still feel... So now I look there first to see if I want to invest time on the paper. When I look at the end of this paper, what jumps out is 37.5% victories against Gnugo. IMHO, it's asking an awful lot of potential readers to expect them to recognize that this is an impressive result, which of course it is. I think it would be easier for people to understand if the Performance against Gnugo section started with 9x9 results against gnugo, maybe mentioned scaling issues and the 13x13 KGS results, and then went on to 19x19. Just my 2 cents. Now back to studying the paper. Dave Hillis
Re: [computer-go] JCIS extended abstract
How do you handle n_i = 0 in equation 1.2? I'd assume that if a heuristic says a move is really bad (H_B = 0), then you'd want to avoid simulations for a while. Also, has Mango experimented with other related strategies for a soft transition? In my own mathematical recreation, I came up with a slightly variant method. Just to keep the text short, here's the notation that I'll use:

N = number of simulations through the parent node
n_i = number of simulations through this node
w_i = winning simulations through this node
v_i = w_i / n_i
H_B = heuristic bias
n_h = heuristic confidence (used in my equation)
UCTdelta = sqrt(ln(N)/n_i)
p_hat = estimated probability of winning with soft transition

... then equation 1.2 is p_hat + UCTdelta. For you, p_hat = (w_i + H_B)/n_i. I derived and posted the following formula to the list: p_hat = (w_i + n_h*H_B)/(n_i + n_h). The two equations are fairly similar... especially if n_h is set to 1. Then the only difference is that I replace n_i with (n_i + 1)... which allows p_hat to be calculated with n_i = 0. I made no attempt to handle UCTdelta, but replacing n_i in there with n_i + n_h is probably reasonable.

On 5/17/07, Chaslot G (MICC) [EMAIL PROTECTED] wrote: Dear all, Following the example of Rémi, I would like to share with you a paper that I wrote which describes some important elements of my Go program Mango. I submitted this paper recently to the JCIS workshop 2007. Because it is an extended abstract, a lot of details are missing. I am now working on the full paper, which will contain much more information. The extended abstract can be found here: www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf Cheers, Guillaume PS: To answer Sylvain's question: Modifying the probability distribution of moves in the tree was more effective in Mango than the improvement made to the simulation strategy. However, Mango's simulation strategy is not as good as CrazyStone's or MoGo's.
So I guess Mango won against Mogo in the last KGS tournament because it had a better way to use domain knowledge in the Tree.
Re: [computer-go] Re: Amsterdam 2007 paper
On 5/17/07, Rémi Coulom [EMAIL PROTECTED] wrote: Álvaro Begué wrote: There are many things in the paper that we had never thought of, like considering the distance to the penultimate move. That feature improved the effectiveness of progressive widening a lot. [...] Nice paper, thanks! It's funny to see the distance to the last move now being used as an effective feature in Monte Carlo. Only a few years ago I got a lot of religious criticism when I used it in my move predictor... Erik
Re: [computer-go] KO in Hashtable-UCT?
I have now also finished a first version of UCT-Suzie (in parallel, Peter Woitke works on the alpha-beta version). UCT-Suzie uses a hashtable, mainly because I found the programming of the tree too complicated. The Monte Carlo part uses some simple patterns according to the MoGo article. Progress is rather slow, because I am working (more than) full-time on FPGA projects in computed tomography.

Here are the problems with hash tables as a tree: 1. Time - it is more expensive - you must gather the children together when making decisions about which node to expand (which generally involves re-generating the keys by making all the legal moves.) There are ways around this that trade space for time, but in either case it is more expensive.

I do not understand this. In UCT-Suzie the moves are generated when a new leaf node is reached. The hashtable has a link to the move list. When the node is reached the next time, the moves need not be generated again. Just the calculation of the UCT urgency value (WinRate + sqrt() ) has to be done. I assume that this calculation has to be done in a tree representation as well. I see no difference in this respect from Gunnar's GNU Go UCT code. Memory is, at least for 9x9, no problem. The number of Monte Carlo runs/sec. is about 17K (9x9). This can be improved, because the UCT player uses the alpha-beta DoMove/UndoMove functions, which are overkill for UCT.

2. GHI - you must take special care to deal with Graph History Interaction - primarily recognizing that ko situations are different. You can get by with relatively simple solutions that don't fully address this issue, but it's still imperfect.

I have serious problems with ko. UCT-Suzie plays generally strong, but makes terrible blunders in ko positions. So far I do not even understand the problem. Could you describe it in more detail? I had also some serious superko problems. UCT-Suzie was very clever at finding superkos.
Ignoring superko in UCT is deadly for a hashtable version: game-stack overflow. Chrilly
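For reference, the usual fix for superko blunders like these is to carry a set of Zobrist hashes of all earlier whole-board positions down each variation, and reject any move that recreates one. A minimal sketch, assuming a flat-list board encoding (purely illustrative; a hashtable engine can reuse the same keys it already computes for transposition lookups):

```python
import random

SIZE = 81  # 9x9 board, points indexed 0..80
# one random 64-bit key per (point, color); colors: 1 = black, 2 = white
ZOBRIST = [[random.getrandbits(64) for _ in range(2)] for _ in range(SIZE)]

def position_hash(board):
    """board is a list of 0 (empty) / 1 / 2 per point; XOR the keys of
    all occupied points to get the whole-board Zobrist hash."""
    h = 0
    for pt, color in enumerate(board):
        if color:
            h ^= ZOBRIST[pt][color - 1]
    return h

def violates_superko(board_after_move, history):
    """Positional superko: the move is illegal if the resulting position
    repeats any earlier position in this game or variation (history is
    a set of hashes, extended at every node during descent)."""
    return position_hash(board_after_move) in history
```

During tree descent and in the playouts, each node adds its hash to the history set before recursing; this is what prevents a search from cycling forever through a repeated position.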
Re: [computer-go] Amsterdam 2007 paper
Chris Fant wrote: [...] For example, instead of having features {Pass, Capture, Extension}, you would evaluate the Pass feature's additive-Elo assumption by testing with features {Capture, Extension, Pass-Capture, Pass-Extension}. This is a bad example, because Pass and Capture are mutually exclusive, and so are Pass and Extension, but I understand the idea, and I think it is very good. In fact, I had planned to add it to the conclusion of the paper; that is the reason why I wrote the relevance-of-the-model section. But it was 1 AM when I finished the paper, and I had been writing it since early in the morning, so I decided I wouldn't start explaining something complicated, sorry. Another (less important) problem I forgot to mention in the paper, regarding the validity of the model, is the hypothesis of independence. When training from consecutive moves of the same game, it may not be reasonable to consider that all the positions are independent. Sampling one or two random positions in every game might be better. Thanks for your remarks, Rémi
Re: [computer-go] Re: Amsterdam 2007 paper
Hi Rémi, 2007/5/17, Rémi Coulom [EMAIL PROTECTED]: to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU 3.6 level 10, measured over about 200 games [...] Thank you for your answer. These numbers are interesting. The improvement in the tree search is really huge. It is what I expected, and it is consistent with my own experiments (different of course, but comparable in kind). It is good to be consistent; at least we have a chance to understand something :-p. Unfortunately I don't have time in the next weeks to read it really carefully and make (I hope) more interesting comments. But there are already so many interesting comments on the list ;-). Again, thank you for this interesting work. Cheers, Sylvain
Re: [computer-go] KO in Hashtable-UCT?
On 5/17/07, Chrilly [EMAIL PROTECTED] wrote: [...] I had also some serious SuperKO problems. UCT-Suzie was very clever to find SuperKOs.
We do not check for superko in Alpha-Beta; the search is not deep enough. Ignoring superko in UCT is deadly for a hashtable version: game-stack overflow. Chrilly So does your hash function consider all previous board states (for superko)? If so, how? I can think of one way, but I don't use it, since I have a tree that handles the allowable moves independently of the hashtable. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
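One way a hash function can "consider all previous board states" is to hash only the current stones (Zobrist hashing) and keep the hashes of the positions already seen on the current path in a set, rejecting any move that recreates an earlier hash. A minimal sketch of that idea (my own illustration, not UCT-Suzie's actual code; board encoding and names are assumptions):

```python
import random

# Zobrist hashing sketch: one random 64-bit key per (point, color).
# The hash covers only the current stones; superko is caught by keeping
# the hashes of all previous positions on the search path in a set.
SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2
random.seed(42)
ZOBRIST = [[random.getrandbits(64) for _ in range(3)] for _ in range(SIZE * SIZE)]

def toggle(h, point, color):
    """Add or remove a stone of the given color at point (XOR is its own inverse)."""
    return h ^ ZOBRIST[point][color]

history = {0}                # hash 0 = empty board
h = 0
h = toggle(h, 40, BLACK)     # Black plays at point 40
assert h not in history, "superko violation"
history.add(h)
h2 = toggle(h, 40, BLACK)    # removing the stone recreates the empty board
print(h2 in history)         # True: this position already occurred
```

Because XOR is self-inverse, captures are handled by toggling the captured stones out again, so the repetition check stays O(1) per move at the cost of one set per search path.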
Re: [computer-go] Re: Amsterdam 2007 paper
Rémi Coulom wrote: to Magnus: If you consider the example of section 2.2: team 1,2,3 wins against 4,2 and 1,5,6,7. The probability is P = c1*c2*c3 / (c1*c2*c3 + c4*c2 + c1*c5*c6*c7). For this example: N_1j = c2*c3, B_1j = 0, M_1j = c2*c3 + c5*c6*c7, A_1j = c4*c2; N_2j = c1*c3, B_2j = 0; N_3j = c1*c2, B_3j = 0; N_4j = 0, B_4j = c1*c2*c3. I will add this example to the paper if it makes things clearer. I'd recommend not using N_ij as an arbitrary constant when N is used for a completely different purpose in the paper. My other stumbling block was the jump to the f(x) equation. Calling it f(y_i) and putting the definition of W_i near there would help some. Writing log(N_ij*y_i + B_ij) = delta(N_ij)*[log(N_ij) + log(y_i)] + delta(B_ij)*log(B_ij) would make the jump very easy, with delta(x) = 1 when x != 0 and 0 otherwise (in each term, one of N_ij and B_ij is zero).
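The point of the N_ij/B_ij/M_ij/A_ij constants is that, viewed as a function of one strength c_i alone, both numerator and denominator of P are affine. A small numeric check of the example above (my own sketch with arbitrary strengths, not from the paper):

```python
# Example from section 2.2: team 1,2,3 wins against 4,2 and 1,5,6,7, so
#   P = c1*c2*c3 / (c1*c2*c3 + c4*c2 + c1*c5*c6*c7).
# As a function of c1 alone, numerator and denominator are affine:
#   numerator   = c1*N_1j + B_1j  with N_1j = c2*c3,            B_1j = 0
#   denominator = c1*M_1j + A_1j  with M_1j = c2*c3 + c5*c6*c7, A_1j = c4*c2

c = {i: 0.5 + 0.1 * i for i in range(1, 8)}   # arbitrary positive strengths

num = c[1] * c[2] * c[3]
den = c[1] * c[2] * c[3] + c[4] * c[2] + c[1] * c[5] * c[6] * c[7]

N1j, B1j = c[2] * c[3], 0.0
M1j, A1j = c[2] * c[3] + c[5] * c[6] * c[7], c[4] * c[2]

# Verify the affine decomposition in c1 term by term.
assert abs(num - (c[1] * N1j + B1j)) < 1e-12
assert abs(den - (c[1] * M1j + A1j)) < 1e-12
print(num / den)
```

This is exactly the structure the minorization-maximization step exploits: each game's contribution to the likelihood is affine in the parameter being updated.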
RE: [computer-go] JCIS extended abstract
How do you handle n_i = 0 in equation 1.2? I don't compute it when n_i = 0. At that point, I give every node a high upper bound in order to visit every node once. Remember that I am using progressive unpruning at the same time, so only the best nodes according to the heuristic are visited. I'd assume that if a heuristic says a move is really bad (H_B = 0), then you'd want to avoid simulations for a while. Yes! This is exactly the purpose of progressive unpruning. p_hat = (w_i + n_h*H_B)/(n_i+n_h) Interesting... But then how do you compute n_h in practice? If I understood Rémi correctly, he computes the value by adding one virtual loss and one virtual win for each gamma, i.e., with your notation: p_hat = (w_i+gamma)/(gamma*2+n_i) This formula tends to center the probability distribution on 0.5. But if a pattern is good, I would rather shift the distribution towards 1! Adding a term in gamma/n_i, as I do in my paper, is a solution to shift the probability distribution towards 1 which works well in practice. For the full version of my paper I will compare different ways to modify the probability distribution according to knowledge. I believe there is no optimal way to do that :( This problem is fundamental, because it gives the opportunity to include previously learned knowledge into Monte-Carlo Tree Search. -Original Message- From: [EMAIL PROTECTED] on behalf of Jason House Sent: Thu 17/05/2007 22:26 To: computer-go Subject: Re: [computer-go] JCIS extended abstract How do you handle n_i = 0 in equation 1.2? I'd assume that if a heuristic says a move is really bad (H_B = 0), then you'd want to avoid simulations for a while. Also, has Mango experimented with other related strategies for a soft transition? In my own mathematical recreation, I came up with a slight variant method. Just to keep the text short, here's the notation that I'll use... 
N = number of simulations through parent node
n_i = number of simulations through this node
w_i = winning simulations through this node
v_i = w_i / n_i
H_B = heuristic bias
n_h = heuristic confidence (used in my equation)
UCTdelta = sqrt(ln(N)/n_i)
p_hat = estimated probability of winning with soft transition
... then equation 1.2 is p_hat + UCTdelta. For you, p_hat = (w_i + H_B)/n_i. I derived and posted the following formula to the list: p_hat = (w_i + n_h*H_B)/(n_i+n_h). The two equations are fairly similar... especially if n_h is set to 1. Then the only difference is that I replace n_i with (n_i+1)... which allows p_hat to be calculated with n_i = 0. I made no attempt to handle UCTdelta, but replacing n_i in there with n_i+n_h is probably reasonable. On 5/17/07, Chaslot G (MICC) [EMAIL PROTECTED] wrote: Dear all, Following the example of Rémi, I would like to share with you a paper that I wrote which describes some important elements of my Go program Mango. I submitted this paper recently to the JCIS workshop 2007. Because it is an extended abstract, a lot of details are missing. I am now working on the full paper, which will contain much more information. The extended abstract can be found here: www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf Cheers, Guillaume PS: To answer Sylvain's question: Modifying the probability distribution of moves in the tree was more effective in Mango than the improvement made to the simulation strategy. However, Mango's simulation strategy is not as good as Crazy Stone's or MoGo's. So I guess Mango won against MoGo in the last KGS tournament because it had a better way to use domain knowledge in the tree.
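The difference between the two soft-transition estimates is easy to see in code. A sketch in Jason's notation (my own illustration, not Mango's implementation; function names and the n_h default are assumptions):

```python
import math

# Two estimates of a move's winning probability with a heuristic bias H_B:
#   paper:    p_hat = (w_i + H_B) / n_i              (undefined at n_i = 0)
#   proposal: p_hat = (w_i + n_h*H_B) / (n_i + n_h)  (defined everywhere)

def p_hat_paper(w_i, n_i, H_B):
    return (w_i + H_B) / n_i                 # requires n_i >= 1

def p_hat_prior(w_i, n_i, H_B, n_h=1.0):
    # n_h acts like n_h virtual games won at rate H_B
    return (w_i + n_h * H_B) / (n_i + n_h)

def uct_value(w_i, n_i, N, H_B, n_h=1.0):
    """Equation 1.2 with the prior-based estimate and n_i replaced by n_i + n_h."""
    return p_hat_prior(w_i, n_i, H_B, n_h) + math.sqrt(math.log(N) / (n_i + n_h))

# With no simulations yet, the value is driven entirely by the heuristic:
print(p_hat_prior(0, 0, 0.9))                # 0.9
# As n_i grows, the observed win rate dominates the heuristic:
print(round(p_hat_prior(50, 100, 0.9), 3))   # (50 + 0.9)/101 ≈ 0.504
```

With n_h = 1 the two formulas differ only in the denominator, but that difference is what makes the unvisited-node case well defined.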
Re: [computer-go] JCIS extended abstract
Chaslot G (MICC) wrote: p_hat = (w_i + n_h*H_B)/(n_i+n_h) Interesting... But then how do you compute n_h in practice? The mathematical derivation is based on estimating an a priori probability distribution. In theory, one simply needs to run MC simulations for a wide variety of heuristically identical situations, and then fit the best beta distribution to the measured data. A beta distribution has two parameters, alpha and beta: n_h = alpha+beta and n_h*H_B = alpha. In practice... I don't have an MC bot yet. I'm slowly redoing my bot in D (an up-and-coming programming language, http://www.tiobe.com/tpci.htm). For the full version of my paper I will compare different ways to modify the probability distribution according to knowledge. I believe there is no optimal way to do that :( Well, if beta distributions are a good fit, then the above would be the optimal probability distribution... Of course, my analysis doesn't take tree searches into account... Maybe I'll get lucky and it'll work well, like a multi-armed bandit. Actually, even if it doesn't, being optimal before it's time to build a subtree may be enough. I think I've seen strategies like waiting until 100 simulations have been done. If n_h is relatively small, the effect is probably sufficiently washed out by then and optimality probably doesn't matter. I guess if empirical evidence shows beta distributions are a good fit and a high n_h is appropriate, then I'll have to revisit the shortcomings...
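The Beta-prior reading of the formula can be checked directly: a Beta(alpha, beta) prior updated by n_i simulations with w_i wins has posterior mean (w_i + alpha)/(n_i + alpha + beta), which is exactly p_hat with n_h = alpha + beta and H_B = alpha/(alpha + beta). A sketch of that equivalence (my own working with hypothetical prior parameters, not fitted data from the thread):

```python
# Beta-prior interpretation of p_hat = (w_i + n_h*H_B)/(n_i + n_h):
# a Beta(alpha, beta) prior on the win probability, updated by n_i
# simulations with w_i wins, has posterior mean
#   (w_i + alpha) / (n_i + alpha + beta).

def posterior_mean(w_i, n_i, alpha, beta):
    return (w_i + alpha) / (n_i + alpha + beta)

def p_hat(w_i, n_i, H_B, n_h):
    return (w_i + n_h * H_B) / (n_i + n_h)

alpha, beta = 3.0, 2.0                       # hypothetical fitted prior
n_h, H_B = alpha + beta, alpha / (alpha + beta)

# The two expressions agree for any simulation counts.
for w_i, n_i in [(0, 0), (7, 10), (60, 100)]:
    assert abs(posterior_mean(w_i, n_i, alpha, beta) - p_hat(w_i, n_i, H_B, n_h)) < 1e-12

print(posterior_mean(0, 0, alpha, beta))     # prior mean = 3/5 = 0.6
```

Under this reading, fitting a beta distribution to measured win rates of "heuristically identical" moves gives both H_B (the prior mean) and n_h (the prior's effective sample size) in one step.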