Re: [computer-go] KO in Hashtable-UCT?
I have serious problems with KO. UCT-Suzie plays generally strong, but makes terrible blunders in KO-positions. So far I do not even understand the problem. Could you describe it more detailed? I had also some serious SuperKO problems. UCT-Suzie was very clever to find SuperKOs. We do not check for SuperKO in Alpha-Beta. The search is not deep enough. Ignoring SuperKO in UCT is for a Hashtable version deadly. GameStack-Overflow. Chrilly So does your hash function consider all previous board states (for superKo)? If so, how? I can think of one way, but I don't use it since I have a tree that handles the allowable moves independent of the hashtable. When going down a variation the Hash and other Board-State Information like e.g. the KO-Point are stored on a stack. Starting from the current Top of Stack the detection goes down and search for the same hash-key and Ko-Point. Its the Repeated Position Detection method of chess. The Gamestack-Pointer is decremented by 2, one can stop, when a non-capturing move is done (in chess its the other way round). One can start 4 Plies from the top of stack. Due to the stoping criterion one has to check only a few entries (most of the time none). If a SuperKO occurs, the position is evaluated by the Material-Balance. BlackCaptures - WhiteCaptures + Komi. Probably a better way is to ignore the result. But I assumed that SuperKO is a rare event and the result has no significant impact on the search-tree. Maybe there is something wrong with this approach and the Ko-Problems I have a related to this simple SuperKO handling. I noticed several times that a direct transformation of chess methods has some subtle flaws. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
On 5/18/07, Chrilly [EMAIL PROTECTED] wrote: When going down a variation the Hash and other Board-State Information like e.g. the KO-Point are stored on a stack. Starting from the current Top of Stack the detection goes down and search for the same hash-key and Ko-Point. Its the Repeated Position Detection method of chess. The Gamestack-Pointer is decremented by 2, one can stop, when a non-capturing move is done (in chess its the other way round). Non-capturing moves can create repetition (but there will of course be captures elsewhere in the cycle). Fortunately, there are other simple criteria. E.g., you can stop whenever a move is played on an intersection for the first time. If a SuperKO occurs, the position is evaluated by the Material-Balance. BlackCaptures - WhiteCaptures + Komi. I guess you mean the *change* in material balance after one cycle. I don't see why komi should be used here. Probably a better way is to ignore the result. Only for balanced cycles. Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
Non-capturing moves can create repetition (but there will of course be captures elsewhere in the cycle). So far the SuperKOs I have found where a round-trip of KOs. Fortunately, there are other simple criteria. E.g., you can stop whenever a move is played on an intersection for the first time. Okay. If a SuperKO occurs, the position is evaluated by the Material-Balance. BlackCaptures - WhiteCaptures + Komi. I guess you mean the *change* in material balance after one cycle. I don't see why komi should be used here. No. I assume the game is over. As there is usually no reliable method for counting territory I use the difference of the captured stones as a - first approximation - for the evaluation. If e.g. both sides have captured - in the whole game - the same amount of stones, the eval is 0, but white wins due to Komi. At least for significant differences in captures this is true. MC-Runs are also stopped, when one sides leads in Capture by a large margin. Depending on the margin it produces only a negible bias. Only for balanced cycles. The cycles I have found so far are balanced. Each side captures in turn 1 stone. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
On 5/18/07, chrilly [EMAIL PROTECTED] wrote: Non-capturing moves can create repetition (but there will of course be captures elsewhere in the cycle). So far the SuperKOs I have found where a round-trip of KOs. Maybe the others are just hiding from you ;-) Fortunately, there are other simple criteria. E.g., you can stop whenever a move is played on an intersection for the first time. Okay. If a SuperKO occurs, the position is evaluated by the Material-Balance. BlackCaptures - WhiteCaptures + Komi. I guess you mean the *change* in material balance after one cycle. I don't see why komi should be used here. No. I assume the game is over. As there is usually no reliable method for counting territory I use the difference of the captured stones as a - first approximation - for the evaluation. If e.g. both sides have captured - in the whole game - the same amount of stones, the eval is 0, but white wins due to Komi. At least for significant differences in captures this is true. MC-Runs are also stopped, when one sides leads in Capture by a large margin. Depending on the margin it produces only a negible bias. Hmm... To me it seems more natural that the evaluation primarily depends on the nature of the cycle. Your program may choose to play a cycle because of some arbitrary material balance value, which will be meaningless if you can't break out of that cycle. IMO you should simply treat balanced cycles as draws. Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. Orego's current approach is: During search, just maintain a single ko point and ignore superko. (There is a maximum number of moves per game to prevent infinitely long games.) After search, when actually making a move: 1) Make a copy of the board 2) Compute the Zobrist hash of the current position from scratch 3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game,) 4) If there is a violation, go back to the copy and try the next best move Thanks, Peter Drake http://www.lclark.edu/~drake/ On May 17, 2007, at 11:00 PM, Chrilly wrote: I have serious problems with KO. UCT-Suzie plays generally strong, but makes terrible blunders in KO-positions. So far I do not even understand the problem. Could you describe it more detailed? I had also some serious SuperKO problems. UCT-Suzie was very clever to find SuperKOs. We do not check for SuperKO in Alpha-Beta. The search is not deep enough. Ignoring SuperKO in UCT is for a Hashtable version deadly. GameStack-Overflow. Chrilly So does your hash function consider all previous board states (for superKo)? If so, how? I can think of one way, but I don't use it since I have a tree that handles the allowable moves independent of the hashtable. When going down a variation the Hash and other Board-State Information like e.g. the KO-Point are stored on a stack. Starting from the current Top of Stack the detection goes down and search for the same hash-key and Ko-Point. Its the Repeated Position Detection method of chess. The Gamestack-Pointer is decremented by 2, one can stop, when a non-capturing move is done (in chess its the other way round). One can start 4 Plies from the top of stack. Due to the stoping criterion one has to check only a few entries (most of the time none). If a SuperKO occurs, the position is evaluated by the Material- Balance. BlackCaptures - WhiteCaptures + Komi. Probably a better way is to ignore the result. But I assumed that SuperKO is a rare event and the result has no significant impact on the search-tree. Maybe there is something wrong with this approach and the Ko- Problems I have a related to this simple SuperKO handling. I noticed several times that a direct transformation of chess methods has some subtle flaws. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote: It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. In dimwit, we check for superko while descending the UCT tree, and only check basic ko in the MC playouts. Our check involves a single lookup (of the current, incrementally computed Zobrist key) in a hash_map of previous Zobrist keys, which is not all that expensive. I agree that doing superko checks in MC playouts would cause an unnecesary slowdown. regards, -John ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
After search, when actually making a move: 1) Make a copy of the board 2) Compute the Zobrist hash of the current position from scratch 3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game,) 4) If there is a violation, go back to the copy and try the next best move But then you have an engine which does not converge to perfect play given infinite resources. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Keep the common case fast (was Re: [computer-go] KO in Hashtable-UCT?)
Yes, that sounds reasonable too. In thinking about optimization, the rule as always is keep the common case fast. The cases in UCT Go, in decreasing order of commonness, are: 1) Play a move in an MC playout 2) Play a UCT tree move 3) Start (or finish) a playout 4) Make (or accept) a move in the actual game 5) Initialize data structures and the beginning of the game 1 has to be blazingly fast. 2-3 have to be relatively fast. 4-5 don't matter much unless you're spending an outrageous amount of time on them. Also note that almost all UCT moves are played from nodes with very few children (often 0 or 1). Peter Drake http://www.lclark.edu/~drake/ On May 18, 2007, at 9:15 AM, John Tromp wrote: On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote: It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. In dimwit, we check for superko while descending the UCT tree, and only check basic ko in the MC playouts. Our check involves a single lookup (of the current, incrementally computed Zobrist key) in a hash_map of previous Zobrist keys, which is not all that expensive. I agree that doing superko checks in MC playouts would cause an unnecesary slowdown. regards, -John ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
True... My experience has been that (largely) ignoring the extremely rare case of superko is a better use of the finite resources we have. Have others found the same thing? Peter Drake http://www.lclark.edu/~drake/ On May 18, 2007, at 9:19 AM, Chris Fant wrote: After search, when actually making a move: 1) Make a copy of the board 2) Compute the Zobrist hash of the current position from scratch 3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game,) 4) If there is a violation, go back to the copy and try the next best move But then you have an engine which does not converge to perfect play given infinite resources. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
Peter Drake said: It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. Depending on how a program works, you may well need to check for repetitions other than straight ko, ie black white a2--pass b2--pass b1--a1 a2... Forrest Curo - This email was sent using AIS WebMail. http://www.americanis.net/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Re: computer-go Digest, Vol 34, Issue 15
Very interesting paper! I have one question. The assumption in your paper is that increasing the performance of the simulation player will increase the performance of Monte-Carlo methods that use that simulation player. However, we found in MoGo that this is not necessarily the case! Do you think there is some property of your learning algorithm that makes it particularly suitable for Monte-Carlo methods? Thanks! Dave Date: Thu, 17 May 2007 01:17:55 +0200 From: R?mi Coulom [EMAIL PROTECTED] Subject: [computer-go] Amsterdam 2007 paper To: computer-go computer-go@computer-go.org Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi, I first thought I would keep my ideas secret until the Asmterdam tournament, but now that I have submitted my paper, I cannot wait to share it. So, here it is: http://remi.coulom.free.fr/Amsterdam2007/ Comments and questions are very welcome. Rémi ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Re: Amsterdam 2007 paper
I also use an online learning algorithm in RLGO to adjust feature weights during the game. I use around a million features (all possible patterns from 1x1 up to 3x3 at all locations on the board) and update the weights online from simulated games using temporal difference learning. I also use the sum of feature weights to estimate the value of a move, rather than a multiplicative estimate. The learning signal is only win/lose at the end of each simulation, rather than supervised learning like Remi. The results are encouraging (currently ~1820 Elo on CGOS, based on 5000 simulations per move) for a program that does not use UCT or Monte-Carlo Tree Search in any way. Like Alvaro and Remi, I am convinced that online learning during simulation has great potential. In fact, I would go even further, and suggest that Monte-Carlo Tree Search and UCT are at one end of a large spectrum of promising sample-based search algorithms. We don't need to restrict ourselves to a tabular representation: state abstraction (for example, using pattern features) provides a way to generalise between positions and learn more efficiently. Nor do we need to restrict our learning algorithm to Monte-Carlo: any online reinforcement learning algorithm can be applied to learn from simulated experience. In classical game programming terms, we can think of this approach as dynamically learning an evaluation function. Instead of learning a single evaluation function that applies to all positions, we tailor the evaluation function to the current position, by learning from simulated experience. Rather than thinking of learning as an offline procedure that takes months to train weights, let's think of it as an online procedure that can run in a few seconds. Once we learn the evaluation function, it can be used in any way we please - for example alpha-beta search (RLGO 2.1 uses a full width 5-ply search) or as a heuristic function for UCT (current research in MoGo). I would be very interested to hear more about Dimwit's approach, and also Remi's experiments with online learning in CrazyStone. -Dave PS Apologies if you receive this twice, I had some difficulties posting. Rémi: John and I are planning a similar usage of patterns and features in the MC simulations, although we were thinking of a very different method to update strengths. Although we derived our formulas from an extension of the logistic regression instead of Bradley-Terry models, we arrived at very similar expressions. The main difference is that what we have been calling score seems to be the log of your strength, and instead of multiplying strengths when considering teams, we just add the scores of different features. We use an online learning algorithm to adjust the scores during the game, which severely limits the kind of algorithms we can use, but also allows us to learn things that apply specifically to the situations that are appearing on the board in this game. Now we only have to make dimwit 400 points stronger to show the world that our approach works. :) There are many things in the paper that we had never thought of, like considering the distance to the penultimate move. Most of those things won't be directly applicable to dimwit, but it gives us many suggestions of things to try. Thanks for this important contribution to computer go. Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Re: Amsterdam 2007 paper
Thanks for the great paper. And thanks for sharing it before it's published. Now I know what directions to take my engine in next. Time for Team MoGo so share some more secrets :) We are publishing MoGo's secrets at ICML 2007, in just over a month. So not long to wait now! -Dave ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
I don't think you can ignore superko in the UCT search - you will lose games. You will also lose games even if you only check the move you will actually play in the game. But in the play-outs, it has almost no value whatsoever, even if it was free. (It probably has SOME value, but miniscule and clearly would weaken the program because you would do less play-outs in a given amount of time.) If you keep superko tests in the UCT tree, you have a scalable program. - Don On Fri, 2007-05-18 at 09:33 -0700, Peter Drake wrote: True... My experience has been that (largely) ignoring the extremely rare case of superko is a better use of the finite resources we have. Have others found the same thing? Peter Drake http://www.lclark.edu/~drake/ On May 18, 2007, at 9:19 AM, Chris Fant wrote: After search, when actually making a move: 1) Make a copy of the board 2) Compute the Zobrist hash of the current position from scratch 3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game,) 4) If there is a violation, go back to the copy and try the next best move But then you have an engine which does not converge to perfect play given infinite resources. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
John's post is almost identical to mine! Sorry about that. I could have just said ditto for Lazarus. - Don On Fri, 2007-05-18 at 12:15 -0400, John Tromp wrote: On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote: It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. In dimwit, we check for superko while descending the UCT tree, and only check basic ko in the MC playouts. Our check involves a single lookup (of the current, incrementally computed Zobrist key) in a hash_map of previous Zobrist keys, which is not all that expensive. I agree that doing superko checks in MC playouts would cause an unnecesary slowdown. regards, -John ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
Lazarus does full superko in the UCT search - but the play-out only look at simple KO. In the play-outs, I'm pretty sure infinite play-outs due to not using superko are possible - even with the randomness.But I have a limit on the length of the play-out games because when you use heavy play-outs the games can occasionally last for hundreds of moves.The more randomness you take out of the play-outs the more likely you will get ridiculously long loops. Superko would solve this, but it would also slow down the play-outs significantly.I see no reason to have the refinement of superko in a random play-out. - Don On Fri, 2007-05-18 at 09:01 -0700, Peter Drake wrote: It took me a long time to get around my mental block and accept the advice of everyone here, but your intuition is correct: superko is so rare, and so expensive to detect, that you should NOT check for it on every move. Orego's current approach is: During search, just maintain a single ko point and ignore superko. (There is a maximum number of moves per game to prevent infinitely long games.) After search, when actually making a move: 1) Make a copy of the board 2) Compute the Zobrist hash of the current position from scratch 3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game,) 4) If there is a violation, go back to the copy and try the next best move Thanks, Peter Drake http://www.lclark.edu/~drake/ On May 17, 2007, at 11:00 PM, Chrilly wrote: I have serious problems with KO. UCT-Suzie plays generally strong, but makes terrible blunders in KO-positions. So far I do not even understand the problem. Could you describe it more detailed? I had also some serious SuperKO problems. UCT-Suzie was very clever to find SuperKOs. We do not check for SuperKO in Alpha-Beta. The search is not deep enough. Ignoring SuperKO in UCT is for a Hashtable version deadly. GameStack-Overflow. Chrilly So does your hash function consider all previous board states (for superKo)? If so, how? I can think of one way, but I don't use it since I have a tree that handles the allowable moves independent of the hashtable. When going down a variation the Hash and other Board-State Information like e.g. the KO-Point are stored on a stack. Starting from the current Top of Stack the detection goes down and search for the same hash-key and Ko-Point. Its the Repeated Position Detection method of chess. The Gamestack-Pointer is decremented by 2, one can stop, when a non-capturing move is done (in chess its the other way round). One can start 4 Plies from the top of stack. Due to the stoping criterion one has to check only a few entries (most of the time none). If a SuperKO occurs, the position is evaluated by the Material-Balance. BlackCaptures - WhiteCaptures + Komi. Probably a better way is to ignore the result. But I assumed that SuperKO is a rare event and the result has no significant impact on the search-tree. Maybe there is something wrong with this approach and the Ko-Problems I have a related to this simple SuperKO handling. I noticed several times that a direct transformation of chess methods has some subtle flaws. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Re: computer-go Digest, Vol 34, Issue 15
David Silver wrote: Very interesting paper! I have one question. The assumption in your paper is that increasing the performance of the simulation player will increase the performance of Monte-Carlo methods that use that simulation player. However, we found in MoGo that this is not necessarily the case! Do you think there is some property of your learning algorithm that makes it particularly suitable for Monte-Carlo methods? Thanks! Dave Maximizing the likelihood does not optimize the performance of the simulation player. For instance, by making it more greedy, I am sure it would become a stronger player. I have the feeling that maximizing the likelihood produces a good balance between playing good moves and being random. It would be worth testing the strength of the MC player with more or less greedy versions of the random player to test this. Rémi ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Re: computer-go Digest, Vol 34, Issue 15
On 5/18/07, Rémi Coulom [EMAIL PROTECTED] wrote: David Silver wrote: Very interesting paper! I have one question. The assumption in your paper is that increasing the performance of the simulation player will increase the performance of Monte-Carlo methods that use that simulation player. However, we found in MoGo that this is not necessarily the case! Do you think there is some property of your learning algorithm that makes it particularly suitable for Monte-Carlo methods? Thanks! Dave Maximizing the likelihood does not optimize the performance of the simulation player. For instance, by making it more greedy, I am sure it would become a stronger player. I have the feeling that maximizing the likelihood produces a good balance between playing good moves and being random. It would be worth testing the strength of the MC player with more or less greedy versions of the random player to test this. Rémi My own take on this is that we don't really know what the right probability distribution is to get MC simulations that predict well the outcome of the game. What we can do is provide indications of what properties these moves have and combine them with weights that will be tuned by playing many games. Rémi's way of computing a probability distribution is probably an acceptable way to pick, but it's perfectly possible that using some power of his strengths would actually give a stronger program. In our case, we are currently trying to estimate a score for each move, which roughly corresponds to how many points we think the move is worth. Then we'll make the probability of each move proportional to exp(A*score), where A is a number to be tuned by playing many games. One could also use other features by making the probability proportional to exp(A*score+B*is_capture+C*is_atari+...) but one would have to tune A, B, C, ... by playing even more games. I guess we could use the maximum-likelihood settings (what Rémi did) as an initial guess, and then try our hand at this difficult optimization problem by trying perturbations of those settings. There is a reason why just bought a powerful computer and we are going to buy more. :) Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] KO in Hashtable-UCT?
But then you have an engine which does not converge to perfect play given infinite resources. I assume that you're joking, given: a) a current lack of infinite resources, and b) a current lack of convergence of any kind. No, but I can rephrase for those spooked by the concept of infinity: But then you have an engine that stops scaling at an unknown point. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/