Re: [computer-go] How to design the stronger playout policy?
I think the issue with MC versus deterministic playouts is the following: with MC, the value of a node gets more accurate with a larger number of playouts. With a deterministic playout you don't have such a mechanism (there are other mechanisms in UCT).

DL

-----Original Message-----
From: Don Dailey <[EMAIL PROTECTED]>
To: computer-go
Sent: Wed, 9 Jan 2008 3:46 pm
Subject: Re: [computer-go] How to design the stronger playout policy?

Mark Boon wrote:
> On 8-jan-08, at 17:04, Don Dailey wrote:
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>
> I was thinking about this and wanted to make sure I understand what
> you mean by 'cannot be deterministic'.

You could build a playing strategy that would always play the exact same game from a particular position, and that would be deterministic - but then the "Monte Carlo" component is completely missing from the program. You would still get some chaos from the tree itself, since you always expand a node once it is visited. I don't think this is fully understood, so I could be wrong in my statement - but it does seem likely that a certain amount of randomness will improve the results. Attempts to make the play-outs stronger and stronger have failed, perhaps because this makes them less and less random, or perhaps due to some other phenomenon.

- Don

> For example a stone gets put into atari and it finds an escaping move.
> With plain playouts the stone gets captured anyway roughly 50% of the
> time. With heavy playouts it gives a weighting so that the chance of
> escaping becomes larger? Or does it always play the escaping move and
> just the rest of the playout is random?
> Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
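DL's point - that a Monte Carlo value estimate sharpens as playouts accumulate, while a deterministic playout has no such mechanism - is just the standard error of a Bernoulli estimate. A minimal sketch (the function name is ours, not from the thread):

```python
import math

def stderr_winrate(p, n):
    """Standard error of a win-rate estimate from n independent playouts,
    each returning win (1) or loss (0) with win probability p."""
    return math.sqrt(p * (1.0 - p) / n)

# quadrupling the playouts halves the uncertainty of the node's value
```

With p = 0.5, 100 playouts give a standard error of 0.05 and 10,000 give 0.005; a deterministic playout returns the same single sample every visit, so extra visits add no information.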
Re: [computer-go] How to design the stronger playout policy?
Mark Boon wrote:
> On 8-jan-08, at 17:04, Don Dailey wrote:
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>
> I was thinking about this and wanted to make sure I understand what
> you mean by 'cannot be deterministic'.

You could build a playing strategy that would always play the exact same game from a particular position, and that would be deterministic - but then the "Monte Carlo" component is completely missing from the program. You would still get some chaos from the tree itself, since you always expand a node once it is visited. I don't think this is fully understood, so I could be wrong in my statement - but it does seem likely that a certain amount of randomness will improve the results. Attempts to make the play-outs stronger and stronger have failed, perhaps because this makes them less and less random, or perhaps due to some other phenomenon.

- Don

> For example a stone gets put into atari and it finds an escaping move.
> With plain playouts the stone gets captured anyway roughly 50% of the
> time. With heavy playouts it gives a weighting so that the chance of
> escaping becomes larger? Or does it always play the escaping move and
> just the rest of the playout is random?
>
> Mark
Re: [computer-go] How to design the stronger playout policy?
On 8-jan-08, at 17:04, Don Dailey wrote:
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.

I was thinking about this and wanted to make sure I understand what you mean by 'cannot be deterministic'. For example, a stone gets put into atari and the policy finds an escaping move. With plain playouts the stone gets captured anyway roughly 50% of the time. With heavy playouts, does it apply a weighting so that the chance of escaping becomes larger? Or does it always play the escaping move, with just the rest of the playout random?

Mark
Re: [computer-go] How to design the stronger playout policy?
Quoting Don Dailey <[EMAIL PROTECTED]>:
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.

Not all changes to the playouts slow down the program. If the playouts become shorter as a consequence, the program can speed up.

-Magnus
Re: [computer-go] How to design the stronger playout policy?
That would be exciting, seeing your team get involved with this Monte Carlo stuff, especially since you have some previous experience with it.

- Don

David Doshay wrote:
> I have been interested in monte-carlo approaches to Go since running
> my first MC simulations in magnetic phase transitions when I was in
> graduate school in the 1980's. What held me back, even when the latest
> crop of MC programs started winning against older stronger programs
> and my program SlugGo, is that in the physics simulations we know on
> theoretical grounds what the shape of the random distributions is, but
> in Go we do not. I was amazed at how well UCT helps get around the
> problem and still allows the use of nearly flat random distributions
> (flat except for a few hand-tuned rules).
>
> Recent work on the ELO ratings of patterns comes very close to what I
> think is needed to move forward, and we are also working on ways to
> determine biased probability distributions that are appropriate for Go
> move selection.
>
> I have little doubt that such "weighted playout" (not really heavy or
> light) considerations will lead to the next major step in computer Go
> progress. The advantage of such a method is that it intrinsically
> matches the basic premise of MC: the right degree of randomness allows
> you to search the problem space appropriately.
>
> Cheers,
> David
>
> On 8, Jan 2008, at 11:04 AM, Don Dailey wrote:
>
>> I think Dave Hillis coined this term "heavy playouts."
>>
>> In the first programs the play-outs were uniformly random. Any move
>> would get played with equal likelihood, with the exception of
>> eye-filling moves, which don't get played at all, of course.
>>
>> But it was found that the program improves if the play-outs are
>> somewhat "managed". So Dave started calling the original formula
>> "light play-outs" and the slower but better method "heavy play-outs."
>>
>> And yes, it slows down the play-outs. Still, the play-outs seem to
>> require a good bit of randomness - certainly they cannot be
>> deterministic and it seems difficult to find the general principles
>> that are important to the play-out policy.
>>
>> - Don
Re: [computer-go] How to design the stronger playout policy?
I have been interested in monte-carlo approaches to Go since running my first MC simulations in magnetic phase transitions when I was in graduate school in the 1980's. What held me back, even when the latest crop of MC programs started winning against older stronger programs and my program SlugGo, is that in the physics simulations we know on theoretical grounds what the shape of the random distributions is, but in Go we do not. I was amazed at how well UCT helps get around the problem and still allows the use of nearly flat random distributions (flat except for a few hand-tuned rules).

Recent work on the ELO ratings of patterns comes very close to what I think is needed to move forward, and we are also working on ways to determine biased probability distributions that are appropriate for Go move selection.

I have little doubt that such "weighted playout" (not really heavy or light) considerations will lead to the next major step in computer Go progress. The advantage of such a method is that it intrinsically matches the basic premise of MC: the right degree of randomness allows you to search the problem space appropriately.

Cheers,
David

On 8, Jan 2008, at 11:04 AM, Don Dailey wrote:
> I think Dave Hillis coined this term "heavy playouts."
>
> In the first programs the play-outs were uniformly random. Any move
> would get played with equal likelihood, with the exception of
> eye-filling moves, which don't get played at all, of course.
>
> But it was found that the program improves if the play-outs are
> somewhat "managed". So Dave started calling the original formula
> "light play-outs" and the slower but better method "heavy play-outs."
>
> And yes, it slows down the play-outs. Still, the play-outs seem to
> require a good bit of randomness - certainly they cannot be
> deterministic and it seems difficult to find the general principles
> that are important to the play-out policy.
> - Don
Re: [computer-go] How to design the stronger playout policy?
I think Dave Hillis coined this term "heavy playouts."

In the first programs the play-outs were uniformly random. Any move would get played with equal likelihood, with the exception of eye-filling moves, which don't get played at all, of course.

But it was found that the program improves if the play-outs are somewhat "managed". So Dave started calling the original formula "light play-outs" and the slower but better method "heavy play-outs."

And yes, it slows down the play-outs. Still, the play-outs seem to require a good bit of randomness - certainly they cannot be deterministic and it seems difficult to find the general principles that are important to the play-out policy.

- Don

Mark Boon wrote:
> On 5-jan-08, at 11:48, Gian-Carlo Pascutto wrote:
>>> Would you explain the details of the playout policy?
>>
>> (1) Captures of groups that could not save themselves last move.
>> (2) Save groups in atari due to last move by capturing or extending.
>> (3) Patterns next to last move.
>> (4) Global moves.
>
> Is this what is meant by 'heavy playout'?
>
> I suppose it slows down the playout considerably. Just keeping track
> of actual liberties instead of pseudo-liberties should probably be
> three times slower by itself. But in itself I think the idea is
> interesting. I've always felt programming Go would move towards
> several layers of playing engines, each one a bit more sophisticated,
> built on top of a more primitive one.
>
> Mark
Re: [computer-go] How to design the stronger playout policy?
On 5-jan-08, at 11:48, Gian-Carlo Pascutto wrote:
>> Would you explain the details of the playout policy?
>
> (1) Captures of groups that could not save themselves last move.
> (2) Save groups in atari due to last move by capturing or extending.
> (3) Patterns next to last move.
> (4) Global moves.

Is this what is meant by 'heavy playout'?

I suppose it slows down the playout considerably. Just keeping track of actual liberties instead of pseudo-liberties should probably be three times slower by itself. But in itself I think the idea is interesting. I've always felt programming Go would move towards several layers of playing engines, each one a bit more sophisticated, built on top of a more primitive one.

Mark
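Mark's pseudo-liberty remark refers to a common playout speed trick: instead of maintaining true liberty sets, each empty neighbour is counted once per adjacent stone of the group, so a shared liberty may be counted more than once. A rough sketch under assumed data structures (the board is just a dict of occupied points; the function name is made up):

```python
def pseudo_liberties(board, group_points, size):
    """Count pseudo-liberties of a group on a size x size board.
    board maps (x, y) -> colour for occupied points only; an empty
    neighbour is counted once for EACH adjacent stone of the group,
    which is cheaper to maintain incrementally than true liberties."""
    count = 0
    for (x, y) in group_points:
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in board:
                count += 1
    return count
```

For an L-shaped group of three stones the shared inner point is counted twice, so the pseudo-liberty count (5) exceeds the true liberty count (4) - which is why a pseudo count can only be used as a cheap upper-bound proxy.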
Re: [computer-go] How to design the stronger playout policy?
Don Dailey wrote:
> Lazarus uses a system very similar to the original MoGo policy as
> documented in the paper. However, I did find one significant
> improvement. I used Rémi's ELO system to rate patterns, and I simply
> throw out moves which match the weakest patterns in the play-outs. In
> the tree I also throw out these moves, but this is progressive: those
> moves still get explored, just not near leaf nodes.

It is interesting that many people say a similar thing. I'll try to use the ELO rating of patterns again.

> Programs that play games tend to be highly idiosyncratic. What may
> work today may not work tomorrow, or may work for me and not you. It
> depends on what is already in your program, what isn't, how you
> implemented it, etc. Some improvements work well with others and some
> do not cooperate with other changes. The whole process is a bit of a
> black art, not just the play-outs.

I fully agree.

-- Yamato
Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
>> I finally improved my playouts by using Remi's ELO system to learn a
>> set of "interesting" patterns, and just randomly fiddling with the
>> probabilities (compressing/expanding) until something improved my
>> program in self-play by about +25%. Not a very satisfying method or
>> an exceptional result. There could be some other magic combination
>> that is even better, or maybe not.
>
> I also have implemented Remi's Minorization-Maximization algorithm.
> But I could not find how to use the result of it to improve the
> strength.
>
> Would you explain the details of the playout policy?

(1) Captures of groups that could not save themselves last move.
(2) Save groups in atari due to last move by capturing or extending.
(3) Patterns next to last move.
(4) Global moves.

I quantize the MM pattern scores to 0..255 by multiplying them with a large constant and clipping. This causes the "very good" patterns to have close scores. I then use a threshold so I do not play the very bad patterns at all. The remaining moves are played with the probabilities indicated by the quantized values. I also throw away very bad moves in phase (4) unless there are no alternatives. This gives a small but measurable improvement.

But now I believe all of the above is actually flawed. With this system I will play bad saving moves even if there are great pattern moves. It might be that your ladder detection avoids these problems somewhat. Considering the probabilities of all moves, as Crazy Stone does, avoids this problem. I am now trying to get a similar effect without incrementally updating all urgencies.

> Do you use only 3x3 patterns?

Yes. I have not tried bigger ones. For size = 4 the tables would become 2 x 16M. Might be worth a try.

-- GCP
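GCP's quantization step - multiply MM pattern scores by a large constant, clip to 0..255, drop everything under a threshold, then pick among the surviving moves with the quantized values as weights - might look roughly like this. The scale and threshold constants here are invented for the sketch, not GCP's actual values:

```python
import random

def quantize(scores, scale=40.0, threshold=8):
    """Map raw MM pattern strengths to 0..255 and drop weak patterns.
    Clipping squeezes the 'very good' patterns close together."""
    q = {pat: min(255, int(s * scale)) for pat, s in scores.items()}
    # threshold: never play the very bad patterns at all
    return {pat: v for pat, v in q.items() if v >= threshold}

def sample_move(candidates, q):
    """candidates: list of (move, pattern) pairs. Draw a move with
    probability proportional to its quantized pattern score."""
    weights = [q.get(pat, 0) for _, pat in candidates]
    if sum(weights) == 0:
        return random.choice(candidates)[0]  # no rated pattern survives
    return random.choices([m for m, _ in candidates], weights=weights)[0]
```

The threshold implements "do not play the very bad patterns at all"; the weighted draw implements "probabilities indicated by the quantized values".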
Re: [computer-go] How to design the stronger playout policy?
Lazarus uses a system very similar to the original MoGo policy as documented in the paper. However, I did find one significant improvement. I used Rémi's ELO system to rate patterns, and I simply throw out moves which match the weakest patterns in the play-outs. In the tree I also throw out these moves, but this is progressive: those moves still get explored, just not near leaf nodes.

It seems like I also found a minor improvement in giving captures a higher priority at some point, but I don't remember the details. I have not put a huge amount of energy into anything other than this.

Programs that play games tend to be highly idiosyncratic. What may work today may not work tomorrow, or may work for me and not you. It depends on what is already in your program, what isn't, how you implemented it, etc. Some improvements work well with others and some do not cooperate with other changes. The whole process is a bit of a black art, not just the play-outs.

The very biggest problem of all is how to test a change. It's not so difficult in the early stages, where you are testing major improvements and getting 100 ELO at a time. But when you get to the point where you refine a program, you are talking about small but important improvements. You are looking for 10 or 20 ELO and hoping to put several of them together to get 100 or 200. But testing even 20 ELO points takes thousands of games to get a reliable measure. If we could measure 5 ELO improvements in 5 minutes, you can bet most of us would be able to produce very strong programs.

Some of the top computer chess guys have testing systems that play 50,000 games or more to measure the value of small improvements, such as a weight adjustment. They play those games on several computers (or CPUs) and play relatively fast - I have heard of testing systems that average a game per second per CPU at reasonably strong levels (they are still playing at least master strength.)
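Don's figure - thousands of games to resolve 20 ELO - follows from the Elo win-rate model plus the binomial standard error. A back-of-the-envelope sketch (the confidence level and the formula's simplifications are ours, not Don's):

```python
import math

def winrate(elo_diff):
    # expected score of the stronger side under the Elo model
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Roughly how many games until a z-sigma interval on the observed
    win rate excludes 50% (uses the worst-case per-game sigma of 0.5)."""
    p = winrate(elo_diff)
    return math.ceil((z * 0.5 / (p - 0.5)) ** 2)
```

For a 20 ELO edge (about a 52.9% win rate) this gives on the order of 1,200 games just to reach 95% confidence; detecting it reliably, with decent statistical power, takes several times more - which matches Don's "thousands of games".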
The problem is that if you test too fast, you are not really stressing a Monte Carlo program or testing the sub-systems in the same way they would be tested in real games. Monte Carlo relies on statistics gathering, and that seems to require games that last at least a few seconds. So unless you have a large bank of workstations, it's difficult to get enough games to reliably measure small improvements - since this requires tens of thousands of games.

- Don

Gian-Carlo Pascutto wrote:
> Yamato wrote:
>> I guess the current top programs have a much better playout policy
>> than the classical MoGo-style one.
>>
>> The original policy of MoGo was:
>>
>> (1) If the last move is an atari, plays one saving move randomly.
>> (2) If there are "interesting" moves in the 8 positions around the
>> last move, plays one randomly.
>> (3) If there are moves capturing stones, plays one randomly.
>> (4) Plays one random move on the board.
>>
>> I (and maybe many others) use it with some improvements, however it
>> will not be enough to catch up with the top programs.
>
> What improvements did you try? The obvious ones I know are
> prioritizing saving and capturing moves by the size of the string.
>
> Zen appears quite strong on CGOS. Leela using the above system was
> certainly weaker.
>
>> Then I have tested a lot of changes to the probability distributions,
>> but it was very hard to improve the strength.
>>
>> Any comments?
>
> I had the same problem, i.e. it seems almost impossible to improve the
> MoGo system by having a different pattern set for "interesting" moves,
> or even by varying the probability of "interesting" moves by pattern
> score.
>
> I tried 2 things:
>
> a) I extracted about 5000 positions with a known winner (determined
> by UCT) from CGOS games, and measured the Mean Square Error of the
> result of my playouts against the known result (also described in one
> of the MoGo papers). Then I applied a genetic algorithm to optimize
> the playout patterns.
> This worked, in the sense that the MSE measured over the 5000
> positions dropped. However, it did not produce a stronger program! I
> found that somewhat shocking.
>
> It makes me doubt the value of the MSE measure.
>
> b) I made a simple genetic algorithm that makes a random pool of a few
> hundred playout policies, picks 2 random parents and
> crossovers/mutates them into 2 children, plays a 10-game match between
> the 2 children with simulations = 100, and then keeps the winner.
>
> This did not produce anything interesting either. My best guess is
> that the match results are simply too random.
>
> So I did not find any way to automatically optimize the patterns.
>
> I finally improved my playouts by using Remi's ELO system to learn a
> set of "interesting" patterns, and just randomly fiddling with the
> probabilities (compressing/expanding) until something improved my
> program in self-play by about +25%. Not a very satisfying method or
> an exceptional result. There could be some other
Re: [computer-go] How to design the stronger playout policy?
Gian-Carlo Pascutto wrote:
> What improvements did you try? The obvious ones I know are
> prioritizing saving and capturing moves by the size of the string.
>
> Zen appears quite strong on CGOS. Leela using the above system was
> certainly weaker.

I use a static ladder search in the playouts. For example, if a move that matched a 3x3 pattern is capturable in a ladder, it is not interesting. Of course such a rule makes a program slower, but I believe it is an improvement.

> I finally improved my playouts by using Remi's ELO system to learn a
> set of "interesting" patterns, and just randomly fiddling with the
> probabilities (compressing/expanding) until something improved my
> program in self-play by about +25%. Not a very satisfying method or
> an exceptional result. There could be some other magic combination
> that is even better, or maybe not.

I also have implemented Remi's Minorization-Maximization algorithm. But I could not find how to use the result of it to improve the strength.

Would you explain the details of the playout policy? Do you use only 3x3 patterns?

> What is so frustrating is that the playouts are essentially black
> magic. I know of no way to automatically determine what is good and
> what is not besides playing about 500 games between 2 strategies. The
> results are very often completely counterintuitive. There is no
> systematic way to improve.

Yes. In addition, the big problem is that testing policies is very time consuming. I think at least 1000 games that use 3000 or more playouts per move are needed to judge whether a change is good or bad.

-- Yamato
Re: [computer-go] How to design the stronger playout policy?
Yamato wrote:
> I guess the current top programs have a much better playout policy
> than the classical MoGo-style one.
>
> The original policy of MoGo was:
>
> (1) If the last move is an atari, plays one saving move randomly.
> (2) If there are "interesting" moves in the 8 positions around the
> last move, plays one randomly.
> (3) If there are moves capturing stones, plays one randomly.
> (4) Plays one random move on the board.
>
> I (and maybe many others) use it with some improvements, however it
> will not be enough to catch up with the top programs.

What improvements did you try? The obvious ones I know are prioritizing saving and capturing moves by the size of the string.

Zen appears quite strong on CGOS. Leela using the above system was certainly weaker.

> Then I have tested a lot of changes to the probability distributions,
> but it was very hard to improve the strength.
>
> Any comments?

I had the same problem, i.e. it seems almost impossible to improve the MoGo system by having a different pattern set for "interesting" moves, or even by varying the probability of "interesting" moves by pattern score.

I tried 2 things:

a) I extracted about 5000 positions with a known winner (determined by UCT) from CGOS games, and measured the Mean Square Error of the result of my playouts against the known result (also described in one of the MoGo papers). Then I applied a genetic algorithm to optimize the playout patterns.

This worked, in the sense that the MSE measured over the 5000 positions dropped. However, it did not produce a stronger program! I found that somewhat shocking.

It makes me doubt the value of the MSE measure.

b) I made a simple genetic algorithm that makes a random pool of a few hundred playout policies, picks 2 random parents and crossovers/mutates them into 2 children, plays a 10-game match between the 2 children with simulations = 100, and then keeps the winner.

This did not produce anything interesting either. My best guess is that the match results are simply too random.
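The MSE measure GCP describes (from the MoGo papers) scores a playout policy by how far its win-rate estimates sit from results already settled by deep UCT search. A minimal sketch - `run_playouts` is a hypothetical stand-in for whatever returns the win fraction over N playouts of a position:

```python
def playout_mse(labelled_positions, run_playouts):
    """labelled_positions: (position, known_result) pairs with the known
    result in {0, 1} (e.g. determined by UCT on CGOS games).
    run_playouts(pos) returns the estimated win fraction from playouts.
    Lower MSE means the playouts agree better with the known winners."""
    errors = [(run_playouts(pos) - result) ** 2
              for pos, result in labelled_positions]
    return sum(errors) / len(errors)
```

As GCP reports, minimizing this proxy did not yield a stronger program, so it is at best an indirect fitness function for a genetic optimizer.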
So I did not find any way to automatically optimize the patterns.

I finally improved my playouts by using Remi's ELO system to learn a set of "interesting" patterns, and just randomly fiddling with the probabilities (compressing/expanding) until something improved my program in self-play by about +25%. Not a very satisfying method or an exceptional result. There could be some other magic combination that is even better, or maybe not.

I now get some improvement by "merging" (1), (2) and (3) in the MoGo system and using probabilities on that. It makes sense because the playouts won't try hopeless saving moves, for example.

What is so frustrating is that the playouts are essentially black magic. I know of no way to automatically determine what is good and what is not besides playing about 500 games between 2 strategies. The results are very often completely counterintuitive. There is no systematic way to improve.

-- GCP
[computer-go] How to design the stronger playout policy?
I guess the current top programs have a much better playout policy than the classical MoGo-style one.

The original policy of MoGo was:

(1) If the last move is an atari, plays one saving move randomly.
(2) If there are "interesting" moves in the 8 positions around the last move, plays one randomly.
(3) If there are moves capturing stones, plays one randomly.
(4) Plays one random move on the board.

I (and maybe many others) use it with some improvements, however it will not be enough to catch up with the top programs. Crazy Stone uses a probability distribution of patterns from the Bradley-Terry model. greenpeep uses similar patterns extracted from offline self-play.

Then I have tested a lot of changes to the probability distributions, but it was very hard to improve the strength.

Any comments?

-- Yamato
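The four MoGo rules above form a priority cascade: each stage proposes candidate moves, and the first non-empty stage supplies the move, chosen uniformly at random within the stage. A schematic sketch - the candidate lists are assumed to be generated elsewhere; only the selection logic is shown:

```python
import random

def mogo_policy_move(saving_moves, pattern_moves, capture_moves, legal_moves):
    """(1) save the stone in atari, (2) 'interesting' pattern moves in
    the 8 positions around the last move, (3) capturing moves,
    (4) any legal move. The first non-empty stage wins, with a
    uniformly random choice inside that stage."""
    for stage in (saving_moves, pattern_moves, capture_moves):
        if stage:
            return random.choice(stage)
    return random.choice(legal_moves)
```

Note how rigid the cascade is: a mediocre stage-(1) saving move always beats a great stage-(2) pattern move, which is exactly the flaw GCP points out later in the thread and what Crazy Stone's all-moves probability distribution avoids.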