Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
Thanks, the dictionary is really great.

Chrilly

----- Original Message ----- From: David Silver To: computer-go@computer-go.org Sent: Tuesday, July 03, 2007 11:29 PM Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

It's because Go is not the only game in the world, and certainly not the only reinforcement learning problem. They are using a widely accepted terminology.

But a very inappropriate one. I have read Sutton's book, and all the things I know (e.g. TD-Gammon) are completely obfuscated. It is maybe suitable for presenting general concepts, but it is extremely complicated to formulate an algorithm in this framework.

Here is a quick-and-dirty RL-to-Computer-Go translation kit to try and help bridge the gap!

RL terminology            Go terminology
State                     Position
Action                    Move
Reward                    Win/Loss
Return                    Win/Loss
Episode                   Game
Time-step                 One move
Agent                     Program
Value function            Evaluation function
Policy                    Player
Default policy            Simulation player
Uniform random policy     Light simulation player
Other stochastic policy   Heavy simulation player
Greedy policy             1-ply search player
Epsilon-greedy policy     1-ply search player with some random moves
Feature                   Factor used for position evaluation
Weight                    Weight of each factor in evaluation function
Tabular representation    One weight for each complete position
Partial tabular           UCT tree representation
State abstraction         One weight for many positions
Linear value function     Evaluation function approximation using weighted sum of various factors
Feature discovery         Learning new factors for the evaluation function
Sample-based search       Simulation (Monte-Carlo methods, etc.)
Transition function       Rules of the game
Environment               Rules of the game + opponent
Trajectory                Move sequence
Online                    During actual play
Offline                   Before/after actual play (e.g. preprocessing)
On-policy                 If both players play as normal
Off-policy                If either player behaves differently

-Dave

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
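Dave's table maps RL's "epsilon-greedy policy" to a 1-ply search player with some random moves. A minimal sketch of that entry, where `position`, `legal_moves`, and `evaluate` are hypothetical stand-ins for an engine's actual board representation and evaluation function:

```python
import random

def epsilon_greedy_move(position, legal_moves, evaluate, epsilon=0.1):
    """Pick a move the way the table's epsilon-greedy entry describes:
    a 1-ply search player that occasionally plays at random.

    position, legal_moves, and evaluate are illustrative stand-ins,
    not any particular program's interface."""
    if random.random() < epsilon:
        # exploratory case: uniform random move
        return random.choice(legal_moves)
    # greedy case: 1-ply search, keep the move with the best evaluation
    return max(legal_moves, key=lambda m: evaluate(position, m))
```

With epsilon = 0 this degenerates to the table's "greedy policy" (pure 1-ply search player); with epsilon = 1 it becomes the uniform random "light simulation player".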
Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
I wonder whether the use of games as a metaphor would make general machine learning concepts more easily understood by non-specialists? That is, if you took a machine learning paper and rewrote it in terms of games, would that make it easier or harder to understand for people unfamiliar with both game programming and machine learning? Of course, this is hard to tell once you've learned either vocabulary. It's easy to assume that whatever jargon you know (or invent) is inherently easier to understand. - Brian
Re: [computer-go] Explanation to MoGo paper wanted.
On Wed, 2007-07-04 at 11:34 +0200, Magnus Persson wrote:
> but what really will make a difference is in the quality in the playouts.

I would like to suggest a more abstract view of things. In the purest form of the algorithm there isn't an artificial distinction between the tree and the play-outs. The algorithm is applied as if the whole tree already exists (conceptually), and nodes are updated to the end of the game. We had to impose end nodes and a tree that grows in depth because it's impractical to store the whole tree in memory. So we have a tree phase on the one hand, and on the other hand we have a play-out phase that simulates an unexplored tree (but without updates, which of necessity introduces a small inefficiency). This makes everything a bit of a compromise, but a well-advised one given hardware limitations.

But then we started imposing our will on the play-outs in order to make them smarter. We didn't do the same to the tree portion, because we now believe they are two separate things (even though they really are not). So I prefer to think of the play-outs and the tree as the same thing. I think whatever is done can be applied to both. For instance, Lazarus does a lot of pruning, and the pruning rules are the same for the tree portion and the play-out portion. Actually, Lazarus saw most of the improvement from the tree pruning when I tested each without the other.

But I notice that we are now looking at the tree as the search portion and the play-outs as the evaluation function. I find that remarkable, because I have always believed that tree search and evaluation are the same thing, just different forms or states. Like water and ice, or matter and energy. It's interesting that chess has this too. Traditionally, programs have always had three very crude phases: search, quiescence, evaluation. Modern programs have somewhat blurred these distinctions, but it hasn't changed very much.
UCT comes along and finally does away with the distinction altogether. Now you can call it all evaluation or search, whatever pleases you. But in its purest form, UCT with totally random play-outs is a beautiful thing - a recursive evaluation function with (almost) zero domain-specific knowledge.

Of course, now we just had to go and spoil it all by imposing domain-specific rules. I have done the same, and I admit it. It would be fun to see, as an experiment, how far we could go if domain-specific knowledge were forbidden. Once patterns are introduced along with other direct Go knowledge, it's still fun, but it feels a bit wrong, kind of like cheating. It's clear that when we do this, we introduce strengths and weaknesses to the program, making it a bit more fragile, less universal or robust. Stronger too, but more susceptible to intransitivity.

Of course we do this in chess programs in a big way. We very tediously tell the program what is good and what is bad. It has no choice; it must accept our definition of right and wrong, our morality. However, in our great wisdom we provide a search mechanism in order to correct our bad judgments. The search mechanism is an admission that we know we are wrong about many things.

Of course you are right - if the play-outs are improved, the quality of the moves will also improve.

- Don
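Don's "purest form" of UCT rests on two small pieces: a selection rule in the tree and uniformly random play-outs below it. A hedged sketch of both (the node fields and the game interface here are illustrative assumptions, not any particular program's):

```python
import math
import random

def ucb1(node, c=1.4):
    """UCB1 score used by UCT to pick a child node: the observed win
    rate plus an exploration bonus that shrinks as the child is
    visited more. Assumes nodes carry visits, wins, and parent."""
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def random_playout(state):
    """The zero-knowledge play-out Don describes: play uniformly
    random legal moves to the end of the game and report the winner.
    `state` is a hypothetical game object with is_over, legal_moves,
    play, and winner methods."""
    while not state.is_over():
        state.play(random.choice(state.legal_moves()))
    return state.winner()
```

In the "whole tree already exists" view, the play-out is just the part of the tree we cannot afford to store: it uses the same move generator, only without statistics or UCB1 updates.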
Re: [computer-go] Explanation to MoGo paper wanted.
On Wed, 2007-07-04 at 16:57 -0400, George Dahl wrote:
> > Of course now we just had to go and spoil it all by imposing domain specific rules. I have done the same and I admit it. It would be fun to see how far we could go if domain specific knowledge was forbidden as an experiment. Once patterns are introduced along with other direct Go knowledge, it's still fun but it feels a bit wrong, kind of like cheating.
>
> Is it still cheating if the program learns and discovers the patterns itself? Then isn't it just precomputing a few things? Of course it isn't cheating really, but it seems more elegant to me if the computer is doing the figuring out, not the programmer.

Of course the programmer has to figure out how to write the program in the first place. But the idea of writing a Go program without any hand-coded Go knowledge is very appealing to me. Of course, there HAS to be Go knowledge, even if it's figured out by the software. In Lazarus, I use several patterns for pruning moves. But those patterns were not generated by ME. Lazarus knows more about Go than I do, and so Lazarus generated those patterns (off-line).

Ultimately, I would like programs to figure out on the fly what to do. It's fun to imagine how a program would work if God wrote it. Would there be tons of hard-coded knowledge built into it, or would it be a learning meta-system that had facilities for quickly finding out for itself the things it needed to know?

- Don
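One hedged illustration of "off-line" pattern generation from a program's own games (the thread doesn't say how Lazarus actually does it): tally small local shapes from self-play results and keep the frequent ones together with their empirical win rates. Every name below is hypothetical.

```python
from collections import defaultdict

def harvest_patterns(games, min_count=50):
    """Sketch of off-line pattern learning from self-play, not
    Lazarus's actual method. Each game is assumed to be a list of
    (shape, won) pairs, where shape is a hashable key for the local
    neighbourhood of a move (e.g. a 3x3 stone configuration) and won
    says whether the side that played it went on to win the game."""
    seen = defaultdict(int)
    wins = defaultdict(int)
    for game in games:
        for shape, won in game:
            seen[shape] += 1
            wins[shape] += int(won)
    # keep only shapes seen often enough to trust their win rate
    return {s: wins[s] / seen[s] for s in seen if seen[s] >= min_count}
```

The resulting shape-to-win-rate map could then seed move pruning or ordering, so the Go knowledge is discovered by the software rather than hand-coded.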
Re: [computer-go] Explanation to MoGo paper wanted.
And how much would generating patterns from pro games be cheating? How about a system that gives a reward to shapes it actually played in a game; the pro games are then used as a seed to start the system.
Re: [computer-go] Explanation to MoGo paper wanted.
Pro games are cheating unless the program is one of the players. :)

You are right, though; sometimes compromises must be made when seeding an algorithm. My ideas on using domain knowledge from humans are sort of about maximizing a ratio: the ratio of program performance to domain knowledge added (by humans, directly). Obviously it is hard to quantify these sorts of things, but if program A is three times as good (whatever that means) as program B and uses only twice the human-given Go knowledge, I would rather have program A.

- George

On 7/4/07, Benjamin Teuber [EMAIL PROTECTED] wrote:
> And how much would generating patterns from pro games be cheating? How about a system that gives a reward to shapes it actually played in a game, the pro games are then used as seed to start the system..
Re: [computer-go] Explanation to MoGo paper wanted.
On Thu, 2007-07-05 at 00:53 +0200, Benjamin Teuber wrote:
> And how much would generating patterns from pro games be cheating? How about a system that gives a reward to shapes it actually played in a game, the pro games are then used as seed to start the system..

I have posted before about the evils of trying to extract knowledge from human games. I don't think it is very effective compared to generating that knowledge from computer games, for several reasons. Of course, I realize this is not a popular point of view!

- Don
Re: [computer-go] Explanation to MoGo paper wanted.
On Wed, 2007-07-04 at 19:23 -0400, Don Dailey wrote:
> On Thu, 2007-07-05 at 01:09 +0200, Magnus Persson wrote:
> > Just to disturb the vision of a strong go program without hardwired go knowledge: I currently think that there are some really important things in Go that are really hard or even impossible to learn with, for example, patterns. The ideal program would need to learn procedural skills (algorithms).
>
> I'm not saying a program can be as good without hardwired knowledge, I'm just saying it would be a cool thing!

And even if you could, it would still require hard-coded meta-skills - skills programmed explicitly to enable it to LEARN or discover what it needed. So even if it wasn't direct Go knowledge, it would be indirect Go knowledge. Kind of like: give a man a fish, or teach him to fish.

- Don
Re: [computer-go] Explanation to MoGo paper wanted.
On Thu, 2007-07-05 at 01:09 +0200, Magnus Persson wrote:
> Just to disturb the vision of a strong go program without hardwired go knowledge: I currently think that there are some really important things in Go that are really hard or even impossible to learn with, for example, patterns. The ideal program would need to learn procedural skills (algorithms).

I'm not saying a program can be as good without hardwired knowledge, I'm just saying it would be a cool thing!

- Don
Re: [computer-go] Explanation to MoGo paper wanted.
> In other words, UCT works well when the evaluation/playouts is/are strong. I believe there are still improvements possible to the UCT algorithm, as shown by the recent papers by the MoGo and Crazy Stone authors, but what really will make a difference is the quality of the playouts.

Sylvain said that good moves in the playouts do not always improve the performance of UCT. What do you think about this claim?

--
Yamato
Re: [computer-go] Explanation to MoGo paper wanted.
I believe this claim is true in two senses:

1) If the computation necessary to find better moves is too expensive, performing many dumb playouts may be a better investment.

2) If the playouts are too deterministic, and the moves are merely pretty good, the program may avoid an important move and thus misjudge the value of a position.

Peter Drake
http://www.lclark.edu/~drake/

On Jul 4, 2007, at 5:52 PM, Yamato wrote:
> In other words UCT works well when evaluation/playouts is/are strong. I believe there are still improvements possible to the UCT algorithm as shown by the recent papers by Mogo and Crazystone authors, but what really will make a difference is in the quality in the playouts. Sylvain said that good moves in the playouts do not always improve the performance of UCT. What do you think about this claim? -- Yamato
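Peter's second point suggests keeping playouts stochastic even when a move-scoring heuristic is available. One common way to do that (an assumption here, not MoGo's documented policy; `score` is a hypothetical heuristic) is to sample moves with softmax weights rather than always playing the top-scoring move:

```python
import math
import random

def sample_softmax(moves, score, temperature=1.0):
    """Choose a playout move with probability proportional to
    exp(score(move) / temperature). High temperature approaches the
    uniform 'light' playout; low temperature approaches the
    deterministic 'heavy' playout whose blind spots Peter warns
    about. `score` is an assumed heuristic, not any engine's API."""
    weights = [math.exp(score(m) / temperature) for m in moves]
    # roulette-wheel selection over the weights
    r = random.uniform(0, sum(weights))
    for move, w in zip(moves, weights):
        r -= w
        if r <= 0:
            return move
    return moves[-1]  # numerical safety fallback
```

The temperature knob makes the determinism trade-off explicit: it can be tuned so that merely-pretty-good moves dominate without ever giving an important move zero probability.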
Re: [computer-go] Explanation to MoGo paper wanted.
> 2) If the playouts are too deterministic, and the moves are merely pretty good, the program may avoid an important move and thus misjudge the value of a position.

IMO, this is the most interesting part of computer Go today. How can one possibly design an optimal playout agent, when making a playout agent that plays strongly is not the solution? The only known method seems to be trial and error.