Re: [computer-go] Re: Explanation to MoGo paper wanted.
Dave wrote: We have seen a similar effect many times in MoGo. Often we try something that seems like it should improve the quality of the simulation player, but it makes the overall performance worse. It is frustrating and surprising! Has anyone else encountered this?

I'm not surprised. The goal of Monte Carlo simulations should be to provide an unbiased estimate of the true min-max value with as low variance as possible. This has little to do with strength, unless you happen to find a perfect simulation player, but then the whole search business becomes moot.

The fact that many modifications of uniformly random playouts simultaneously improve simulation playing strength and overall strength is a red herring. Uniformly random playouts are strongly biased to overestimate the value of having tightly connected stones, since e.g. one-space jumps get cut through disproportionately often compared to what happens in relevant paths through the min-max tree. Almost any change in simulation policy that counters this tendency will improve overall strength, and likewise pretty much every sensible change will improve simulation strength compared to uniformly random play.

At higher levels, something that may happen is that a change in the simulation policy improves the skill at making life in tight spots without changing other skills. This would likely improve simulation strength, but would cause a bias for positions where there's room for a futile invasion that barely fails, decreasing overall strength.

Similar phenomena have turned up in GNU Go over the years. If you tune tactical reading or life and death reading to find some new class of attacking moves, results are likely to become worse if you don't make matching changes in the capability to find defense moves. There's also the classical effect of fixing an obvious mistake, only to find some regression tests starting to fail. Closer examination shows that the tests were previously only passing because there were two mistakes that cancelled each other out, and fixing one of them breaks the balance.

/Gunnar

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
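Gunnar's distinction between bias and variance can be illustrated with a toy model (all numbers here are made up; a playout is reduced to "true min-max value plus systematic bias plus zero-mean noise"): an unbiased but noisy estimator converges to the true value as simulations accumulate, while a biased low-variance one converges confidently to the wrong value, no matter how many simulations you run.

```python
import random

def estimate(true_value, bias, noise, n):
    """Average n simulated playout outcomes. Each playout returns
    the true min-max value plus a systematic bias and zero-mean
    noise (a toy stand-in for a real playout policy)."""
    total = 0.0
    for _ in range(n):
        total += true_value + bias + random.uniform(-noise, noise)
    return total / n

random.seed(0)
# An unbiased but noisy policy converges to the true value 0.5...
unbiased = estimate(0.5, bias=0.0, noise=0.4, n=100_000)
# ...while a biased low-variance policy converges to the wrong value.
biased = estimate(0.5, bias=0.15, noise=0.05, n=100_000)
```

The point of the sketch is that adding knowledge to a playout policy lowers noise but may add bias, and with enough simulations bias dominates.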
Re: [computer-go] Re: Explanation to MoGo paper wanted.
This discussion reminds me of a naive theory that I sometimes wonder about. Since the players in the playouts are so weak, it seems like improving the ability to defend a strong position from a not-very-clever move (and not lose it via a blunder) should be more important than improving the ability to find an attack.

If there are two equally bad players that can easily attack each other but can't defend, it seems like the results will be close to random, almost regardless of the starting position, unless it is very strong. On the other hand, if two bad players are somewhat better at defense but lousy at seeing weaknesses in the other side, there will be less noise and the one with more territory will tend to win. An attack on a mostly solid position will still sometimes be found via a random move, so given enough playouts the probability of defending a position with a weakness will be slightly lower than that of a truly winning position.

It seems like this effect would be especially pronounced in the endgame, where there aren't many points left to take but a position could still be lost to a blunder. I'm not sure how useful that is, since to defend a position you need to know how it might be attacked, but perhaps it leads somewhere?

- Brian
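Brian's intuition can be sketched with a toy simulation (all probabilities hypothetical; `p_overturn` stands in for the chance per move that the leading side fails to defend an attack): when defense is weak the final result is nearly independent of who started ahead, while when defense is solid the initial lead usually survives.

```python
import random

def playout(p_overturn, moves=20):
    """Toy playout: one side starts in the lead; each move the
    trailing side's attack succeeds (i.e. the leader fails to
    defend) with probability p_overturn, flipping the lead.
    Returns True if the original leader still leads at the end."""
    leader_kept = True
    for _ in range(moves):
        if random.random() < p_overturn:
            leader_kept = not leader_kept
    return leader_kept

def lead_retention(p_overturn, n=20_000):
    """Fraction of playouts won by the side that started ahead."""
    return sum(playout(p_overturn) for _ in range(n)) / n

random.seed(3)
weak_defense = lead_retention(0.30)   # lead flips constantly: near coin-flip
solid_defense = lead_retention(0.01)  # lead usually survives
```

Under this (very crude) model, playouts with weak defense carry almost no information about the starting position, which is Brian's "close to random" case.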
Re: [computer-go] Re: Explanation to MoGo paper wanted.
How is this a ko threat? Lazarus threatens a chain of 4 or 5 stones with a self-atari move. If the opponent captures, where is the ko? If the opponent doesn't capture, where is the ko?

sorry, this is just terminology on my part -- a 'ko threat' is any threat that can be used during a ko, regardless of whether or not there is a ko on the board. so for instance, a strong player may count and evaluate the size and number of all of his (and his opponent's) ko threats on the board before deciding whether or not to start a crucial ko. he may also create situations that will generate multiple ko threats later, stockpiling them for later use. wasting one of these only reduces his ability to successfully win one or more kos later.

one way that strong players can crush weaker players is simply to start a game-deciding ko (for instance one where the ko winner gets to either connect or split two important one-eyed groups), knowing that their stockpile of threats can't be matched. once their opponent runs out of threats, he will either lose the ko or leave one of the stronger player's threats unanswered.

all this to say that lazarus can actually play and *expect* to have one or more of these threats unanswered if there is a standing ko of large enough value on the board. just start an invasion/reduction ko or a group-splitting/connecting ko, and some of these threats become potential points.

s.
Re: [computer-go] Re: Explanation to MoGo paper wanted.
There is one other issue I have seen that is similar. Sometimes Lazarus will play a move that neither hurts nor helps its position. It's not a wasted move because the opponent must respond or else lose.

this sounds a good bit like a ko threat, which is tricky to distinguish from a good play.

s.
Re: [computer-go] Re: Explanation to MoGo paper wanted.
The attack is easily refuted with a capture, and when that happens no time was lost. But the opponent must capture immediately or the threat Lazarus made actually works.

this, in fact, is a ko threat. if you play it *outside* of a ko, then it's a wasted ko threat. no big loss if there are no kos expected and no lost kos have happened. otherwise it's more tricky.

s.
Re: [computer-go] Re: Explanation to MoGo paper wanted.
steve uurtamo said: There is one other issue I have seen that is similar. Sometimes Lazarus will play a move that neither hurts nor helps its position. It's not a wasted move because the opponent must respond or else lose. this sounds a good bit like a ko threat, which is tricky to distinguish from a good play.

To play a ko threat in a ko fight is usually good play. To play such moves at any other time is a really insidious bad habit, one difficult to extinguish for anyone who assumes forcing moves are necessarily a good thing. One can burn a lot of opportunities this way without even knowing they were ever there.

Forrest Curo
Re: [computer-go] Re: Explanation to MoGo paper wanted.
On Fri, 2007-07-06 at 16:52 -0700, steve uurtamo wrote:

The attack is easily refuted with a capture, and when that happens no time was lost. But the opponent must capture immediately or the threat Lazarus made actually works.

this, in fact, is a ko threat. if you play it *outside* of a ko, then it's a wasted ko threat. no big loss if there are no kos expected and no lost kos have happened. otherwise it's more tricky.

How is this a ko threat? Lazarus threatens a chain of 4 or 5 stones with a self-atari move. If the opponent captures, where is the ko? If the opponent doesn't capture, where is the ko?

- Don
Re: [computer-go] Re: Explanation to MoGo paper wanted.
I think Steve meant that the move /should have been used as/ a ko threat.

Peter Drake
http://www.lclark.edu/~drake/

On Jul 6, 2007, at 5:12 PM, Don Dailey wrote:

On Fri, 2007-07-06 at 16:52 -0700, steve uurtamo wrote: The attack is easily refuted with a capture, and when that happens no time was lost. But the opponent must capture immediately or the threat Lazarus made actually works. this, in fact, is a ko threat. if you play it *outside* of a ko, then it's a wasted ko threat. no big loss if there are no kos expected and no lost kos have happened. otherwise it's more tricky.

How is this a ko threat? Lazarus threatens a chain of 4 or 5 stones with a self-atari move. If the opponent captures, where is the ko? If the opponent doesn't capture, where is the ko?

- Don
Re: [computer-go] Re: Explanation to MoGo paper wanted.
In Go things are insofar worse as there is only one standard sparring partner, GNU Go. This creates severe inbreeding effects. In chess there was a similar problem: there were more strong opponents around, but over the years they became very similar. Suddenly there was a new program, Rybka, which plays differently, and all the inbred programs have a lot of difficulties.

I think there is no better way. One can do some pre-filtering with test positions. If a version is especially bad in these tests, one can ignore it. But being good in test positions and being good in games are different things.

When MonteGNU is published, it will be an alternative to GNU Go. Of course MC vs. MC may have some problems, but at least it is stronger than GNU Go on 9x9. And if the way to combine UCT and local tactical search is discovered, regression tests like GNU Go's will also be useful.

-- Yamato
[computer-go] Re: Explanation to MoGo paper wanted.
In other words, UCT works well when the evaluation/playouts are strong. I believe there are still improvements possible to the UCT algorithm, as shown by the recent papers by the MoGo and Crazy Stone authors, but what will really make a difference is the quality of the playouts.

Sylvain said that good moves in the playouts do not always improve the performance of UCT. What do you think about this claim?

I believe this claim is true in two senses: 1) If the computation necessary to find better moves is too expensive, performing many dumb playouts may be a better investment.

Sure, this is true. But even with the same number of simulations, stronger playouts do not necessarily perform better than dumb playouts. This is the real mystery!

2) If the playouts are too deterministic, and the moves are merely pretty good, the program may avoid an important move and thus misjudge the value of a position.

We tried the whole spectrum from completely random to completely deterministic playouts, but we never came close to the performance of the dumb playouts!

We have seen a similar effect many times in MoGo. Often we try something that seems like it should improve the quality of the simulation player, but it makes the overall performance worse. It is frustrating and surprising! Has anyone else encountered this?

-Dave
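The "spectrum from completely random to completely deterministic" playouts can be parameterized in the usual epsilon-greedy way. This is a generic sketch, not MoGo's actual policy; `policy_score` is a hypothetical move-quality heuristic standing in for whatever knowledge the simulation player has.

```python
import random

def choose_move(moves, policy_score, epsilon):
    """Epsilon-greedy playout move selection: with probability
    epsilon play uniformly at random, otherwise play the move the
    heuristic rates highest. epsilon=1.0 gives a 'light' (uniformly
    random) playout policy, epsilon=0.0 a fully deterministic one,
    and values in between sweep the spectrum."""
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=policy_score)

# Hypothetical example: rate moves by an arbitrary number.
random.seed(7)
greedy_pick = choose_move([1, 2, 3], policy_score=lambda m: m, epsilon=0.0)
random_pick = choose_move([1, 2, 3], policy_score=lambda m: m, epsilon=1.0)
```

The discussion above is precisely about the surprise that no setting of such a knob beat the dumb (epsilon=1.0) end of the spectrum.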
[computer-go] Re: Explanation to MoGo paper wanted.
Seems like it should be up to the person in the other environment to adapt your successful algorithm (and notation/terminology) to their environment.

But how do the other people in other environments find out about the algorithm? And find out that it is something they could use in their own environment? I think we can help with both, by presenting our work in a more general way.

-Dave
Re: [computer-go] Re: Explanation to MoGo paper wanted.
On Thu, 2007-07-05 at 10:50 -0600, David Silver wrote: We tried the whole spectrum from completely random to completely deterministic playouts, but we never came close to the performance of the dumb playouts!

I don't understand - I thought MoGo wasn't using dumb play-outs?

We have seen a similar effect many times in MoGo. Often we try something that seems like it should improve the quality of the simulation player, but it makes the overall performance worse. It is frustrating and surprising! Has anyone else encountered this?

Here is why this happens. Let's think of this in the context of pruning moves, since controlling the play-outs can be cast in this way. Presumably, if you run 1000 random play-outs from a given position you will get a fair indication of how good the position is. But what if you are able to prune out many of the bad moves in that simulation? Would this improve the accuracy of the simulation? Probably, but not necessarily.

Suppose that during the play-outs you are able to prune out 50% of the bad black moves, but only 30% of the bad white moves. You would be playing 1000 simulations where BLACK was playing consistently stronger, regardless of how good the actual position was. If the chances were in fact pretty much even, it would look as if black had a big advantage. If that color bias was consistent for that type of position, building a UCT tree below it would not quickly fix the problem.

The extra knowledge you impose is not impartial knowledge; it will work better for one side than the other, and for one position differently than another. So even if the average quality of the play-outs improves, each position responds differently to the extra knowledge, making it more difficult to compare one position to another.

There is one other issue I have seen that is similar. Sometimes Lazarus will play a move that neither hurts nor helps its position. It's not a wasted move because the opponent must respond or else lose. An example is a simple self-atari which is itself a direct threat. The opponent is forced to respond, so there is no reason not to try for the cheap shot in his territory, but in the grand scheme of things this move is a distraction, and if you could remove such moves from the tree it would help the program focus on what is really important.

However, it sometimes pays to try moves like these. When I fixed this problem in Lazarus, it started winning less against weaker programs, simply because they sometimes fail to defend. I imagine this can happen in more sophisticated contexts, where certain moves could be very effective in exploiting more naive (but not totally stupid) programs. In such a case, an improvement could make your program appear weaker.

I don't think this is merely academic, because I model skill not as how many good moves you play, but as how many bad moves you avoid playing. In other words there is no such thing as a good move - there are only bad moves. So you want to present your opponent with opportunities to play them. In chess it works the same way: if you are playing a weak opponent, it's really stupid to trade quickly into a drawn ending just because the position is even.

- Don
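Don's 50%/30% pruning example can be sketched as a toy simulation (all probabilities are hypothetical): model pruning as lowering a side's per-move blunder rate, where the first side to blunder loses. Pruning Black's blunders harder than White's then inflates Black's estimated win rate from an even position, which is exactly the color bias described above.

```python
import random

def playout(p_black_blunder, p_white_blunder, max_moves=200):
    """Toy playout: players alternate, Black first; the first side
    to blunder loses. Pruning bad moves is modelled as lowering a
    side's blunder probability (a stand-in for Don's example, not
    real Go). Returns 1 if Black wins, 0 otherwise."""
    for move in range(max_moves):
        p = p_black_blunder if move % 2 == 0 else p_white_blunder
        if random.random() < p:
            return 0 if move % 2 == 0 else 1
    return random.randint(0, 1)  # no blunders: call the even game a coin flip

def win_rate(pb, pw, n=20_000):
    """Black's estimated win rate over n playouts."""
    return sum(playout(pb, pw) for _ in range(n)) / n

random.seed(1)
fair = win_rate(0.10, 0.10)    # symmetric knowledge: roughly even
skewed = win_rate(0.05, 0.07)  # prune Black's blunders harder than White's
```

The `skewed` estimate is markedly higher than `fair` even though the underlying position is identical, so the "improved" playouts misreport the position's value.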
[computer-go] Re: Explanation to MoGo paper wanted.
One of my favorite observations about Go is that expert play tends to be on the edge of catastrophe. By playing better moves on average, you become more vulnerable to the occasional misstep. If a program is not very good, random better or worse moves do not have much effect. If the program improves, it is closer to the edge, and random missteps are more likely to make it fall off the cliff.
Re: [computer-go] Re: Explanation to MoGo paper wanted.
We have encountered this consistently in our non-MC/UCT program. Things that fix an obvious problem lead to unintended consequences that sometimes take weeks to tease apart. So far we have been able to understand how this comes about in each situation, but we still have little ability to predict in advance when a fix will simply work (and thus improve play) and when it will lead to the unintended consequences that result in worse play.

Cheers,
David

On Jul 5, 2007, at 9:50 AM, David Silver wrote: Often we try something that seems like it should improve the quality of the simulation player, but it makes the overall performance worse. It is frustrating and surprising! Has anyone else encountered this?
Re: [computer-go] Re: Explanation to MoGo paper wanted.
There is one other issue I have seen that is similar. Sometimes Lazarus will play a move that neither hurts nor helps its position. It's not a wasted move because the opponent must respond or else lose. An example is a simple self-atari which is itself a direct threat. The opponent is forced to respond, so there is no reason not to try for the cheap shot in his territory, but in the grand scheme of things this move is a distraction, and if you could remove such moves from the tree it would help the program focus on what is really important. However, it sometimes pays to try moves like these. When I fixed this problem in Lazarus, it started winning less against weaker programs, simply because they sometimes fail to defend.

And is that version stronger against higher-level programs? Losing against weaker programs might be a cost we should pay temporarily.

I think one of the problems is in testing. Currently we have almost no way to judge whether an improvement is good or bad, other than playing a lot of games against GNU Go. That takes a very long time and seems inefficient. Moreover, it may not even be a very good method: GNU Go often cannot respond to an obvious bad move correctly, so pruning such moves decreases the winning rate.

-- Yamato
Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
Thanks, the dictionary is really great.

Chrilly

- Original Message -
From: David Silver
To: computer-go@computer-go.org
Sent: Tuesday, July 03, 2007 11:29 PM
Subject: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)

It's because Go is not the only game in the world and certainly not the only reinforcement learning problem. They are using a widely accepted terminology.

But a very inappropriate one. I have read Sutton's book and all the things I know (e.g. TD-Gammon) are completely obfuscated. It is maybe suitable to present general concepts, but it is extremely complicated to formulate an algorithm in this framework.

Here is a quick and dirty RL-to-Computer-Go translation kit to try and help bridge the gap!

RL terminology           Go terminology
State                    Position
Action                   Move
Reward                   Win/Loss
Return                   Win/Loss
Episode                  Game
Time-step                One move
Agent                    Program
Value function           Evaluation function
Policy                   Player
Default policy           Simulation player
Uniform random policy    Light simulation player
Other stochastic policy  Heavy simulation player
Greedy policy            1-ply search player
Epsilon-greedy policy    1-ply search player with some random moves
Feature                  Factor used for position evaluation
Weight                   Weight of each factor in evaluation function
Tabular representation   One weight for each complete position
Partial tabular          UCT tree representation
State abstraction        One weight for many positions
Linear value function    Evaluation function approximation using weighted sum of various factors
Feature discovery        Learning new factors for the evaluation function
Sample-based search      Simulation (Monte-Carlo methods, etc.)
Transition function      Rules of the game
Environment              Rules of the game + opponent
Trajectory               Move sequence
Online                   During actual play
Offline                  Before/after actual play (e.g. preprocessing)
On-policy                If both players play as normal
Off-policy               If either player behaves differently

-Dave
Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
I wonder whether the use of games as a metaphor would make general machine learning concepts more easily understood by non-specialists? That is, if you took a machine learning paper and rewrote it in terms of games, would that make it easier or harder to understand for people unfamiliar with both game programming and machine learning? Of course, this is hard to tell once you've learned either vocabulary. It's easy to assume that whatever jargon you know (or invent) is inherently easier to understand.

- Brian
[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)
It's because Go is not the only game in the world and certainly not the only reinforcement learning problem. They are using a widely accepted terminology.

But a very inappropriate one. I have read Sutton's book and all the things I know (e.g. TD-Gammon) are completely obfuscated.

Really? I think it is a wonderful example of clear thinking and clear writing - I couldn't put the book down. It is the reason I chose to study RL, and to come study with Rich Sutton.

It is maybe suitable to present general concepts, but it is extremely complicated to formulate an algorithm in this framework.

Of course everyone hopes that ideas will be presented to them in their personal terminology, as it saves them some effort. But we make progress in science by unifying and identifying the general concepts.

But the main point is: I think game programmers should be more proud of their work and should present their results in the language of game programming. We are the ones who make progress, not these paper tigers.

Isn't there room for both? Shouldn't we present our work within our own community, but also make efforts to share our ideas with others?

-Dave
Re: [computer-go] Re: Explanation to MoGo paper wanted.
2. We want other communities to find out about UCT, and start using it in many different domains. It is not just a Go-programming algorithm!

Yes. I think the idea has many potential fields of application. In the somewhat dated book R. Epstein: The Theory of Gambling and Statistical Logic, the simple algorithm "play an arm as long as it is winning" is proposed. But that does not help too much.

E.g. J. Schaeffer invented the History Heuristic. This was long before jump prediction became an important topic in microprocessor design. The first jump predictions were static rules. After some years the hardware designers then invented their own history heuristic for jumps.

ideas: http://www.cs.ualberta.ca/~silver/research/presentations/files/sylvain-silver.pdf

Thanks. Actually I think the best notation would be: description in plain text + mathematical notation + pseudocode + many diagrams. But in a conference paper we have just 8 pages to describe everything, so we must make some compromises.

Yes, I fully agree.

Why do you call this UCT if there is no tree? Isn't this just roll-out simulation, as used by Tesauro and Galperin in 1996?

It's the selection rule for making the rollouts at the root. This is a variant of UCT. Exactly: it's the multi-armed bandit selection rule. It differs e.g. from the rule proposed by Epstein.

Chrilly
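For reference, the bandit selection rule under discussion is UCB1: pick the arm maximizing mean reward plus an exploration bonus. UCT applies it at every node of the tree; with no tree, it is applied at the root alone. A minimal sketch (the two-armed payoff probabilities are made up for illustration):

```python
import math
import random

def ucb1_select(wins, visits, total):
    """UCB1: pick the arm maximizing mean reward plus the
    exploration bonus sqrt(2 * ln(total) / visits[arm])."""
    best, best_score = 0, float("-inf")
    for arm in range(len(visits)):
        if visits[arm] == 0:
            return arm  # try every arm once before comparing scores
        score = (wins[arm] / visits[arm]
                 + math.sqrt(2.0 * math.log(total) / visits[arm]))
        if score > best_score:
            best, best_score = arm, score
    return best

# Toy two-armed bandit: arm 1 pays off more often, so UCB1
# should concentrate its pulls there while still exploring arm 0.
random.seed(42)
payoff = [0.4, 0.6]
wins, visits = [0, 0], [0, 0]
for t in range(10_000):
    arm = ucb1_select(wins, visits, t + 1)
    visits[arm] += 1
    wins[arm] += random.random() < payoff[arm]
```

Unlike "play an arm as long as it is winning", the logarithmic bonus guarantees every arm keeps being sampled occasionally, so a temporarily unlucky good arm is not abandoned forever.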
Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
Isn't there room for both? Shouldn't we present our work within our own community, but also make efforts to share our ideas with others?

Yes, I do this by writing popular articles about computer chess and games programming. The point of concern is: one is only considered important if one considers one's own work important. Sometimes I have the feeling that academic researchers are a little bit ashamed that they do not do something more serious, more important. And then they hide their work behind a more serious title/topic and vocabulary.

E.g. J. Schaeffer and Donsky wrote "Falling from Grace". Both made important contributions to computer chess. But in this article they blamed themselves, saying it is their own (and the community's) fault that they have fallen from AI grace. But isn't it the problem of AI when the concepts do not work? Why didn't they write an article "The concepts of AI are bullshit"? Feng Hsu was the first one who did this. He was proud enough of his work.

Chrilly
Re: [computer-go] Re: Explanation to MoGo paper wanted. (BackGammonCode)
the language of mathematics is perhaps the most universal language for computer scientists. pseudocode comes in somewhere after that, and well-known algorithms probably somewhere in between. game programming is an application of computer science, and the language of game programming isn't necessarily appropriate (and would seem obtuse) to the much larger audience of potential readers outside of its domain. whenever an algorithm is applicable outside of the game programming field, rephrasing its game-specific language might make the most sense for the readers who are intended to read about it.

that having been said, the most appropriate language is obviously that which is understandable by the largest number of potential readers interested in the title and (if there is one) the abstract.

s.
Re: [computer-go] Re: Explanation to MoGo paper wanted.(BackGammonCode)
the language of mathematics is perhaps the most universal language for computer scientists. pseudocode comes in somewhere after that, and well-known algorithms probably somewhere in between. game programming is an application of computer science, and the language of game programming isn't necessarily appropriate (and would seem obtuse) to the much larger audience of potential readers outside of its domain. whenever an algorithm is applicable outside of the game programming field, rephrasing its game-specific language might make the most sense for the readers who are intended to read about it. that having been said, the most appropriate language is obviously that which is understandable by the largest number of potential readers interested in the title and (if there is one) the abstract.

I have a PhD in mathematical statistics, so I am not at all against the use of mathematics. I think the language should be chosen which is most appropriate. For some mathematical proofs about the big-O behaviour of algorithms there is no other language than mathematics. But for describing algorithms this notation is not suited.

D. Knuth chose structured English in The Art of Computer Programming, and MIX for precise analysis. His argument for MIX is that he writes books for eternity, and therefore he cannot use the latest fashion in programming languages. There is some reason behind this argument, but I think that only a few programmers can read MIX nowadays. MIX also does not reflect the capabilities of modern hardware; Knuth therefore has to rewrite his books in MMIX (Inshallah). Maybe pseudo-Algol would have been more eternal than MIX. But in any case he uses different levels of notation.

Chrilly
[computer-go] Re: Explanation to MoGo paper wanted. (BackGammon Code)
It's because Go is not the only game in the world and certainly not the only reinforcement learning problem. They are using a widely accepted terminology.

But a very inappropriate one. I have read Sutton's book and all the things I know (e.g. TD-Gammon) are completely obfuscated. It is maybe suitable to present general concepts, but it is extremely complicated to formulate an algorithm in this framework.

Here is a quick and dirty RL-to-Computer-Go translation kit to try and help bridge the gap!

RL terminology           Go terminology
State                    Position
Action                   Move
Reward                   Win/Loss
Return                   Win/Loss
Episode                  Game
Time-step                One move
Agent                    Program
Value function           Evaluation function
Policy                   Player
Default policy           Simulation player
Uniform random policy    Light simulation player
Other stochastic policy  Heavy simulation player
Greedy policy            1-ply search player
Epsilon-greedy policy    1-ply search player with some random moves
Feature                  Factor used for position evaluation
Weight                   Weight of each factor in evaluation function
Tabular representation   One weight for each complete position
Partial tabular          UCT tree representation
State abstraction        One weight for many positions
Linear value function    Evaluation function approximation using weighted sum of various factors
Feature discovery        Learning new factors for the evaluation function
Sample-based search      Simulation (Monte-Carlo methods, etc.)
Transition function      Rules of the game
Environment              Rules of the game + opponent
Trajectory               Move sequence
Online                   During actual play
Offline                  Before/after actual play (e.g. preprocessing)
On-policy                If both players play as normal
Off-policy               If either player behaves differently

-Dave