Re: [computer-go] FW: computer-go] Monte carlo play?
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> Being a computer scientist but new to go, I can grasp some of the theory.
> The question I was trying to get across was: in a game of self-play, if
> both parties are employing only Monte Carlo, surely it's not a good
> conceptual representation of a human, and if the reinforcement learning is
> based on random simulations, wouldn't it be very weak when playing a real
> human?

Here is another amateur answering. The way I understand it, modern Monte Carlo programs do not even try to emulate a human player with a random player - obviously that would not work. What they do is build a quite traditional search tree starting from the current position. They use a random playout as a crude way to evaluate a position, and based on this evaluation they decide which branch of the tree to expand.

This is the way I understand the random playouts: if, in a given position, white is clearly ahead, he will win the game if both sides play perfect moves. He is also likely to win if both sides play reasonably good moves (say, like human amateurs), but there is a bit more of a chance that one player hits upon a good combination which the other misses, so the result is not quite as reliable. If the playouts are totally random, there is still a better chance for white to win, because both sides make equally bad moves. The results have much more variation, of course.

So far it does not sound like a very good proposal, but things change if you consider the facts that we don't have perfect oracles, and that good humans are slow to play out a position and can not be integrated into a computer program, whereas random playouts can be done awfully fast - tens of thousands of playouts in a second. Averaging the results gives a fair indication of who is more likely to win from that position, which is just what is needed to decide which part of the search tree to expand.
The 'random' playouts are not totally random: they include a minimum of tactical rules (do not fill own eyes, do not pass as long as there are valid moves). Even this little will produce a few blind spots - moves that the playouts can not see - and systematically wrong results. Adding more go-specific knowledge can make the results much better (more likely to be right), but can also add some more blind spots. And it costs time, reducing the number of playouts the program can make.

Hope that explains something of the mystery.

Regards, Heikki

-- Heikki Levanto   In Murphy We Turst   heikki (at) lsd (dot) dk

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
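Heikki's description can be sketched in a few dozen lines. The following is a toy illustration, not any particular program's code: a tiny 5x5 board, uniformly random moves that never fill a one-point eye, two consecutive passes end the game, simple area scoring decides the winner, and averaging many results estimates Black's winning probability. The board size, names, and the crude eye test are my own choices.

```python
import random

SIZE = 5
EMPTY, BLACK, WHITE = ".", "X", "O"

def neighbors(p):
    r, c = p
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

def group_and_liberties(board, p):
    """Flood-fill the group containing p; return (stones, liberties)."""
    color = board[p]
    stack, group, libs = [p], {p}, set()
    while stack:
        for n in neighbors(stack.pop()):
            if board[n] == EMPTY:
                libs.add(n)
            elif board[n] == color and n not in group:
                group.add(n)
                stack.append(n)
    return group, libs

def play(board, p, color):
    """Place a stone, capture dead enemy groups; False if the move is suicide."""
    opponent = WHITE if color == BLACK else BLACK
    board[p] = color
    for n in neighbors(p):
        if board[n] == opponent:
            g, libs = group_and_liberties(board, n)
            if not libs:
                for q in g:
                    board[q] = EMPTY
    if not group_and_liberties(board, p)[1]:   # no liberties: retract suicide
        board[p] = EMPTY
        return False
    return True

def is_own_eye(board, p, color):
    """Crude one-point-eye test: every neighbor is our own stone."""
    return all(board[n] == color for n in neighbors(p))

def playout(board, to_move, max_moves=200):
    """Uniformly random playout: no eye-filling, no passing while moves remain."""
    passes = 0
    for _ in range(max_moves):
        moves = [p for p in board
                 if board[p] == EMPTY and not is_own_eye(board, p, to_move)]
        random.shuffle(moves)
        moved = any(play(board, p, to_move) for p in moves)
        passes = 0 if moved else passes + 1
        if passes == 2:
            break
        to_move = WHITE if to_move == BLACK else BLACK
    # area scoring: stones plus empty points touching only one colour
    score = {BLACK: 0, WHITE: 0}
    for p, v in board.items():
        if v != EMPTY:
            score[v] += 1
        else:
            colours = {board[n] for n in neighbors(p)} - {EMPTY}
            if len(colours) == 1:
                score[colours.pop()] += 1
    return score[BLACK] - score[WHITE]

def winrate(n_playouts=200, komi=0.5):
    """Average many playout outcomes: the Monte Carlo estimate of Black's chances."""
    empty = {(r, c): EMPTY for r in range(SIZE) for c in range(SIZE)}
    wins = sum(playout(dict(empty), BLACK) > komi for _ in range(n_playouts))
    return wins / n_playouts
```

Note that `any(...)` stops at the first legal placement of the shuffled candidates, which is exactly "pick a random point, retry on illegal moves". Ko is ignored, as many light playouts do; the move cap guards against repetition.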
Re: [computer-go] FW: computer-go] Monte carlo play?
On Sunday 16 November 2008, Heikki Levanto wrote:
> [explanation of random playouts snipped]
> Averaging the results gives a fair indication of who is more likely to win
> from that position, just what is needed to decide which part of the search
> tree to expand.
Do you know what use (if any) is made of the standard deviation of the results?
[computer-go] Re: FW: computer-go] Monte carlo play?
> In a game of self play, if both parties are employing only monte carlo, ...
> random simulations... wouldn't it be very weak... ... and some playing
> around I am clearly mistaken because it works quite well. Yes, it doesn't
> make sense but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than might be expected, and they are fairly simple, which is extremely motivating for potential Go programmers!-) Instead of having to learn all of Go and a lot of AI before being able to start building anything, one can start quickly, and then improve together with the bot. And the current top programs claim plain Monte-Carlo bots as their remote ancestors, so it isn't a dead end, either (though complete rewrites may be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a human player progressing through the ranks:

1. only knowing the rules makes for very inefficient/slow/buggy play

2. playing lots of games quickly has often been recommended to get a better feeling for the game; personally, I don't like fast games(*), but at least the worst inadequacies tend to show up even if we only play equally clueless opponents (probably because weaknesses aren't evenly distributed, so that peaks in understanding in one player happen to match weaknesses in understanding in the other)

3. human players learn over time, until all players in a group have reached an equal level (or have gone as far as they can, with the effort they can afford to put into the game)

4. now further progress depends on innovation in the existing pool of players, on joining a different player pool, or on bringing in other players, preferably players whose strengths expose weaknesses in the current pool (similar effects can be achieved by reading books or observing games)

If we allow the learning outcome in 2/3 to be represented as patterns (wait a minute, I've fallen into that trap before, so I shouldn't do this, I usually do this, but that would be faster - I wonder whether it'll work), then random playouts can emulate a weak form of this process. Instead of learning from lots of games over time, plain Monte-Carlo bots extrapolate from lots of dumb games, every time they consider their next move. Random games are really too dumb to be considered games, but bots can play lots of them in a short time, so they can at least see trends (a mixture of on-the-spot learning and very bad reading ahead).

If we assume that the quantity compensates somewhat for the quality, a Monte-Carlo bot is a bit like beginners after lots of games who always forget their lessons learned when the session ends. They may have an idea of who is ahead and what is alive, and be able to organise their games based on those ideas, but a stronger player (who remembers his lessons) may be able to rip their ideas apart ("you might both think that group is alive, but what do you do if I play here? and who is ahead if you lose that group?"). And then there is the scenario of the strong player who visits a club and humiliates the local champion ("you might think that playing there kills that group, but what if they reply here?").

Adding patterns or other biases to make playouts heavier (both slower and more relevant for evaluation) is similar to improving reading skills, but with differences: bots already read to the end, so eliminating useless moves doesn't improve the playout depth; it improves at best the density of information to be gained from playouts (so you don't have to play as many of them, or can get more out of those you do play). But playouts are also still used for on-the-spot learning, and biases have limited scope for success, so one has to be careful not to ruin this aspect.
Using a tree on top of the playouts is not just a way to represent the knowledge extracted from the playouts; it also helps to direct playout effort towards relevant situations, and gives a place where traditional Go knowledge can be built in without affecting the playouts directly (so they become more of an evaluation function than a candidate move generator).

One problem is that the top programs tend to be fairly close in strength. There are variations in method, so there is some further progress, but how much of the human going-through-the-ranks evolution has been captured in current top programs, I don't know. I suspect that it is less than their ranks would indicate. There is certainly reading going on, discussion, and inviting of stronger players or different bots.

Oh, and if any of this seems to make sense, remember that I just made it all up!-) You might want to consult these slides for overviews from folks who have been in the business: "Old-fashioned Computer Go vs Monte-Carlo Go", by Bruno Bouzy, invited tutorial, IEEE 2007 Symposium on Computational Intelligence in Games, CIG 07, Hawaï, USA, 1 April 2007.
Re: [computer-go] FW: computer-go] Monte carlo play?
Hello Heikki,

Heikki Levanto wrote:
> The way I understand it, modern Monte Carlo programs do not even try to
> emulate a human player with a random player - obviously that would not
> work.

I believe CrazyStone's use of patterns does so, and it seems successful.

Hideki
-- [EMAIL PROTECTED] (Kato)
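For context on Hideki's point: Crazy Stone's playouts are not uniform. As described in Rémi Coulom's papers, each candidate move gets an "urgency" learned from game records, and the playout samples moves in proportion to those urgencies. A toy sketch of just that sampling step; the move names and weights below are made up for illustration (the real system learns thousands of pattern weights):

```python
import random

def sample_move(candidates):
    """candidates: list of (move, urgency) pairs; sample a move with
    probability proportional to its urgency (roulette-wheel selection)."""
    total = sum(u for _, u in candidates)
    r = random.uniform(0, total)
    acc = 0.0
    for move, u in candidates:
        acc += u
        if r <= acc:
            return move
    return candidates[-1][0]   # guard against floating-point round-off

# invented urgencies: a capture is far more "urgent" than a random tenuki
moves = [("capture", 30.0), ("extend", 11.0), ("hane", 9.0), ("elsewhere", 1.0)]
random.seed(1)
counts = {m: 0 for m, _ in moves}
for _ in range(10000):
    counts[sample_move(moves)] += 1
```

The effect is a playout that is still stochastic (so averaging remains meaningful) but strongly biased toward plausible shape moves.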
Re: [computer-go] FW: computer-go] Monte carlo play?
On Sun, Nov 16, 2008 at 11:46:28AM +, D Gilder wrote:
> Do you know what use (if any) is made of the standard deviation of the
> results?

Now statistics isn't my strong point, but one of the first and most successful systems was called UCT (Upper Confidence bounds applied to Trees). It calculated some sort of expected error, and added that to the winning ratio; then it expanded the branch that had the highest value. If that expansion was a win, the error margin would get smaller, but the average result would get higher. Perhaps some other branch would be tried next, but this one would still be a good candidate. If the result was a loss, the average would drop, and so would the error, so this move would become much less likely to be expanded.
I am sure someone who understands statistics will soon jump in and correct my explanation :-)

- Heikki

-- Heikki Levanto   In Murphy We Turst   heikki (at) lsd (dot) dk
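The "expected error added to the winning ratio" that Heikki describes is the UCB1 formula at the heart of UCT: each child's value is its win rate plus an exploration term that shrinks as the child is visited more. A minimal sketch; the exploration constant c = 1.4 is a typical textbook choice, not taken from any particular program:

```python
import math

def ucb1_select(children, parent_visits, c=1.4):
    """children: dicts with 'wins' and 'visits'. Pick the child maximising
    winrate + c * sqrt(ln(parent_visits) / visits); unvisited children first."""
    best, best_val = None, float("-inf")
    for ch in children:
        if ch["visits"] == 0:
            return ch                      # always try unvisited moves once
        val = (ch["wins"] / ch["visits"]
               + c * math.sqrt(math.log(parent_visits) / ch["visits"]))
        if val > best_val:
            best, best_val = ch, val
    return best

# rarely-visited child wins despite a mediocre win count: exploration at work
children = [{"wins": 6, "visits": 10},
            {"wins": 3, "visits": 4},
            {"wins": 1, "visits": 1}]
pick = ucb1_select(children, parent_visits=15)
```

This also answers the standard-deviation question in spirit: the sqrt term is exactly an uncertainty estimate, large for rarely-sampled moves and shrinking as evidence accumulates.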
Re: [computer-go] FW: computer-go] Monte carlo play?
Quoting Hideki Kato [EMAIL PROTECTED]:
> > The way I understand it, modern Monte Carlo programs do not even try to
> > emulate a human player with a random player - obviously that would not
> > work.
> I believe CrazyStone's use of patterns does so and it seems successful.

With Valkyria I try to follow these principles in heavy playouts:

1) In contact fights there are a lot of shapes that are played most of the time. Thus Valkyria checks each move played for an obvious local response. If there is one, it plays it deterministically. In many situations there are two or more such candidates, and then it plays one of those moves.

2) In many positions the last move played does not trigger any obvious response, and then a random move is chosen uniformly.

3) There are moves that are inferior 100% of the time, both locally and globally. These moves are pruned if they are selected, and a new random move is chosen as long as there are moves left to try.

I have hundreds of handcoded patterns for both 1 and 3. It would be too time consuming to test these patterns individually, so I use my knowledge and intuition (European 2 dan) to simply decide which patterns to include. So Valkyria has a lot of go knowledge, but mostly such knowledge as all go players have up to some strength, perhaps 8-10 kyu. It has no knowledge about global matters. The beauty of MC evaluation is that globally strong moves are most of the time evaluated better than globally weak moves. Heavy playouts remove noise from MC evaluation and make it more sensitive to the true value of moves. Still, there are biases with all heavy playouts, but they are overcome with Monte Carlo Tree Search (MCTS), which corrects mistakes in the evaluation recursively.

Here is my latest scaling experiment on 9x9 for Valkyria. Valkyria plays 1150 random games per second on my 4 year old laptop. This test is against gnugo 3.7.10, assumed to be Elo 1800. Most datapoints are based on 500 games.
N sims means Valkyria plays N heavy playouts per move played. Win rates are in %.

  N sims   WinRate   Elo (rel Gnu)
      47      7.4        1361
      94     22          1580
     188     37          1708
     375     53          1821
     750     69.9        1946
    1500     81.2        2054
    3000     88          2146
    6000     92.6        2239
   12000     94          2278
   24000     97.2        2416
   48000     97.4        2429

The heavy playouts of Valkyria need just 375 random games per move to match gnugo, using only 0.3 seconds per move. And even using only 47 simulations per move it can still win. So obviously the heavy playout code of Valkyria is much weaker (Elo 1361) than gnugo and most human opponents, but compared to CGOS, a lot of programs with no knowledge are about the same level, although they use 2000 simulations or more. Searching efficiently using MCTS with AMAF, it apparently can be made arbitrarily strong.

Hope this explains how both the nature of the playouts and the MCTS contribute to the playing strength of a program. Should one go heavy or light? I do not know. I feel that Valkyria is a little bit too slow on equivalent hardware against most top programs. On the other hand, I think it could be tweaked and improved upon. Perhaps it can even be made faster by removing code that does not improve playing strength. And there is probably still room for adding code that improves strength without a noticeable slowdown. I just know that it is a lot of hard work doing it the way I did it.

Best, Magnus
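Magnus's three-step policy can be sketched as a single function of the last move. Everything below is hypothetical scaffolding, not Valkyria's actual code: the callback names and the toy stubs in the demo are invented to show only the control flow (forced local reply, else uniform random, with always-inferior moves rejected while alternatives remain).

```python
import random

def heavy_policy(board, last_move, legal_moves, local_responses, is_inferior):
    """Sketch of the 1/2/3 scheme:
    1. if the last move triggers known local shapes, answer deterministically
       (choosing among several candidates at random);
    2. otherwise fall back to a uniformly random legal move,
    3. rejecting moves a pattern marks as always-inferior while others remain."""
    replies = [m for m in local_responses(board, last_move) if m in legal_moves]
    if replies:
        return random.choice(replies)          # step 1: forced local shape
    candidates = list(legal_moves)
    random.shuffle(candidates)
    for m in candidates:                       # steps 2 and 3
        if not is_inferior(board, m):
            return m
    return candidates[0] if candidates else None  # only inferior moves left

# hypothetical stubs for a demonstration
legal = {"B", "C", "D"}
respond = lambda board, last: ["B"] if last == "A" else []
inferior = lambda board, m: m == "C"
forced = heavy_policy(None, "A", legal, respond, inferior)    # always "B"
fallback = heavy_policy(None, "Z", legal, respond, inferior)  # "B" or "D", never "C"
```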
Re: [computer-go] Re: FW: computer-go] Monte carlo play?
On Sun, 16 Nov 2008, Claus Reinke wrote:
> ... better feeling for the game; personally, I don't like fast games(*),
> but ...

But there is this saying: Play quick, lose quick, learn quick! :)

Thomas
Re: [computer-go] FW: computer-go] Monte carlo play?
The random playouts, or even heavy playouts, are not intended to emulate a human player. Heikki is exactly right: it's a crude measurement of how good the position is. The moves in a random playout are horrible, and so are the moves in a heavy playout. In fact, improving them arbitrarily will probably hurt the playouts.

Random playouts work reasonably well because, to a certain extent, bad moves cancel each other. Chrilly called this "mutual stupidity", I think. For example, a group of stones may be weak and ultimately lost, but it is still possible to defend that group if the attacker isn't diligent. What happens in the playouts is that the attacker is NOT diligent, but neither is the defender! So the result comes out correct anyway. Of course this is not reliable, but it's amazingly good at getting a reasonable picture of what is weak, what is strong and who owns what.

You can improve that by adding some knowledge to the playouts, but you must do this with great care. In my example above, let's say you add a piece of knowledge that causes the defender to succeed. You can argue that the playouts play better go now, but the conclusion that you come to for the group we are talking about is now wrong.

- Don

On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote:
> [explanation of random playouts snipped]
RE: [computer-go] FW: computer-go] Monte carlo play?
Yes, Valkyria does a lot of ladder reading as well. Actually, "pattern matching" in the case of Valkyria is a little unclear: it is a decision tree whose leaves are often procedure calls that look at a larger portion of the board. The ladder code is called for various reasons in the tree.

Is 3.7.10 level 10 weaker than the default settings? I will run a test using 5000 playouts and 375 playouts to see if it makes a difference. Also, have you tried this version on CGOS 9x9? This version http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html used a machine similar to yours and reached 3000 playouts per second using two threads. Your code is then 8 times faster and should be much stronger.

-Magnus

Quoting David Fotland [EMAIL PROTECTED]:
> I thought Valkyria does local search (ladders) during the playouts. Many
> Faces is lighter on the playouts. I have 17 local 3x3 patterns, then go to
> uniform random without filling eyes. Against Gnugo 3.7.10 level 10 on 9x9,
> with 5000 playouts, I win 92%, so our performance is similar. I'm doing 23K
> playouts per second on my 2.2 GHz Core 2 Duo, so my performance might be a
> little better, depending on the specs of your old machine.
RE: [computer-go] FW: computer-go] Monte carlo play?
I think I added a small capture bias, but it didn't make much difference. Sorry, I forgot that it isn't quite pure random: before the uniform random move, if there is an enemy one-liberty group on the board, with some small probability I capture it. A pattern includes don't-cares and is matched in any orientation. The 3x3 patterns are only matched adjacent to the last move (8 local places).

David

On Nov 16, 2008, Jason House wrote:
> No capture bias in the playouts? I thought that was a big strength boost.
> Out of curiosity, how do you count your patterns? For example, is it still
> one pattern if it includes a don't-care? How about rotations/reflections of
> the same basic pattern?
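David's move order for Many Faces' playouts can be sketched as below. The helper functions are hypothetical stubs standing in for a real engine's board logic; only the ordering (occasional atari capture, then 3x3 patterns around the last move, then uniform random avoiding own eyes) follows his description, and the 10% capture probability is my guess at "some small probability".

```python
import random

# hypothetical stubs standing in for a real engine's board queries
def atari_captures(board, color):            # moves capturing one-liberty groups
    return board.get("atari", [])
def neighbors8(p):                           # the 8 points around the last move
    return [(p[0] + dr, p[1] + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]
def is_legal(board, p, color):
    return p in board.get("legal", set())
def matches_3x3(board, p, color):            # any of the local 3x3 patterns
    return p in board.get("pattern_hits", set())
def all_legal(board, color):
    return board.get("legal", set())
def own_eye(board, p, color):
    return False

def choose_playout_move(board, last_move, to_move, capture_prob=0.1):
    """1. with small probability, capture an enemy group in atari;
       2. otherwise try the 8 points around the last move against 3x3 patterns;
       3. otherwise pick uniformly at random, never filling an own eye."""
    caps = atari_captures(board, to_move)
    if caps and random.random() < capture_prob:
        return random.choice(caps)
    local = [p for p in neighbors8(last_move)
             if is_legal(board, p, to_move) and matches_3x3(board, p, to_move)]
    if local:
        return random.choice(local)
    moves = [p for p in all_legal(board, to_move)
             if not own_eye(board, p, to_move)]
    return random.choice(moves) if moves else None

# demo: the only pattern hit adjacent to the last move gets played
board = {"legal": {(1, 1), (2, 2), (5, 5)}, "pattern_hits": {(2, 2)}, "atari": []}
move = choose_playout_move(board, last_move=(2, 1), to_move="X")
```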
Re: [computer-go] FW: computer-go] Monte carlo play?
Some months ago I did several experiments with using tactics and patterns in playouts. Generally I found a big boost in strength using tactics. I also found a boost in strength using patterns, but with severely diminishing returns after a certain number, even becoming detrimental when using large numbers of patterns (1,000s to 10,000s). Since I was using a generalized pattern-matcher, using it slowed things down considerably. Although it played a lot better with the same number of playouts, if I compared MC playouts with patterns to MC playouts without patterns using the same amount of CPU time, the gain was not so obvious. Since most of the gain in strength came from just a few patterns, I concluded, just as David did, that it was probably better to use a handful of hard-coded patterns during playouts.

I only recently started to do real experiments with hard-coded patterns, and so far my results are rather inconclusive. I found that when mixing different things it's not always clear what contributes to any increased strength observed, so I'm still in the process of trying to dissect what is actually contributing where. I found, for example, that a lot of the increased level of play using patterns does not come from using them during playouts, but from the effect they have on move exploration. I don't know if this is due to my particular way of implementing MC playouts in combination with UCT search, but moves matching a pattern (usually) automatically come first in the tree expansion as well, and generally I can say that so far I'm observing that most of the increase in level comes from the selection during exploration, and only in small part from the selection during simulation. For example, in one particular experiment using just 5 patterns, I saw a win rate of 65% against the same program not using patterns (with the same number of playouts). But when not using the patterns during exploration, I saw the win rate drop to just 55%.
I still have a lot of testing to do, and it's too early to draw any hard conclusions. But I think it's worthwhile trying to distinguish where the strength is actually gained. Better yet, finding out exactly *why* it gained strength, because with MC playouts I often find results during testing highly counter-intuitive, occasionally to the point of being (seemingly) nonsensical.

I also think what Don was proposing with his reference bot could be interesting: trying to make it play around Elo 1700 on CGOS using just 5,000 (light) playouts. I don't know if it's possible, but I think it's a fruitful exercise. At a time when most people are looking at using more and more hardware to increase playing strength, knowing what plays best at the other end of the spectrum is valuable as well. By that I mean finding what plays best using severely constrained resources.

Mark
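One common way to get the "patterns in exploration" effect Mark describes is to seed pattern-matching moves with virtual wins and visits when a node is expanded, so the tree search tries them first. A toy sketch of that idea; the virtual counts, the first-play-urgency constant, and the move names are all invented for illustration and are not from Mark's program:

```python
import math

PRIOR_VISITS, PRIOR_WINS = 10, 8   # invented virtual counts for pattern moves
FPU = 1.1                          # assumed first-play urgency for unvisited moves

def make_child(move, matches_pattern):
    """Pattern moves start with an optimistic prior; others start empty."""
    if matches_pattern:
        return {"move": move, "wins": PRIOR_WINS, "visits": PRIOR_VISITS}
    return {"move": move, "wins": 0, "visits": 0}

def uct_value(child, parent_visits, c=1.4):
    """UCB1 value, with a fixed urgency for unvisited children instead of
    infinity, so the priors can actually reorder the first expansions."""
    if child["visits"] == 0:
        return FPU
    return (child["wins"] / child["visits"]
            + c * math.sqrt(math.log(parent_visits) / child["visits"]))

# the seeded pattern move is explored before the untried random move
children = [make_child("pattern_move", True), make_child("random_move", False)]
best = max(children, key=lambda ch: uct_value(ch, parent_visits=20))
```

The prior washes out as real playout results accumulate, which matches Mark's observation: the patterns steer *which* branches get simulated without having to bias the simulations themselves.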
Re: [computer-go] FW: computer-go] Monte carlo play?
So you say: "...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation." Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts.

- George

On Sun, Nov 16, 2008 at 10:39 PM, Mark Boon [EMAIL PROTECTED] wrote:
Re: [computer-go] FW: computer-go] Monte carlo play?
On 17-nov-08, at 02:42, George Dahl wrote: So you say: "...I'm observing that most of the increase in level comes from the selection during exploration and only in small part from the selection during simulation." Could you elaborate at all? This is very interesting. That almost suggests it might be fruitful to use the patterns in the tree, but keep lighter playouts.

That's exactly what it's suggesting. But as I said, I need to do some more testing to make a hard case for it.

Mark
Re: [computer-go] FW: computer-go] Monte carlo play?
I look forward to hearing more! Happy testing.

- George

On Sun, Nov 16, 2008 at 11:53 PM, Mark Boon [EMAIL PROTECTED] wrote:
Re: [computer-go] FW: computer-go] Monte carlo play?
It seems move selection in the playouts should be very random at first and more deterministic toward the end of the playout. Has anyone tried that?
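The proposal above, high randomness early in a playout and more deterministic choices near its end, can be sketched as an epsilon schedule over the course of the playout. Everything here is a hypothetical illustration of the idea, not a tested policy: `score` stands in for whatever cheap move heuristic the playout uses, and the schedule endpoints are arbitrary.

```python
import random

def annealed_choice(moves, score, progress, start_eps=1.0, end_eps=0.1):
    """Select one playout move with randomness that decays over the playout.

    `progress` runs from 0.0 at the first playout move to 1.0 near the
    end. Early on (eps high) the choice is uniformly random; later
    (eps low) it is mostly greedy with respect to `score`.
    """
    eps = start_eps + (end_eps - start_eps) * progress
    if random.random() < eps:
        return random.choice(moves)  # exploratory: uniform random
    return max(moves, key=score)     # deterministic: greedy on the heuristic
```

One design question such a schedule would raise is exactly the thread's theme: whether the extra determinism late in the playout buys more accuracy than the playouts it costs in speed.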