Re: [computer-go] MC approach
The following code did not hurt the strength in self-play over 2000 games at board size 8x8 (faster games) with 10k playouts per move:

    moveEval[m] = (float)wins[m]/nGames[m] + points[m]/(nGames[m]*boardSpaces*100)

where points[m] is accumulated only for wins.

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
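A minimal sketch of the formula above, assuming per-move tally arrays and an 8x8 board as in the experiment (the names here are illustrative, not from the original code):

```python
# Sketch of the described evaluation: win rate plus a score bonus scaled
# far below the weight of a single win. `wins`, `n_games`, and `points`
# are per-move tallies; `points` accumulates margins only for won playouts.
BOARD_SPACES = 64  # 8x8 board, as in the experiment above

def move_eval(wins, n_games, points, m):
    if n_games[m] == 0:
        return 0.0
    return wins[m] / n_games[m] + points[m] / (n_games[m] * BOARD_SPACES * 100)
```

The divisor keeps the score term small enough that it effectively only breaks ties between moves with near-identical win rates.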
Re: [computer-go] MC approach
On 2/9/07, Weston Markham [EMAIL PROTECTED] wrote:
> I don't seem to have any numbers on this anymore, but I should be able to try some experiments this weekend. I do have some code that does what I describe below. It is also using an all-moves-as-first heuristic. According to my notes, I made this change in an attempt to avoid severely conservative (in my non-expert opinion) moves near the beginning of the game, which seem to be preferred when using all-moves-as-first. It specifically aims for a 30-point lead at the beginning of the game, and reduces this by one point for each turn into the game.
> I should point out that I am not averaging scores, but simply changing which games I count as wins for the evaluation of a move. This is perhaps not quite what Steve Uurtamo had in mind when he was originally musing about being greedy at the beginning of the game. Nevertheless, it is a very similar sort of idea to what he described, so I thought that I would mention it.
> Weston
> [earlier exchange with Chris Fant trimmed]

I did run some tests over the weekend, but they did not complete as quickly as I had expected. Anyway, I have enough data at this point to say that although there is nothing spectacular to show one way or the other, I was probably wrong that the code was an improvement.

Playing against gnugo 3.7.10, komi 7.5, playing black in 50% of the games:

- 86/732 games won, baseline (11.7%)
- 106/1000 games won after enabling the logic described above (10.6%, a slight degradation)

I also tried a version that dynamically adjusts an expected score, and tallies the number of games where the score exceeds this. It appears to give an improvement: 12.8% of the games were won, out of 1000.

Weston
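One possible reading of the "dynamically adjusted expected score" variant, sketched with an exponentially smoothed expectation (the smoothing rule, the alpha parameter, and all names are my assumptions; Weston does not say how the expectation is adjusted):

```python
# Count a playout as a "win" only when its score beats a running
# expectation, then move the expectation toward the observed score.
class ExpectedScoreTally:
    def __init__(self, alpha=0.01):
        self.expected = 0.0   # current score expectation
        self.alpha = alpha    # smoothing factor (assumed)
        self.wins = 0
        self.games = 0

    def record(self, score):
        self.games += 1
        if score > self.expected:
            self.wins += 1
        self.expected += self.alpha * (score - self.expected)

    def win_rate(self):
        return self.wins / self.games if self.games else 0.0
```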
Re: [computer-go] MC approach
I think that you are essentially correct. However, this is only going to affect a small number of games where two different moves are exactly tied for the best winning percentage after many playouts. Even if the underlying probabilities are exactly the same, you can't really expect this to happen much.

Weston

On 2/11/07, Heikki Levanto [EMAIL PROTECTED] wrote:
> But there is a way. If we do N play-outs, the effect of any single one of them is 1/N. If we make sure to scale the score to be less than half of this, it cannot disturb anything in cases where the number of wins is different. Only in cases with exactly the same number of wins in the play-outs would the score break the tie.
> In other words, my large constant of 1000 was far too small. It would have to be something like 2NM, where M is the maximum score (say 361). Round it up to 1000N, and we should be safe.
> I still believe it would make endgames look more reasonable, and possibly even play better: in case the winning program has overlooked a detail somewhere, having a large margin of points on the board should act as insurance against small blunders.
Re: [computer-go] MC approach
On Mon, Feb 12, 2007 at 11:20:43AM -0500, Weston Markham wrote:
> I think that you are essentially correct. However, this is only going to affect a small number of games where two different moves are exactly tied for the best winning percentage after many playouts. Even if the underlying probabilities are exactly the same, you can't really expect this to happen much.

I am not sure I agree. Near the end of each game there comes a time when the winner is already certain, or very nearly so. Even an MC player will see this at some point. When that point comes, there is no direction in pure MC play, except to avoid the worst blunders. The winning program is happy to let a tail be cut off here, a small group die there, and a territory be invaded here, if none of this shakes its unfaltering faith in its victory. Conversely, the losing program doesn't even try all these tricks, as it sees it loses anyway. Both of them play purely at random. It doesn't affect the result, usually. But it can be that both players have misjudged something, and having wasted the winning margin, that something may turn out to be valuable after all.

Alas, I don't have my own MC player coded (haven't even started), so I cannot run the experiment myself. If someone here would like to try, I'd like to hear of it.

-Heikki

-- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk
Re: [computer-go] MC approach
On Wed, Feb 07, 2007 at 02:41:22PM -0600, Nick Apperson wrote:
> If it only did one playout you would be right, but imagine the following cases:
> case 1: White wins by .5 x 100, Black wins by .5 x 100
> case 2: White wins by 100.5 x 91, Black wins by .5 x 109
> The method that takes into account score would prefer the second case even though it has a lower winning percentage; that may reflect the fact that white is making an overplay, for instance. Obviously this is just one example, but there are many cases like this, and I would suspect that overplays tend to be privileged in a sense with this kind of algorithm.

I have been thinking about this, and have to agree with you: averaging the results gives pretty small numbers that can easily be disturbed by adding the winning scores into the mixture.

But there is a way. If we do N play-outs, the effect of any single one of them is 1/N. If we make sure to scale the score to be less than half of this, it cannot disturb anything in cases where the number of wins is different. Only in cases with exactly the same number of wins in the play-outs would the score break the tie.

In other words, my large constant of 1000 was far too small. It would have to be something like 2NM, where M is the maximum score (say 361). Round it up to 1000N, and we should be safe.

I still believe it would make endgames look more reasonable, and possibly even play better: in case the winning program has overlooked a detail somewhere, having a large margin of points on the board should act as insurance against small blunders.

Or am I still missing something obvious?

- Heikki

-- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk
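Heikki's scaling rule can be sketched directly; the point is that the score term stays smaller than the weight of a single extra win (the concrete N below is illustrative):

```python
# Encode each playout as 1 (win) or 0 (loss) plus a tiny score term,
# divided by a constant comfortably larger than 2*N*M (M = max score).
N = 10_000           # playouts per move (example value)
SCALE = 1000 * N     # Heikki's rounded-up divisor

def playout_value(won, score):
    return (1.0 if won else 0.0) + score / SCALE
```

With this scale, a move with even one more win always outranks a move with fewer wins, no matter the scores; only exact ties in win count are broken by margin.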
Re: [computer-go] MC approach
I don't seem to have any numbers on this anymore, but I should be able to try some experiments this weekend. I do have some code that does what I describe below. It is also using an all-moves-as-first heuristic. According to my notes, I made this change in an attempt to avoid severely conservative (in my non-expert opinion) moves near the beginning of the game, which seem to be preferred when using all-moves-as-first. It specifically aims for a 30-point lead at the beginning of the game, and reduces this by one point for each turn into the game.

I should point out that I am not averaging scores, but simply changing which games I count as wins for the evaluation of a move. This is perhaps not quite what Steve Uurtamo had in mind when he was originally musing about being greedy at the beginning of the game. Nevertheless, it is a very similar sort of idea to what he described, so I thought that I would mention it.

Weston

On 2/8/07, Chris Fant [EMAIL PROTECTED] wrote:
> On 2/8/07, Weston Markham [EMAIL PROTECTED] wrote:
> > I believe that I have had some success with an approach like this, actually. I believe that I initially only tally games that are won by a certain margin, and reduce that margin to zero as the game progresses. I am pretty sure that this improves upon straight Monte Carlo. I think that I can get some numbers on it, if anyone is interested. Weston
> Yes, please.
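The decaying-margin rule Weston describes might look like the following (the threshold shape and names are my reading of his description, not his actual code):

```python
# A playout counts as a win only if won by at least the required margin,
# which starts at 30 points and shrinks by one point per turn played.
def required_margin(turn, initial=30):
    return max(0, initial - turn)

def counts_as_win(score_for_us, turn):
    # score_for_us > 0 means we won the playout outright
    return score_for_us > 0 and score_for_us >= required_margin(turn)
```

Once the margin has decayed to zero, this reduces to ordinary win counting, so only the opening is affected.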
Re: [computer-go] MC approach
Quoting Heikki Levanto [EMAIL PROTECTED]:
> On Wed, Feb 07, 2007 at 04:42:01PM -0500, Don Dailey wrote:
> > In truth the only thing that matters is to increase your winning percentage - not your score. There seems to be no point in tampering with this.
> I guess I must accept the wisdom of those who have tried these things. Still, it hurts my intuition that it could be better for a program to choose a line where it seems to win by 2 points, when another line seems to end in a 100 point win. What if the opponent can improve his play from what I expected, and gain an extra 3 points somewhere? Maybe all this shows that we have not (yet?) understood all the complications of the MC evaluation, and that more research is needed?

No, it is actually quite simple, although of course more research is needed on MC in general, just not on this topic. Here I might repeat what others have already said, but I will try to make clear how I understand it.

The line that ends in a 100 point win is often only a win if the opponent makes a mistake; otherwise you lose some points, or even the game. The fallacy here (I have been there too) is that you think of specific situations where all lines end in victory. But in real games, many lines exist that win big but are also risky.

You are of course safe when all moves are valued 1000 each. But before that, there might be some evaluated at, say, 998, because the MC eval knows that something may go wrong (a 7-ply-deep tactical defect exists). If you then add an average score of 2 to the 1000 and an average score of 10 to the 998, the program makes a mistake. It is all about eliminating uncertainty.

The average score can contain a very large proportion of losses if it is compensated by bigger wins. Since MC eval is based on random games, it will always give scores less than the maximum 1000 in won positions that can still be lost by greedy moves. And there is no way to securely determine the point at which greedy play becomes safe: even when all moves are evaluated as sure wins, there can be bugs in the program that inflate the win score of moves that lead to a tactical disaster.

The solution is to improve the simulation part so that real sure wins are evaluated as sure wins early, and among these one can bias the selected moves toward profitable moves. Valkyria does something like this, although I have forgotten the exact details now.

This problem, I think, becomes less important as the program becomes stronger in other areas. The reason is that the simulation part plays a very bad endgame; thus a strong MC/UCT program realizes that more territory is also insurance against a bad endgame. I have no solid backing for this, but my impression is that Valkyria used to win by 0.5 points almost always, but now often wins by much more.

-Magnus
Re: [computer-go] MC approach
> The average score can contain a very large proportion of losses if it is compensated by bigger wins.

yes, it is easy to see how this might cripple the play of an MC player. that 90% territory win that requires 3 opponent blunders is tempting enough to ignore the fact that all other non-blundering lines lead to 0.5 point losses.

i wonder if this kind of greediness might, however, be useful for selecting, say, the first move or two in a 9x9 game. the thinking here is that since the endgame is essentially noise at this point, you might as well be greedy before tactics become an issue. that's probably faulty intuition, though.

on another note, has anyone just let their MC code rip for a day or two (or maybe a week or more) on the first move alone? i would think that if you ordered the distribution of the resulting list, it would give very good information about how well MC acts as a board-eval function. (i.e. turn off all book lines, turn off all rules about not playing on the first line or two early in the game, etc., turn off all heuristics related to the opening, and then print the distribution over the board.) what are the top, say, 10 moves on a 9x9 board and how are they distributed, and the top, say, 40 moves on a 19x19 board, along with their distribution? if you fold board symmetries into your search, i suppose that you can get a factor of 8 here.

my thinking is that if it's anything other than a very smooth distribution among the top moves, that's a good indicator.

s.
Re: [computer-go] MC approach
I think there are 15 first moves in 9x9 go if you factor out the symmetries. UCT isn't good at evaluating all the moves; it will pick one of them and spend most of its time on it. But you could search each one at a time. The UCT programs are memory bound, so you could search each of these 15 moves one at a time and study the scores.

- Don

On Thu, 2007-02-08 at 04:41 -0800, steve uurtamo wrote:
> on another note, has anyone just let their MC code rip for a day or two (or maybe a week or more) on the first move alone? i would think that if you ordered the distribution of the resulting list, it would give very good information about how well MC acts as a board-eval function. [...]
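Don's count of 15 distinct first moves can be checked by folding the eight board symmetries, as steve suggests (a quick standalone sketch, not from any program mentioned here):

```python
# Map each point to a canonical representative of its orbit under the
# 8 rotations/reflections of the board, then count distinct orbits.
def canonical(x, y, n=9):
    pts = set()
    for _ in range(4):
        pts.add((x, y))          # current rotation
        pts.add((y, x))          # its transpose (reflection)
        x, y = y, n - 1 - x      # rotate 90 degrees
    return min(pts)

distinct_first_moves = {canonical(x, y) for x in range(9) for y in range(9)}
```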
Re: [computer-go] MC approach
I thought that the memory boundedness was completely fixed by not expanding a UCT node until it has been visited X number of times. Just increase X until you are no longer memory bound. I don't recall anyone reporting a loss in playing strength by doing this.

On 2/8/07, Don Dailey [EMAIL PROTECTED] wrote:
> I think there are 15 first moves in 9x9 go if you factor out the symmetries. UCT isn't good at evaluating all the moves; it will pick one of them and spend most of its time on it. But you could search each one at a time. The UCT programs are memory bound, so you could search each of these 15 moves one at a time and study the scores.
> - Don
> [earlier quotes trimmed]
Re: [computer-go] MC approach
Quoting Don Dailey [EMAIL PROTECTED]:
> I think there are 15 first moves in 9x9 go if you factor out the symmetries. UCT isn't good at evaluating all the moves; it will pick one of them and spend most of its time on it. But you could search each one at a time. The UCT programs are memory bound, so you could search each of these 15 moves one at a time and study the scores.

It is perhaps better to do this experiment on 7x7, where it is actually known what the best moves in the opening are, so it is possible to judge the quality. I am now not thinking about the first move but more of finding a high-quality principal variation. I did some experiments a long time ago, with at least several hours of computing time for some opening positions, but found that there were serious problems for many positions: improvements or changes to the program seemed to change the evaluation of opening positions almost randomly.

-Magnus
Re: [computer-go] MC approach
On Thu, 2007-02-08 at 08:59 -0500, Chris Fant wrote:
> I thought that the memory boundedness was completely fixed by not expanding a UCT node until it has been visited X number of times. Just increase X until you are no longer memory bound. I don't recall anyone reporting a loss in playing strength by doing this.

There is a loss of playing strength as X grows, but the loss grows very slowly. Obviously, if you set X really high, it would not build much of a tree and it would play very weakly.

I have been planning to make this dynamic in Lazarus, but I have not done so yet. The idea is that X starts at 1, but when the table fills up there is a consolidation pass where the table is rebuilt, child nodes may fold back into their parents, and X gets reset to a higher value. As the search proceeds this may happen a few times, perhaps doubling the value of X each time. In this way the program continues to scale smoothly: there is never a sudden out-of-memory wall, and you can initially set the table to any size that suits you.

I haven't bothered because I cannot detect a loss of strength for values up to 100, but it's on my todo list; I know there is a small loss of strength even if I cannot measure it. Right now, Lazarus simply stops searching and returns a move when the table fills, but I have X set high enough that it can think for many minutes.

- Don
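Chris's expansion-threshold trick, which Don discusses above, is commonly written along these lines (a generic sketch under assumed names; Lazarus's actual code is not shown here):

```python
# A node only allocates its children after it has been visited X times,
# which bounds how quickly the UCT tree consumes memory.
EXPAND_THRESHOLD = 10  # "X" in the discussion above (example value)

class Node:
    def __init__(self):
        self.visits = 0
        self.wins = 0
        self.children = None  # None means leaf, not yet expanded

    def maybe_expand(self, legal_moves):
        if self.children is None and self.visits >= EXPAND_THRESHOLD:
            self.children = {move: Node() for move in legal_moves}
```

Raising EXPAND_THRESHOLD trades tree depth for memory, which is why Don expects a small (if hard to measure) strength loss as X grows.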
Re: [computer-go] MC approach
On 2/8/07, steve uurtamo [EMAIL PROTECTED] wrote:
> i wonder if this kind of greediness might, however, be useful for selecting, say, the first move or two in a 9x9 game. the thinking here is that since the endgame is essentially noise at this point, you might as well be greedy before tactics become an issue. that's probably faulty intuition, though.

I believe that I have had some success with an approach like this, actually. I believe that I initially only tally games that are won by a certain margin, and reduce that margin to zero as the game progresses. I am pretty sure that this improves upon straight Monte Carlo. I think that I can get some numbers on it, if anyone is interested.

Weston
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
-----Original Message-----
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 7 Feb 2007 5:34 AM
Subject: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

> On Wed, Feb 07, 2007 at 12:06:40PM +0200, Tapani Raiko wrote:
> > Let me try again using the handicap example. Let's say the MC player is given a huge handicap. In the simulations, it is winning all of its games, so there is no information helping to select the next move.
>
> This situation happens in normal games too, once one player is so far ahead that it wins almost no matter what. It leads to really stupid-looking endgames, where live groups are allowed to die and dead ones are allowed to be rescued.
>
> All this could be avoided by a simple rule: instead of using +1 and -1 as the results, use +1000 and -1000, and add the final score to this. The purpose of the large constant (1000) is to make sure that it prefers any win to any loss (so that large_win + small_loss < small_win + small_win). One could even add another term to the result, favouring games that end early (for the winner) or postponing them (for the loser), in the hope of allowing the opponent more chances to make mistakes.
>
> As far as I can see, this ought to fit straight into any MC or UCT program. It may not improve the winning chances, but it sure should make the program's play look more reasonable. Just my humble idea. Feel free to shoot it down (with serious arguments), and/or use it wherever you like. I would like to hear if this makes any practical difference, if anyone tries.

Intuitively, it seems like this should work. You only give the winning margin a small weight, or only use it to break ties, or only apply it after the game is already decided. I've tried many variations, as have others, including your exact algorithm above. It can make some moves look a little prettier, but it always causes problems and I have to take it out. I have my theories as to why that is, but for brevity's sake: in my experience, giving any consideration to winning margin is detrimental. After the game is decided, there are more elegant ways to bring it to a close. I came to this conclusion reluctantly.

- Dave Hillis
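For concreteness, Heikki's proposed encoding (the rule quoted above, without the optional early/late-game term) transcribes to:

```python
# +/-1000 for win/loss, plus the final score: any win outranks any loss,
# and equal win counts are ordered by margin.
WIN_BONUS = 1000

def playout_result(score):
    # score > 0 means the side being evaluated won the playout
    return (WIN_BONUS if score > 0 else -WIN_BONUS) + score
```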
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
I should have mentioned that I have only tested on 9x9. For larger boards, I don't know.

- Dave Hillis

> Intuitively, it seems like this should work. You only give the winning margin a small weight, or only use it to break ties, or only apply it after the game is already decided. I've tried many variations, as have others, including your exact algorithm above. It can make some moves look a little prettier, but it always causes problems and I have to take it out. [...]
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
If I recall correctly, someone spoke of constraining the opening moves to the 3rd, 4th, and 5th lines in the absence of nearby stones, or something to that effect. What was the impact of this experiment? I notice the recent discussion of the need for a lot of thinking time to find good opening moves; would this thinking time be reduced with a better-than-random selection of opening moves during the playouts? If fuseki and joseki databases were used to bias the playouts, would the speed and quality of opening play improve?

One concern about using expert knowledge of this sort: what happens when one's opponent plays out of book? Human players often find this frustrating; we sense that moves which deviate from joseki are wrong, but finding the correct refutation under time pressure is not always easy. To address this concern: at least one MC player uses an opening book; would it be profitable to automatically analyze past losses, and devote a few hundred thousand playouts to looking for better replies to plays which were favored by opponents in past games? This would be the analogue of advice often given to human players: study your own lost games; look for improvements.

Unfortunately, an opening book does not generalize well; learning a better move for position Xi won't help with position Xj, which differs by even one stone from Xi. What sort of move generators would cope with the vast number of similar positions? A single stone (a ladder-breaker, or a key point which makes or denies a second eye, for instance) can make a large difference in the score, but one hopes that MC playouts would discover the negative consequences of moves which are almost, but not quite, right.
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
What sort of sampling was used for the playouts? For this variable (incorporating some information about the score, vs. only the win-loss variable), does it make a difference whether the playouts are totally random, or incorporate varying degrees of similitude to good play?

From: [EMAIL PROTECTED]
> I should have mentioned that I have only tested on 9x9. For larger boards, I don't know.
> - Dave Hillis
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
terry mcintyre wrote:
> If I recall correctly, someone spoke of constraining the opening moves to the 3rd, 4th, and 5th lines in the absence of nearby stones, or something to that effect. What was the impact of this experiment?

For what it's worth, I tried a number of experiments along these lines and none of them produced any improvement in play whatsoever. I'm still baffled by this. Sylvain says, both in his paper and in a message posted here, that it seems that one must try to constrain moves based on sequences of moves rather than individual moves in isolation. All rather mysterious to me.

-Richard
Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))
The tests that involved factoring in the margin of victory while the game was still in play used pure random playouts (or relatively close to it). Later, I did some tests with esthetics as a goal in itself, and for these I used what I call a heavy playout. I doubt that the playout type makes a difference, but I don't know that for sure. If there were significant curiosity on this list about a specific configuration, I could probably run a quick test. - Dave Hillis -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Wed, 7 Feb 2007 12:42 PM Subject: Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)) What sort of sampling was used for the playouts? For this variable (incorporating some information about the score vs. only the win-loss variable), does it make a difference whether playouts are totally random or incorporate varying degrees of similitude to good play?
Re: [computer-go] MC approach
Don Dailey wrote: On Wed, 2007-02-07 at 11:34 +0100, Heikki Levanto wrote: All this could be avoided by a simple rule: Instead of using +1 and -1 as the results, use +1000 and -1000, and add the final score to this. Heikki, I've tried ideas such as this in the past and it's quite frustrating - everything that tries to take territory scoring into account weakens the program. If you just need to see prettier moves, I think it is good enough to prioritize the moves using some other algorithm at the root of the tree. If you just cover the case where a program is easily winning or losing, it will play nicer but not stronger. Don, do you have any theories or information about why this is the case? I would think either way the algorithm should always prefer higher average win probabilities. But faced with alternatives where the win probabilities are the same or nearly the same while the average winning margin is higher for one of them, wouldn't it be better to take the path with the better margin? After all, the algorithm may in fact be wrong about its win/loss classifications, so choosing the better score would seem to make sense within reason, as long as it's not greedy.
Re: [computer-go] MC approach
If it only did one playout you would be right, but imagine the following cases: case 1: White wins by 0.5 in 100 playouts, Black wins by 0.5 in 100 playouts; case 2: White wins by 100.5 in 91 playouts, Black wins by 0.5 in 109 playouts. The method that takes the score into account would prefer the second case even though it has a lower winning percentage - which may reflect the fact that White is making an overplay, for instance. Obviously this is just one example, but there are many cases like this, and I would suspect that overplays tend to be privileged in a sense with this kind of algorithm. - Nick On 2/7/07, Matt Gokey [EMAIL PROTECTED] wrote: [...] wouldn't it be better to take the path with the better margin?
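Nick's two cases can be checked with a quick calculation (a sketch with made-up helper names; the playout lists simply encode the counts he gives):

```python
# Nick's cases, encoded as lists of (winner, margin) playout results.
# Case 1: White wins by 0.5 in 100 playouts, Black wins by 0.5 in 100.
case1 = [("W", 0.5)] * 100 + [("B", 0.5)] * 100
# Case 2: White wins by 100.5 in 91 playouts, Black wins by 0.5 in 109.
case2 = [("W", 100.5)] * 91 + [("B", 0.5)] * 109

def win_rate(playouts):
    """Fraction of playouts won by White."""
    return sum(1 for winner, _ in playouts if winner == "W") / len(playouts)

def avg_score(playouts):
    """Average signed margin from White's point of view."""
    return sum(m if winner == "W" else -m for winner, m in playouts) / len(playouts)

# Win rate prefers case 1 (0.5 vs 0.455), but average score strongly
# prefers case 2 (0.0 vs about +45.5) - exactly the overplay Nick warns about.
print(win_rate(case1), avg_score(case1))
print(win_rate(case2), avg_score(case2))
```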
Re: [computer-go] MC approach
That drives me nuts! Minimax search would eliminate bad lines of play whenever a refutation is found. A good opponent would not play badly, and the quantity of possible bad moves should not affect the evaluation of good moves - but that seems to be what MC does, averaging over all moves regardless of whether they are known to be good, have been refuted, or are of indeterminate status. What am I missing? Terry McIntyre - Original Message From: Nick Apperson [EMAIL PROTECTED] [...] overplays tend to be privileged in a sense, I would suspect, with this kind of algorithm. - Nick
Re: [computer-go] MC approach
On Wed, 2007-02-07 at 14:08 -0600, Matt Gokey wrote: Don, do you have any theories or information about why this is the case? Not really. In truth the only thing that matters is to increase your winning percentage - not your score. There seems to be no point in tampering with this. - Don
Re: [computer-go] MC approach
It's a *weighted* average of all moves. UCT tree search doesn't explore bad moves as often as good ones, so they don't contribute as much to the estimation of the worth of a node. - Dave Hillis -Original Message- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Wed, 7 Feb 2007 4:31 PM Subject: Re: [computer-go] MC approach That drives me nuts! Minimax search would eliminate bad lines of play whenever a refutation is found. [...] What am I missing? Terry McIntyre
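Dave's point can be illustrated with a minimal UCB1 selection step over hypothetical move statistics (a sketch, not any particular program's code; the constant c and the numbers are arbitrary assumptions):

```python
import math

def ucb1_select(moves, parent_visits, c=1.4):
    """Pick the move maximizing mean win rate plus an exploration bonus.
    `moves` maps a move name to (wins, visits), each visited at least once."""
    def ucb(stats):
        wins, visits = stats
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(moves, key=lambda m: ucb(moves[m]))

# Hypothetical statistics: the good move has been explored ten times as
# often, so it dominates the parent's visit-weighted average value.
moves = {"good": (600, 1000), "bad": (5, 100)}
parent_visits = sum(v for _, v in moves.values())
parent_value = sum(w for w, _ in moves.values()) / parent_visits
print(ucb1_select(moves, parent_visits), parent_value)
```

Because the bad move holds only 100 of the 1100 visits, the parent's value (605/1100, about 0.55) sits close to the good move's 0.6 rather than halfway to the bad move's 0.05.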
Re: [computer-go] MC approach
The only thing I can suggest which perhaps hasn't been tried is to only consider the score in the evaluation if NONE of the playouts resulted in a loss. On 2/7/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: It's a *weighted* average of all moves. UCT tree search doesn't explore bad moves as often as good ones, so they don't contribute as much to the estimation of the worth of a node. - Dave Hillis
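That suggestion might look like this in code (a sketch; the scaling of the tie-break term is borrowed from the moveEval formula earlier in the thread, and all names are made up):

```python
def move_value(wins, games, total_score, board_spaces=81):
    """Win rate, with average winning margin as a tie-breaker that is
    consulted only when NONE of the playouts resulted in a loss."""
    value = wins / games
    if wins == games:  # undefeated: safe to be a little greedy about score
        # scaled so the bonus can never outweigh a real win-rate difference
        value += (total_score / games) / (board_spaces * 100)
    return value

# Two undefeated moves are ordered by margin; a single loss drops the
# margin from consideration entirely.
print(move_value(50, 50, 50 * 30.5))  # all wins, big margins
print(move_value(50, 50, 50 * 0.5))   # all wins, tiny margins
print(move_value(49, 50, 49 * 60.5))  # one loss: margin ignored
```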
Re: [computer-go] MC approach
On Wed, Feb 07, 2007 at 04:42:01PM -0500, Don Dailey wrote: In truth the only thing that matters is to increase your winning percentage - not your score. There seems to be no point in tampering with this. I guess I must accept the wisdom of those who have tried these things. Still, it hurts my intuition that it could be better for a program to choose a line where it seems to win by 2 points, when another line seems to end in a 100 point win. What if the opponent can improve his play from what I expected, and gain an extra 3 points somewhere? Maybe all this shows that we have not (yet?) understood all the complications of the MC evaluation, and that more research is needed? -H -- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk
Re: [computer-go] MC approach
On Thu, 2007-02-08 at 00:46 +0100, Heikki Levanto wrote: Maybe all this shows that we have not (yet?) understood all the complications of the MC evaluation, and that more research is needed? I agree, more research is needed. My intuition is the same as yours - one would think a 100-point win is better. Not objectively a better move, perhaps, but it should be better against a fallible opponent who could easily get distracted by the little battles and forget the real point of the game. At least it's a way to fight when you are losing. - Don
Re: [computer-go] MC approach
But of course, it's not the size of the win that counts; it is rather the confidence that it really is a win. In random playouts that continue from a position in a close game, the ones that result in a large victory are generally only the ones where the opponent made a severe blunder. (Put another way, the score of the game is affected more by how bad the bad moves are than by how good the good ones are, or even how good most of the moves are. Others have commented on this effect on this list, in other contexts.) Since you can't count on that happening in the real game, these simulations have a lower value in the context of ensuring a win. Even when the program is losing (say, by a little bit), it is more important to play moves that it thinks are the most likely to convert the game to a win by confusing the opponent than to play moves that will make the losing outcome more apparent. These tend to be different moves, and Monte Carlo methods are good at uncovering these differences. I agree that this is a bit surprising, but I find it much less so when I think about it in these terms. Given that people have reported such a strong effect, I am actually wondering if these simulations (those that result in a large score difference) should be _penalized_, for not being properly representative of the likely outcome of the game. In other words: value = 1000 * win - score. Weston On 2/7/07, Don Dailey [EMAIL PROTECTED] wrote: [...] At least it's a way to fight when you are losing.
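Weston's proposed evaluation can be transcribed directly (a sketch; representing playouts as a list of signed final scores is an assumption for illustration, and the function name is made up):

```python
def weston_value(scores):
    """Average of 1000 * win - score over playouts, where `score` is the
    signed final margin from our point of view and win = 1 if score > 0.
    A large winning margin *reduces* the value, on the theory that such
    playouts relied on an opponent blunder we can't count on."""
    total = 0.0
    for score in scores:
        win = 1 if score > 0 else 0
        total += 1000 * win - score
    return total / len(scores)

# Ten narrow wins outrank ten blowout wins, while any win still
# dwarfs a loss thanks to the 1000-point win bonus.
print(weston_value([0.5] * 10))    # 999.5
print(weston_value([100.5] * 10))  # 899.5
print(weston_value([-0.5] * 10))   # 0.5
```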
Re: [computer-go] MC approach
Matt Gokey: [EMAIL PROTECTED]: Weston Markham wrote: But of course, it's not the size of the win that counts; it is rather the confidence that it really is a win. Yes, and my reasoning was that a larger average win implied a higher confidence, since there is more room for error. That intuition may not hold, though. In random playouts that continue from a position in a close game, the ones that result in a large victory are generally only the ones where the opponent made a severe blunder. [...] Since you can't count on that happening in the real game, these simulations have a lower value in the context of ensuring a win. That is the first reasonable argument I've heard that makes some sense as to why this effect may be true. The opposite may of course be true as well, and close games may really not be close, due to the same blunder effect. Perhaps it is just another symptom of the fact that most playouts are nonsense games. We can test this effect by using, for example, v = 0.5 * (1.0 + tanh(k * score)); // v is in [0...1], with a little penalty in simulation speed. As k becomes larger, this function approaches the commonly used threshold function, and vice versa. I guess the best value of k depends on the sensibleness of the games; i.e., if current heuristics for pruning moves are not so effective, a larger k may be best. - gg -- [EMAIL PROTECTED] (Kato)
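Kato's squashing function is easy to experiment with (a direct transcription into Python; the k values and sample margins below are arbitrary assumptions):

```python
import math

def playout_value(score, k):
    """Squash a signed final score into [0, 1] with a tanh.
    Large k approaches the usual win/loss threshold function;
    small k lets the margin of victory leak into the evaluation."""
    return 0.5 * (1.0 + math.tanh(k * score))

# With k = 10, a 0.5-point win and a 30.5-point win are both worth ~1.0;
# with k = 0.01 the bigger margin is worth visibly more.
for k in (0.01, 10.0):
    print(k, playout_value(0.5, k), playout_value(30.5, k))
```

A single k thus interpolates between the pure win/loss evaluation the thread favors and the score-sensitive evaluations that were reported to weaken play, which makes it a convenient knob for the kind of test Kato proposes.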