[computer-go] UCT RefBot
After getting some strange results with my experiments I realised that doing these tests on the reference bot is probably not appropriate. The MC+AMAF combination is a completely different beast than MC+UCT search. So I need to do the testing on a reference implementation of MC+UCT search. People have mentioned here before that they'd like to see something like that. I have something like a reference implementation that I use for my own experiments, but I'm hardly an expert on either MC or UCT, so I'm a bit hesitant to propose a reference implementation. But maybe I can make a start.

At least we do have a good standard definition of a reference MC algorithm, so I'm not going to repeat that here apart from saying it plays uniformly random simulations of moves that are legal and don't fill in an eye. That leaves the UCT part, so let me take a stab at it:

- Let's say we're going to limit the search to N Monte-Carlo simulations.
- For each simulation we need to determine which node in the tree to expand. This is a recursive process starting from the root node. If a node is incomplete, it is expanded further with the next move, the selection of which is done randomly by the MC algorithm. (A node is 'complete' when it has all the moves that the MC algorithm can generate as its children. I define this to be the case when the MC algorithm produces a pass move; see below.)
- If a node is 'complete', it tries to (recursively) expand the child with the best UCT value. This value is a combination of the win-loss ratio recorded in that node plus an exploration term, so the values used for comparison are based on the following formula:

  wins / visits + exploration-factor * sqrt(2 * log(parent-visits) / (10 * visits))

  I'm actually using an exploration-factor of 1.0 at the moment and have no idea what the best or standard value is for light playouts.
- When the node to expand is selected, a random move is chosen, for which it creates a new node.
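As a sketch of the selection rule above, here is how the comparison might look in Python (a minimal sketch: the `Node` class and function names are mine, not from any existing ref-bot, and the divisor of 10 on the visit count is taken literally from the formula as posted):

```python
import math

class Node:
    """Minimal tree node: just the statistics the UCT comparison needs."""
    def __init__(self, wins=0, visits=0):
        self.wins = wins
        self.visits = visits
        self.children = []

def uct_value(child, parent_visits, exploration=1.0):
    # Win-visit ratio plus exploration term, exactly as in the formula above.
    return (child.wins / child.visits
            + exploration * math.sqrt(2 * math.log(parent_visits) / (10 * child.visits)))

def select_child(node):
    # A 'complete' node recursively expands the child with the best UCT value.
    return max(node.children, key=lambda c: uct_value(c, node.visits))
```

With an exploration factor of 1.0 and equal visit counts, the child with the better win-visit ratio is selected; a rarely visited child gets a boost from the exploration term.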
- Next, provided the selected move is not a pass, an MC playout is performed. The win-visit ratio of the node just created is set to 1/1 or 0/1 depending on the playout result. It also updates the win-visit ratios of its parent(s) by the same amount, until it reaches the root node of the tree.
- When the randomly selected move is a pass, it doesn't expand the tree unless the game is nearly finished; the node is set to 'complete'. The game is considered 'nearly finished' when all empty points have more than 2 occupied neighbors. In that case it counts the score and sees if the side that passes would win the game. If it does win, it updates the win-visit ratio; otherwise it doesn't do anything.
- When it reaches N simulations, the child of the root node with the best win-visit ratio is played. I've also seen that simply the child with the highest number of visits is selected. Is there a difference?
- Towards the end of the game, it may not be possible to do N expansions of the tree. To prevent an endless loop, I record how many times it failed to expand the tree and exit the loop when that exceeds a certain number (currently set to 1000).

The treatment of pass feels a bit convoluted. The idea behind it is that I don't want to allow pass moves in the tree early in the game. But towards the very end of the game I want to allow pass to avoid being forced to make a bad move, or not being able to make a move at all.

I'd also like to hear opinions on what would be a good N for a reference bot. If I set N to 2,000, just like the reference bots on CGOS, it plays rather poorly, much worse than the ref-bot. I haven't tested it a lot yet, but I think the break-even point against the MC-AMAF ref-bot would be somewhere around N=10,000.

Any opinions, ideas or criticism are welcome.

Mark
___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
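The backup and final-move steps just described can be sketched as follows (names are hypothetical; the backup follows the post's "updates the parent(s) by the same amount" literally, although many implementations instead flip the playout result at each ply):

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.wins = 0
        self.visits = 0
        self.children = []

def backpropagate(node, won):
    # Walk from the newly created node up to the root, updating every
    # node on the path by the same amount.
    while node is not None:
        node.visits += 1
        if won:
            node.wins += 1
        node = node.parent

def best_move(root, by="ratio"):
    # The two final-move rules the post asks about: best win-visit ratio
    # versus highest number of visits (the 'robust' child).
    if by == "ratio":
        return max(root.children, key=lambda c: c.wins / c.visits)
    return max(root.children, key=lambda c: c.visits)
```

The two rules can disagree: a child with 3 wins in 3 visits beats a child with 0 wins in 5 visits on ratio, but loses on visit count, which is one reason the robust-child rule is often preferred.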
Re: [computer-go] UCT RefBot
I couldn't find anything problematic with your specification, so I'll just make some comments. Quoting Mark Boon [EMAIL PROTECTED]:

> When it reaches N simulations, the child of the root-node with the best wins-visits ratio is played. I've also seen that simply the child with the highest number of visits is selected. Is there a difference?

The following can happen if you choose the move with the best win-visit ratio: if the search is very selective (for example using progressive widening or AMAF (RAVE)), a move can be searched a small number of times, get a really good ratio, and be selected. In Valkyria, which uses both methods of selectivity, there are always moves that have higher win-visit ratios than the most-searched move (19x19). If you do plain MC and UCT I think win-visit ratio may be just as good as highest number of visits, because the search will be quite wide.

Using the highest number of visits is sort of robust. Even if you know the move is not as good as it initially seemed, you sort of know that one has to search deep to see this, and maybe the opponent cannot see the refuting move. Also, the reason for the low win-visit ratio may be that problems in the position will give a low ratio for all moves if they are searched deep enough. This is by definition true in any lost position: if you can prove that all moves are losing, the move that required the largest tree for the proof is the candidate for provoking a mistake. Valkyria plays fast whenever win-visit ratio and highest number of visits agree, and slow otherwise, in an attempt to eliminate uncertainty.

> I'd also like to hear opinions on what would be a good N for a reference bot. If I set N to 2,000, just like the reference-bots on CGOS, it plays rather poorly. Much worse than the ref-bot. I haven't tested it a lot yet, but I think the break-even point against the MC-AMAF ref bot would be somewhere around N=10,000.

What about doing an MC-AMAF-UCT version?
Or perhaps just simply try an MC-AMAF-TS in a best-win-visit-ratio-first manner?

Best
Magnus
Re: [computer-go] UCT RefBot
Mark Boon wrote:
> The treatment of pass feels a bit convoluted. The idea behind it is that I don't want to allow pass-moves in the tree early in the game. But towards the very end of the game I want to allow pass to avoid being forced to make a bad move or not being able to make a move at all.

You could simply allow the pass all the time. Early in the game, it will be a significantly inferior move and not often explored in the tree. That may not be optimal, but it's certainly not convoluted, and you are guaranteed to never fail to generate the pass move when you need it.
Re: [computer-go] UCT RefBot
Thanks for the comments Magnus.

On 20-nov-08, at 13:00, Magnus Persson wrote:
> > I'd also like to hear opinions on what would be a good N for a reference bot. If I set N to 2,000, just like the reference-bots on CGOS, it plays rather poorly. Much worse than the ref-bot. I haven't tested it a lot yet, but I think the break-even point against the MC-AMAF ref bot would be somewhere around N=10,000.
> What about doing a MC-AMAF-UCT version. Or perhaps just simply try a MC-AMAF-TS in a best win-visits ratio first manner?

To start off, I wanted to keep things simple. But to be honest, I hadn't given AMAF (or RAVE?) in combination with tree-search much thought yet. Only yesterday did I look up an article describing RAVE, and it's actually not entirely clear to me yet how this would best be implemented. I also did a little searching through past posts to this mailing list, but it did little to clarify things in my mind.

The way I understood the article, after a playout it updates all the nodes at the current level of all the moves played during the playout (if it's a win for the player) with a RAVE value that is used in a similar fashion to the UCT value of a node. Only for the current node does it update the win-visit ratio. Is that correct? This implies creating a lot more nodes than I'm currently doing. I have seen remarks that others postpone expanding a node until a certain number of simulations have been done. I never quite understood the need for that, but maybe this has to do with the fact that AMAF requires a much larger number of node creations?

The way I had implemented my UCT search, it became very intuitive how to make the MC playout strategy prioritize certain moves in the case of heavy (or semi-light) playouts, and it needed zero modifications to the actual search algorithm. I'd probably need to completely review things when a RAVE value starts to influence move priority 'a priori'.
Mark
Re: [computer-go] UCT RefBot
On 20-nov-08, at 13:22, Michael Williams wrote:
> Mark Boon wrote:
> > The treatment of pass feels a bit convoluted. The idea behind it is that I don't want to allow pass-moves in the tree early in the game. But towards the very end of the game I want to allow pass to avoid being forced to make a bad move or not being able to make a move at all.
> You could simply allow the pass all the time. Early in the game, it will be a significantly inferior move and not often explored in the tree. That may not be optimal, but it's certainly not convoluted and you are guaranteed to never fail to generate the pass move when you needed it.

I remember that's how I implemented it initially. I do not remember the exact details of why I moved away from that, but I believe that (in part) it had to do with the program occasionally passing too early in the game.

Mark
Re: [computer-go] UCT RefBot
> > You could simply allow the pass all the time. Early in the game, it will be a significantly inferior move and not often explored in the tree. That may not be optimal, but it's certainly not convoluted and you are guaranteed to never fail to generate the pass move when you needed it.
> I remember that's how I implemented it initially. I do not remember the exact details of why I moved away from that, but I believe that (in part) it had to do with the program occasionally passing too early in the game.

When you say too early, do you mean before the game is decided, or before all dead stones are removed? Only the first one is truly too early. In other words, there is no harm in leaving a dead stone on the board given that it does not impact who the winner is.
Re: [computer-go] UCT RefBot
On 20-nov-08, at 14:03, Michael Williams wrote:
> > > You could simply allow the pass all the time. Early in the game, it will be a significantly inferior move and not often explored in the tree. That may not be optimal, but it's certainly not convoluted and you are guaranteed to never fail to generate the pass move when you needed it.
> > I remember that's how I implemented it initially. I do not remember the exact details of why I moved away from that, but I believe that (in part) it had to do with the program occasionally passing too early in the game.
> When you say too early, do you mean before the game is decided or before all dead stones are removed. Only the first one is truly too early. In other words, there is no harm in leaving a dead stone on the board given that it does not impact who the winner is.

The problem is there were long intervals while working on it. So sometimes I don't exactly remember why I made certain decisions. (The older you get, the more you learn how important it is to document these kinds of decisions ;-( ) I only remember being particularly annoyed by it. So possibly it occasionally passed too early in both of the cases you mentioned. It's also possible that it was a solution to a problem that doesn't exist anymore due to other changes/improvements made in the code in the meantime.

Mark
Re: [computer-go] UCT RefBot
Quoting Mark Boon [EMAIL PROTECTED]:
> Thanks for the comments Magnus. On 20-nov-08, at 13:00, Magnus Persson wrote:
> The way I understood the article, after a playout it updates all the nodes at the current level of all the moves played during the playout (if it's a win for the player) with a RAVE value that is used in a similar fashion to the UCT value of a node. Only of the current node does it update the win-visit ratio. Is that correct? This implies creating a lot more nodes than I'm currently doing. I have seen remarks that others postpone expanding a node until a certain number of simulations have been done. I never quite understood the need for that, but maybe this has to do with the fact that AMAF requires a much larger number of node-creations?

I think you understand the basic principle of RAVE correctly. The hard part is how to weight together the AMAF value (which I call *virtual win-visit* ratio) with the actual win-visit ratio. And I have to admit that my implementation of RAVE in Valkyria might be a quite unique interpretation of the somewhat hard-to-understand information that has been posted here. I just implemented something that seemed to work and tested the parameters thoroughly. But I think I need to do new tests since there have been so many changes. Especially I should do tests on large boards.

Valkyria postpones expansion of new nodes about 10 times, and an expansion includes all children of a position. This is necessary in order to update virtual win-visit scores for all moves played of the color for the current simulation, even if there is only one. Thus a node in the tree is a list of children for a given position. This simplifies code that biases the tree search, since all necessary algorithms are executed simultaneously for all children during expansion. This is perhaps not efficient, but is perhaps less prone to bugs.
The effect of RAVE with a complete expansion is that after a few simulations (let us say 40) we have a rather good sample of virtual win-visit scores for all moves. Also, if patterns bias exploration, all real win-visit scores have been collected for those moves (often a single move) that are most likely to be the best. Thus the win rate for this position is already close to that of the best move. This is different from pure MC-UCT, where all candidate moves are searched once and the win rate initially is the average of all moves, bad as well as good ones. It takes a lot of time until MC-UCT converges onto the win rate of the best move in each position.

Simply put, combining virtual and real win-visit values for searching the tree lets us safely ignore the worst moves in all positions, and the search basically only searches moves that our pattern heuristics predicted to be good or moves that were discovered thanks to AMAF. The search is very selective when it works. And of course it often misses the best move. But it kind of works because a) often the second-best move wins as well, and b) the selective search effectively searches so deep that it corrects itself quickly.

Magnus
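One simple way to combine the virtual and real win-visit ratios Magnus describes is a visit-dependent blend, where the virtual term dominates while a node has few real visits and fades out as real data accumulates. The `k/(k + visits)` schedule and `k=100` below are illustrative assumptions, not Valkyria's actual weighting:

```python
def blended_value(wins, visits, rave_wins, rave_visits, k=100):
    """Blend the real win-visit ratio with the 'virtual' (AMAF/RAVE) ratio.

    The weight of the virtual term decays as real visits accumulate.
    The schedule and the constant k are assumptions for illustration.
    """
    beta = 0.0 if rave_visits == 0 else k / (k + visits)
    real = wins / visits if visits else 0.0
    virtual = rave_wins / rave_visits if rave_visits else 0.0
    return (1 - beta) * real + beta * virtual
```

With no virtual data the function falls back to the plain win-visit ratio; with no real data it relies entirely on the virtual ratio.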
Re: [computer-go] UCT RefBot
On 20-nov-08, at 14:03, Michael Williams wrote:
> > > You could simply allow the pass all the time. Early in the game, it will be a significantly inferior move and not often explored in the tree. That may not be optimal, but it's certainly not convoluted and you are guaranteed to never fail to generate the pass move when you needed it.
> > I remember that's how I implemented it initially. I do not remember the exact details of why I moved away from that, but I believe that (in part) it had to do with the program occasionally passing too early in the game.
> When you say too early, do you mean before the game is decided or before all dead stones are removed. Only the first one is truly too early. In other words, there is no harm in leaving a dead stone on the board given that it does not impact who the winner is.

I'm looking at my code and it's coming back to me. When deciding which node to expand, which is a recursive process, I stop when encountering two passes in a row. Instead of expanding the tree in that case, I just determine the winner and update the win-visit numbers in the tree. I believe I did this to allow for early termination of the search. Without this, it would continue the search until the time allowance was depleted. This would cause a great waste of thinking time towards the end of the game, when the maximum possible search tree is much smaller than N.

But with the termination on two consecutive passes, what would occasionally happen is that one side would pass and the simulation would return a win. Then the other side could pass as well, and based on the number of stones still on the board, now the second player would win. This caused problems. Therefore I decided only to allow a pass towards the end of the game, when it would also win when the opponent's stones still on the board are all considered alive. I hope that makes sense :)

Mark
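Taken literally, the "nearly finished" rule from the original post (every empty point has more than 2 occupied neighbours) can be written as a small predicate. The dictionary board representation here is my own, not from Mark's code:

```python
def nearly_finished(size, stones):
    """True when every empty point has more than 2 occupied neighbours.

    `stones` maps (x, y) coordinates to a color; any point not in the
    dict is empty.
    """
    def neighbors(x, y):
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < size and 0 <= ny < size:
                yield (nx, ny)

    for x in range(size):
        for y in range(size):
            if (x, y) not in stones:
                occupied = sum(1 for n in neighbors(x, y) if n in stones)
                if occupied <= 2:
                    return False
    return True
```

Note that an empty corner point only has two neighbours, so taken this literally a board with an empty corner never counts as nearly finished; a real implementation would presumably want to treat edge and corner points specially.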
Re: [computer-go] UCT RefBot
About allowing early passes. I think my problem is that RAVE/AMAF really says nothing about the value of pass moves, which makes it problematic when a selective search does not search bad moves such as pass. So how are we supposed to know when to search pass and when not? Thus, every time I had some kind of design flaw or a bug in Valkyria, it seemed to play early passes because of some weird interaction in my code. So allowing passes early seemed to be a good bug catcher... on the other hand it also seemed to cause bugs for me. I do not even remember anymore how Valkyria handles pass, because I changed it so many times and had to add a lot of kludges to avoid problems with older kludges and so on...

--
Magnus Persson
Berlin, Germany
Re: [computer-go] UCT RefBot
On 20-nov-08, at 14:42, Magnus Persson wrote:
> I think you understand the basic principle of RAVE correctly. The hard part is how to weight together the AMAF value (which I call *virtual win-visit* ratio) with the actual win-visit ratio. And I have to admit that my implementation of RAVE in Valkyria might be a quite unique interpretation of the somewhat hard to understand information that has been posted here. I just implemented something that seemed to work and tested the parameters thoroughly. But I think I need to do new tests since there have been so many changes. Especially I should do tests on large boards.
> Valkyria postpones expansion of new nodes about 10 times, and an expansion includes all children of a position. This is necessary in order to update virtual win-visit scores for all moves played of the color for the current simulation even if there is only one. Thus a node in the tree is a list of children for a given position. This simplifies code that biases the tree search, since all necessary algorithms are executed simultaneously for all children during expansion. This is perhaps not efficient, but is perhaps less prone to bugs.
> The effect of RAVE with a complete expansion is that after a few simulations (let us say 40) we have a rather good sample of virtual win-visit scores for all moves. Also if patterns bias exploration all real win-visit scores have been collected for those moves (often a single move) that are most likely to be the best. Thus the win rate for this position is already close to that of the best move. This is different from pure MC-UCT where all candidate moves are searched once and the win rate initially is the average of all moves, bad as well as good ones. It takes a lot of time until MC-UCT will converge onto the win rate of the best move in each position.
> Simply put combining virtual and real win-visit values for searching the tree lets us safely ignore the worst moves in all positions and the search basically only searches moves that our pattern heuristics predicted to be good or moves that were discovered thanks to AMAF. The search is very selective when it works. And of course it often misses the best move. But it kind of works because a) often the second best move wins as well b) the selective search effectively searches so deep that it corrects itself quickly.

Again, thanks for sharing. So it seems by and large I understood it in a similar fashion as you did :-)

What is not exactly clear to me is what you mean by 'postponing expansion'. Let me write it in my own words to see if that's what you mean. When you have selected a best node based on the UCT + wins/visits value which has no children yet, you first simply do a simulation and collect the playout result in the current node, including the AMAF value that you call 'virtual win-visit ratio', and only when that is done a certain number of times (in your case 10) do you suddenly create all the children and weight them based on the virtual win-visit ratio, and possibly weight them based on other move-priorities that resulted from 'heavy' playout selection?

When thinking about it this way it makes sense. I already had a nagging feeling about the requirement of visiting each move at least once. Especially on 19x19 I could see that becoming a burden.

Mark
Re: [computer-go] UCT RefBot
On 20-nov-08, at 14:51, Magnus Persson wrote:
> and had to add a lot of kludges to avoid problems with older kludges and so on...

Hey, that sounds strangely familiar :)
Re: [computer-go] UCT RefBot
On 20-nov-08, at 15:20, Magnus Persson wrote:
> Quoting Mark Boon [EMAIL PROTECTED]:
> > What is not exactly clear to me is what you mean by 'postponing expansion'. Let me write it in my own words to see if that's what you mean. When you have selected a best node based on the UCT + wins/visits value which has no children yet, you first simply do a simulation and collect the playout result in the current node, including the AMAF value that you call 'virtual win-visit ratio', and only when that is done a certain number of times (in your case 10) do you suddenly create all the children and weight them based on the virtual win-visit ratio and possibly weight them based on other move-priorities that resulted from 'heavy' playout selection?
> Yes, I suddenly create all children, but at the creation I have no simulations and thus no virtual win-visit ratios for the children (although one might copy such values from higher up in the tree, which I think the Mogo team tried but with little or no success). The virtual win-visit ratios are initialized to some default value. But one can initialize the ratios differently depending on static evaluation of the position, using patterns for example. Or the proximity heuristic. It is not clear to me how best to do this. And I need to test the parameters for biasing. The nice thing is that one can initialize the virtual win-visit ratios and keep the real win-visit ratios unbiased. You can afford to make mistakes here, because if the position is searched a lot the virtual values get data much quicker than the real ones.

OK, things start to fall into place for me. I was wondering all this time what would happen with the information from the simulations that happen before expansion. So the answer is: nothing. But at least the result of the playout gets percolated up the tree, so I suppose you get the most important bit of information of the playout in your tree anyway.
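Postponed, all-at-once expansion with prior-seeded virtual values might look like this (all names are hypothetical; the threshold of 10 comes from Magnus's description, and the default prior of 1 virtual win in 2 virtual visits is an arbitrary placeholder, not Valkyria's value):

```python
EXPAND_THRESHOLD = 10  # simulations through a node before it is expanded

class Node:
    def __init__(self):
        self.visits = 0
        self.children = []

class Child:
    def __init__(self, move, prior=(1, 2)):
        self.move = move
        self.wins = 0          # real win-visit counts stay unbiased
        self.visits = 0
        # Only the *virtual* (RAVE) counts are seeded from the prior.
        self.rave_wins, self.rave_visits = prior

def maybe_expand(node, legal_moves):
    # Create all children of the position at once, but only after the
    # node itself has been simulated EXPAND_THRESHOLD times.
    if node.children or node.visits < EXPAND_THRESHOLD:
        return False
    node.children = [Child(move) for move in legal_moves]
    return True
```

A pattern or proximity heuristic would simply replace the default prior per move, leaving the real win-visit counts untouched, as Magnus describes.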
I have also been wondering before what was meant when I read that CrazyStone uses ownership information in the patterns (if I remember that correctly). I was always wondering how you got ownership information before doing playouts. So it turns out these come from simulations done before expansion. Somehow I had disregarded that idea because I was under the assumption it would discard too much information. But I suppose keeping the playout results in the tree is valuable enough by itself. And like you say, you win back a lot by not having to do at least a playout for every single move at every level in the tree.

From the beginning I have been experimenting with ownership information and I think I have a few interesting ideas. But they didn't fit in yet. Maybe with my new understanding I can go back and try out a few of those ideas again. It means rewriting my search code a bit. Hopefully it's not too much work.

Mark
[computer-go] Re: UCT RefBot
My first Monte Carlo programs used ownership info very effectively, but it may be that by using AMAF this information is used already. As a relative beginner in these matters, the more I look at AMAF, the less I like it, and I think that has to do with AMAF ignoring relative sequencing. By looking at all moves as if they were played first, it ignores that some moves only make sense after certain other moves. I haven't yet been able to pin this down with evidence, because the "the game is lost, all moves are equally bad" effect tends to obscure it (I guess I need to insert resign moves at the right places to limit the game records to the interesting pieces, even if that means my bot can't score the result).

If I ignore that the Go board is 2d (dimensions, not dans ;-), and think only of time (number of moves), space (board positions), and alternatives (simulation runs), then ownership maps project this 3d spacetime multiverse onto a 2d space histogram (likelihood of each position being black/white), while AMAF tries to project these three dimensions onto a 2d time histogram (likelihood of each move resulting in win/lose). As the various tweaks (don't count moves by the other player, don't count moves already played, value early moves higher than later moves) and remaining bot blunders demonstrate, that doesn't quite work right, in spite of first appearances.
My (as yet unproven) hypothesis is that the issues stem from AMAF trying to ignore the time dimension of its projection. Hence you get highly rated moves that place themselves into atari (because that would have been the last, necessary, move to capture the opponent group, *after* surrounding it), that would be suicide right now (but not at some later point in many playouts), or even on an occupied position (no longer occupied later in many playouts), or that try to fill an opponent's eye (which would make sense only if the second, nearly complete eye were prevented first, but that doesn't always happen in the same way, hence a lower rating for each of the possible ways). Giving priority to early moves alleviates this somewhat, as the experiments show, but then you still don't get perfect loss or win results, even if the board is effectively owned by one of the players, because the tweaks have removed some data from the statistics to cover the worst holes.

There should be a variation of AMAF that records move win rates in move sequences (these moves are good, but this one always comes after that one; never try this one unless you are sure you can make that one as well), preserving more of the structure in the time dimension of its projection without just reproducing the whole 3d input data space.

Given that ownership maps don't suffer from this kind of trouble (unless one collapses their information to a score, which has been said to perform badly), it is surprising that they are hard to make good use of. It might just be a matter of asking different questions: AMAF variants try to answer which move is likely best to make first; ownership maps answer which positions are likely to be mine in the end. The latter is not as directly usable, at least not as long as we are only asking about the next move.
That continues with UCT: apart from offering scaffolding for introducing more Go knowledge, it also introduces a way of interleaving the use of playout information with continued generation of it, but it tends to do so in a way biased towards move sequences. Again, not the kind of question that ownership maps can offer answers to, but that just means there are other questions we are not asking yet. Perhaps there is a complement to UCT/AMAF that achieves such interleaving based on territorial information? One could allocate the first half of simulation runs to acquiring an ownership map, then use that information to narrow down the second half of simulations to the more interesting parts of the board (perhaps less than half the runs are needed to weigh the rest successfully?).

Claus

PS. Is there a location where one can find the current versions of the plain AMAF reference bots? I seem to be reaching the stage where issues I find are not always in my own code, but I don't know whether, e.g., JrefBot 081016-2022 is the latest version.
Re: [computer-go] Re: UCT RefBot
Claus, I think you are raising some very valid questions. I'm a bit ambivalent towards AMAF for very similar reasons. One thing in defense of AMAF, though, is that it doesn't necessarily need to make Go-sense to be useful. MC simulations also don't make much Go-sense. For example, moves are generally rewarded highly just because they allow for the possibility of the other side putting himself into atari with a big group. So they ignore the time dimension almost just as much as AMAF does. Yet the evidence is very strong that random MC playouts yield useful information despite the obvious shortcomings. For now I want to keep an open mind and allow for the possibility that AMAF yields useful information as well. But I do foresee its usefulness being much more limited. For example, the most obvious way to improve MC playouts is to make the playouts smarter (and heavier) to avoid their pitfalls, like trying to avoid putting yourself into atari with a big group. A similar solution for AMAF's shortcomings is not so easy to see. So possibly the yield of AMAF becomes more and more limited with smarter playouts. It's definitely worth it to keep monitoring that.

Just now I ran a test of a few thousand games of plain UCT against UCT+AMAF using 2,000 simulations. The win-rate was 53.3% vs. 46.6% (± 0.9). So it seems AMAF made things a little worse, not better. But this was the first test-run after I finished implementing it, so I'm actually surprised it came this close. I'm sure I'll have a few bugs to fix before I can cast a final verdict. The other thing that's different is that I now only expand the tree after 10 initial simulations, whereas originally I always expanded immediately. It's possible that 2,000 playouts is a bit low to overcome the initial inefficiency of doing 10 simulations before expanding. Lastly, there is the weighting of the RAVE value.
Right now I use: win/loss + uct-value + rave-value, where the rave-value is the accumulation of all the weighted win/loss ratios, where the weight decreases according to the depth in the sequence at which the move was encountered, in the same way it's done in the original ref-bot. This may well be a little too braindead a formula, and at some point I'll have to think a bit about what to do there.

What I'm going to do next is clean up my new implementation such that I can easily adjust two knobs: one controls the number of simulations before expansion, and the other controls the weight of the rave-value in the formula above. In that case I should get exactly the same result as my original plain UCT implementation when I set both to zero. Only after that checks out will I start experimenting with other settings.

Mark

P.S. I got Don's javabot implementation here: http://cgos.boardspace.net/public/javabot.zip
Or you can get mine here: https://plug-and-go.dev.java.net/servlets/ProjectProcess?tab=4stage=1

On 20-nov-08, at 23:34, Claus Reinke wrote:
> My first monte carlo programs used ownership info very effectively, but it can be that by using AMAF this information is used already. As a relative beginner in these matters, the more I look at AMAF, the less I like it, and I think that has to do with AMAF ignoring relative sequencing. By looking at all moves as if they were played first, it ignores that some moves only make sense after certain other moves. I haven't yet been able to pin this down with evidence, because the "the game is lost, all moves are equally bad" effect tends to obscure it (I guess I need to insert resign moves at the right places to limit the game records to the interesting pieces, even if that means my bot can't score the result).
> If I ignore that the Go board is 2d (dimensions, not dans ;-), and think only of time (number of moves), space (board positions), and alternatives (simulation runs), then ownership maps project this 3d spacetime multiverse onto a 2d space histogram (likelihood of each position being black/white) while AMAF tries to project these three dimensions onto a 2d time histogram (likelihood of each move resulting in win/lose). As the various tweaks (don't count moves by other player, don't count moves already played, value early moves higher than later moves) and remaining bot blunders demonstrate, that doesn't quite work right, in spite of first appearances.
> My (as yet unproven) hypothesis is that the issues stem from AMAF trying to ignore the time dimension of its projection, hence you get highly rated moves that place themselves into atari (because that would have been the last, necessary, move to capture the opponent group, *after* surrounding it), that would be suicide right now (but not at some later point in many playouts), or even on an occupied position (no longer occupied later in many playouts), that try to fill an opponent's eye (which would make sense only if the second, nearly complete eye were prevented
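The depth-weighted rave-value accumulation Mark describes above (weight decreasing with the move's position in the playout sequence, as in the original ref-bot) could be sketched like this; the 1/(depth+1) decay and the omission of the usual same-color filtering are simplifying assumptions of mine:

```python
def update_amaf(stats, playout_moves, won):
    # Accumulate a depth-weighted AMAF contribution for every move in the
    # playout; earlier moves get a larger weight.
    for depth, move in enumerate(playout_moves):
        weight = 1.0 / (depth + 1)
        wins, total = stats.get(move, (0.0, 0.0))
        stats[move] = (wins + (weight if won else 0.0), total + weight)

def rave_value(stats, move):
    # Weighted win ratio; 0 for moves never seen in any playout.
    wins, total = stats.get(move, (0.0, 0.0))
    return wins / total if total else 0.0
```

Setting the weight of this term to zero in the combined score recovers plain UCT, which is the sanity check Mark proposes for his two-knob setup.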
[computer-go] Selling a computer go program
Hi, I have read that the amount of money a winning computer go program would make in a go tournament is insignificant compared to the amount of money such a program would earn selling to the general public. I have also read that the biggest pirates of computer software come from Germany, the UK, and the US. The foreign exchange student we are hosting from Beijing, China, said that most people in China do not buy software, but download it for free off the net. So what is true?

mike