[computer-go] Call for participation: Next Computer Olympiad in Beijing
Dear Go programmers, The 13th Computer Olympiad (CO) will be held from September 28 to October 5 in Beijing, China. This event will be held together with the Conference on Computers and Games 2008 (CG 2008) (September 29 - October 1), and the 16th World Computer-Chess Championship (WCCC) (September 28 - October 5) Location: the Beijing Golden Century Golf club, Fangshan, Beijing, China. For more information: http://www.grappa.univ-lille3.fr/icga/event_info.php?id=18 I hope to meet you there, Guillaume ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] 10k UCT bots
I'm testing my bot on CGOS using pure UCT, no pondering, and 10,000 playouts per move. Can someone put up a comparable bot? A while back, someone else made a similar request, and I discovered that my bot had somehow broken. I've scoured for bugs and I believe I have a functional implementation. There are two hb-739-UCT-10k-# bots because an external library leaks memory and it's the easiest way to make cgosGtp restart the bot periodically. My bot may still be a bit weaker than 10k playouts would imply since it's slow and may be cutting back from time restrictions. Sent from my iPhone ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On May 13, 2008, at 7:25 AM, Jason House wrote: I'm testing my bot on CGOS using pure UCT, no pondering, and 10,000 playouts per move. Can someone put up a comparable bot? I will re-start 'myCtest-10k-UCT' later today. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On Tue, May 13, 2008 at 1:04 PM, Jason House [EMAIL PROTECTED] wrote: [,,,] I have a list of empty points. I pick one at random and then scan until I find a legal one. Others reduce the list size (swap to end?) and repick. What others do is the right thing to do. Your method will introduce some biases. I think libego used to do it your way but was then fixed. Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Christoph Birk wrote: On May 13, 2008, at 10:04 AM, Jason House wrote: On May 13, 2008, at 12:57 PM, Carter Cheng [EMAIL PROTECTED] wrote: I have a list of empty points. I pick one at random and then scan until I find a legal one. That's not random. Yes, it's not random at all. The points near the end of the list are much less likely to be chosen for instance. I have a list of empty points and I choose from this list at random and temporarily put away any non-legal choices and try again until I get a legal move. - Don Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On May 13, 2008, at 10:00 AM, Jason House wrote: On May 13, 2008, at 12:00 PM, David Fotland [EMAIL PROTECTED] games.com wrote: When you say pure uct, what is the playout policy? Pure random moves except don't fill one point eyes? That's exactly what I meant. I'd also assume other stuff like the UCB1 formula, no RAVE, and no initial move bias. I'm not opposed to other variants to see the effects, but I do want to ensure my implementation is correct. 'my-Ctest-10k-UCT' uses none of these, except guiding the search by UCT. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
It's not clear how bad Jason's method is however. The points near the end of the list are LESS likely to be chosen but probably not much less likely and this method is probably pretty fast.I wonder how bad it really is? The point after an illegal move is quite a bit more likely to be selected. If the list had just 1 illegal point, then the point after it in the list is twice as likely to be selected as any other point. Perhaps if you added a little noise after each move (perhaps swapping to points at random) you could minimize any seriously bad effects? - Don Don Dailey wrote: Christoph Birk wrote: On May 13, 2008, at 10:04 AM, Jason House wrote: On May 13, 2008, at 12:57 PM, Carter Cheng [EMAIL PROTECTED] wrote: I have a list of empty points. I pick one at random and then scan until I find a legal one. That's not random. Yes, it's not random at all. The points near the end of the list are much less likely to be chosen for instance. I have a list of empty points and I choose from this list at random and temporarily put away any non-legal choices and try again until I get a legal move. - Don Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On Tue, May 13, 2008 at 1:51 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. This has been explained already. After the first eye appears on the board, the first empty point after the eye has a probability of being picked that is twice the probability of any other empty point. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? You move it to the end and you reduce the number of candidates by one. The code is roughly this: int pick(Move *empties, int num_empties) { int num_candidates = num_empties; int picked; do { picked = rand()%num_candidates; num_candidates--; } while(!acceptable(empties[picked]); return picked; } Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Hi Mark, Did you read my last email post? Using Jason's method, the point immediately AFTER an illegal point (perhaps an eye space) is TWICE as likely to be selected because you are scanning sequentially forward. Hitting on either point is going to lead to the same move selection. Occasionally you could get 2 or 3 or more illegal points in a row, especially near the end of the game. In such a case, you could be 3 or more times as likely to get the first point immediately after this sequence selected as your move choice. If your empty list is actually scrambled after every move, I think this is uniformly random except that it would be expensive. - Don Mark Boon wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Ooops! I hit sent before I finished writing the pseudo code. Sorry. int pick(Move *empties, int num_empties) { int num_candidates = num_empties; int picked; while(1) { picked = rand()%num_candidates; if(!acceptable(empties[picked])) { num_candidates--; swap(empties[picked],empties[num_candidates]); } else break; } return picked; } On Tue, May 13, 2008 at 1:57 PM, Álvaro Begué [EMAIL PROTECTED] wrote: On Tue, May 13, 2008 at 1:51 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. This has been explained already. After the first eye appears on the board, the first empty point after the eye has a probability of being picked that is twice the probability of any other empty point. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? You move it to the end and you reduce the number of candidates by one. The code is roughly this: int pick(Move *empties, int num_empties) { int num_candidates = num_empties; int picked; do { picked = rand()%num_candidates; num_candidates--; } while(!acceptable(empties[picked]); return picked; } Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On 13-mei-08, at 14:15, Don Dailey wrote: Yes, it's not random at all. The points near the end of the list are much less likely to be chosen for instance. OK, I'm not very good at statistics, but I don't see how the last points are much less likely to be picked. At best they are a little less likely to be picked but I actually don't see that either. I would like to see some logical argument supporting that claim. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On May 13, 2008, at 1:51 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. And I was thinking let's not repeat this topic... The probability of picking a point is proportional to 1 + number of illegal points before it. In practice, illegal moves are pretty rare until endgame. At that point, it's a trade off between bias and speed. Random number generation is not cheap. I have yet to see empirical proof that the pick and scan is bad. I've tried both methods in the past and saw no measurable difference in strength. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? The range of the random number is reduced by one after each failed lookup. Shuffled data has no impact on future use of the array of empty points. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On 13-mei-08, at 15:08, Jason House wrote: The range of the random number is reduced by one after each failed lookup. Shuffled data has no impact on future use of the array of empty points. OK, I understand now why a point at the end (or beginning) is a little less likely to be picked. Although I still have doubts whether that will lead to a noticable bias, I'll try to think about it. I would imagine moving an illegal point towards the end and only start including it when the other 'legal' moves run out can lead to terrible bias however because they may not remain illegal for very long and actually become important points to play. A ko-point is probably the most extreme example of that. Anyway, I don't bother to order the empty-point-list or scramble them in any way prior to the game. So the first point is the 1-1 point and the last is the 19-19 point (or whatever boardsize you're playing) so I have no qualms about those moves being a little less likely to be played. Or even a lot less. I think it would actually be beneficial. If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Mark Boon wrote: On 13-mei-08, at 15:08, Jason House wrote: The range of the random number is reduced by one after each failed lookup. Shuffled data has no impact on future use of the array of empty points. OK, I understand now why a point at the end (or beginning) is a little less likely to be picked. Although I still have doubts whether that will lead to a noticable bias, I'll try to think about it. I don't believe there is any bias based on where a good point is, only where the bad points are. Because you can think of this list of empty points as a circular array, where no particular location is any more significant than any other. I would imagine moving an illegal point towards the end and only start including it when the other 'legal' moves run out can lead to terrible bias however because they may not remain illegal for very long and actually become important points to play. A ko-point is probably the most extreme example of that. I had a bug in my program that may have been an actual improvement. I did not always resurrect empty points that were eyes. I went ahead and fixed this and didn't try to assess whether it was a good bug or not but I didn't notice any improvement. The net effect was that if a move was an eye move that you shouldn't move to, and the opponent moved to a diagonal, the program still would not move to it. That's probably sound in some cases and not in others. Anyway, I don't bother to order the empty-point-list or scramble them in any way prior to the game. So the first point is the 1-1 point and the last is the 19-19 point (or whatever boardsize you're playing) so I have no qualms about those moves being a little less likely to be played. Or even a lot less. I think it would actually be beneficial. If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Don Dailey wrote: Jason House wrote: On May 13, 2008, at 1:51 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. And I was thinking let's not repeat this topic... The probability of picking a point is proportional to 1 + number of illegal points before it. In practice, illegal moves are pretty rare until endgame. At that point, it's a trade off between bias and speed. Random number generation is not cheap. I have yet to see empirical proof that the pick and scan is bad. I've tried both methods in the past and saw no measurable difference in strength. I may perhaps take a look at this myself.I think there is a way to deal with this issue and get the best of both worlds or at least approach it. If this method does not degrade the quality, it's a moot point. Otherwise, here is an idea: 1. For each move list size, pre-compute N tables of move list traversal orderings. 2. At move selection time alternative between them. So you would have something in the C language like: int *order[S][C] I forget to mention that S is the move list size (number of empty points you are interested in) and C is the number of list orderings you want to alternate between. And a list is returns that is of size S-1 (since your first move is selected by calling the random number generater.) A number like 8 or 16 is good (a power of two) because each time you using the list to select a move, you can increment a counter so that you use a different move list ordering the next time you are confronted with a move list of this same exact size. Your first move is selected randomly, then from then on you use the array returned to form an offset from this. This would pay off if you really wanted something very close to uniform random and your random number generator was clearly expensive. I think there is a great deal of chaos that would make this almost perfectly uniform. The first move would be selected as you already do, using the random number generator but after this you traverse the list in random order as specified by this pre-computed table. Even though the bias is still there for any given move, it's basically non-predictable. On the very next move you will using a different table which you choose from in round robin fashion. Don't know how flawed this idea is. I believe it is would come close to uniformly random but wouldn't be perfect. There would be a minor bias when measured over a lot of games, it would probably occur that a certain move was being chosen slightly more often, but nothing like the bias in the current method. Is it worth it? Probably not, but just a crazy idea. Likely there are simpler ways to eliminate most of the bias while still minimizing the number of random move generations. - Don What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? The range of the random number is reduced by one after each failed lookup. Shuffled data has no impact on future use of the array of empty points. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. That doesn't fix anything. Why not? The whole argument is about a bias against points towards the end. In a circular list there is no 'end'. I missed this from you. I assumed that you did this anyway. If you choose a random point and then traverse linearly to the end, what do you do when you reach the end? Do you just pass?I assumed you viewed the empty point list as a circular queue. - Don Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
I don't care much about it being noticeable. This thread is about putting bots on CGOS that use a reproducible algorithm, to help people detect bugs in their implementations. As part of specifying what these bots do, we should all pick the next move in a playout using the same criteria. If we agree to use uniform distribution among empty non-eyeish points, that's what should be implemented. I agree with this. If we get too clever we lose confidence that we are comparing apples to apples by risking bugs and making assumptions that may not hold. I take an opposite approach: the data will tell us when we are comparing apples and oranges. It will help us identify the major factors leading to strength. There's probably more subtle variation than we realize... But when someone is explicitly asking for a direct comparison, you don't want to just throw in your own magic and then try to sort it out later. You are probably looking for implementation bugs, not implementation differences. That has it's place but not in this context. - Don - Don I would imagine moving an illegal point towards the end and only start including it when the other 'legal' moves run out can lead to terrible bias however because they may not remain illegal for very long and actually become important points to play. A ko-point is probably the most extreme example of that. I don't think you understood the algorithm. The eyeish point is removed from the lottery only for picking this particular move, not for the rest of the playout. Anyway, I don't bother to order the empty-point-list or scramble them in any way prior to the game. So the first point is the 1-1 point and the last is the 19-19 point (or whatever boardsize you're playing) so I have no qualms about those moves being a little less likely to be played. Or even a lot less. I think it would actually be beneficial. Reproducibility was the point, not strength of the bot. If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. That doesn't fix anything. Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On Tue, May 13, 2008 at 3:08 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 15:44, Álvaro Begué wrote: On Tue, May 13, 2008 at 2:28 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 15:08, Jason House wrote: The range of the random number is reduced by one after each failed lookup. Shuffled data has no impact on future use of the array of empty points. OK, I understand now why a point at the end (or beginning) is a little less likely to be picked. Although I still have doubts whether that will lead to a noticable bias, I'll try to think about it. I don't care much about it being noticeable. This thread is about putting bots on CGOS that use a reproducible algorithm, to help people detect bugs in their implementations. As part of specifying what these bots do, we should all pick the next move in a playout using the same criteria. If we agree to use uniform distribution among empty non-eyeish points, that's what should be implemented. Indeed it seems I misunderstood. I thought the general algorithm needed to be the same, not necessarily the exact implementation. Why not publish some pseudo-code with the exact statements then? Seems a bit less prone to errors then by talking about it. No, we don't need to all use the same implementations. You can implement uniformly picking a random non-eyeish point any way you like. I actually don't use the method we've talked about. The problem with the other method is that it implements something else. I would imagine moving an illegal point towards the end and only start including it when the other 'legal' moves run out can lead to terrible bias however because they may not remain illegal for very long and actually become important points to play. A ko-point is probably the most extreme example of that. I don't think you understood the algorithm. The eyeish point is removed from the lottery only for picking this particular move, not for the rest of the playout. Yes, again I misunderstood. So you do another random lookup each time you hit an illegal point? That could turn out very expensive, especially by the end of the game. You can try it. It's actually not that expensive. A long time ago I modified libego to use this method and it didn't slow it down much (sorry, I can't remember the fraction of speed lost). Anyway, I don't bother to order the empty-point-list or scramble them in any way prior to the game. So the first point is the 1-1 point and the last is the 19-19 point (or whatever boardsize you're playing) so I have no qualms about those moves being a little less likely to be played. Or even a lot less. I think it would actually be beneficial. Reproducibility was the point, not strength of the bot. I'd say, share the source if that's the objective. I doubt you'll get real reproducibility any other way. Why not? The only things we haven't clearly specified are the UCT formula to use and how we end playouts (although there is only one natural choice for this last part, I believe). If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. That doesn't fix anything. Why not? The whole argument is about a bias against points towards the end. In a circular list there is no 'end'. No, the whole argument is about bias towards points that happen to be immediately after eyeish or illegal points. Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On Tue, 13 May 2008, Mark Boon wrote: If this asymmetry really bothers you, you could very easily fix this by wrapping the search around. There's no asymmetry in a circle. That doesn't fix anything. Why not? The whole argument is about a bias against points towards the end. In a circular list there is no 'end'. No, it was a bias towards moves behind illegal moves. Those moves are twice as likely to be played than other moves. Consider a list with 5 moves: [Move1] [Move2] [Move3] [Move4] [Move5] You create a random number between 1 and 5. If Move2 is illegeal for example, then you will play Move1 if random#=1 Move3 if random#=2 or 3, Move4 =4 Move5 =5 Move3 is twice as likely to be played. Even if you make a circular list. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On 13-mei-08, at 16:17, Don Dailey wrote: I missed this from you. I assumed that you did this anyway. If you choose a random point and then traverse linearly to the end, what do you do when you reach the end? Do you just pass?I assumed you viewed the empty point list as a circular queue. No I didn't. When I reach the end I traverse in the opposite direction, starting from the originally chosen location. Using a circular list may actually be simpler. Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Mark Boon wrote: On 13-mei-08, at 16:17, Don Dailey wrote: I missed this from you. I assumed that you did this anyway. If you choose a random point and then traverse linearly to the end, what do you do when you reach the end? Do you just pass?I assumed you viewed the empty point list as a circular queue. No I didn't. When I reach the end I traverse in the opposite direction, starting from the originally chosen location. Using a circular list may actually be simpler. It's probably simpler. You have to choose between using branching or division: nextLoc = curLoc % LIST_SIZE; or if (nextLoc MAX_ELEMENT) nextLoc = firstLoc; - Don Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
For those currently coding this up, I think the most important thing about this playout algorithm is that it is *temporary*. You will almost certainly be?replacing it with something different and better just a little bit down the road. Creating an MC-UCT bot has a well worn path and its kind of an ontology recapitulates phylogeny deal. First you implement light playouts (the random non-eyefilling moves people are describing), then you implement UCT, then you throw away the light playouts and replace them?with heavy playouts, then you start extreme modifications to UCT... So you probably don't want to worry about hair-splitting tweaks except as an academic exercise. - Dave Hillis -Original Message- From: Christoph Birk [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Tue, 13 May 2008 3:40 pm Subject: Re: [computer-go] 10k UCT bots On Tue, 13 May 2008, Mark Boon wrote:? If this asymmetry really bothers you, you could very easily fix this by? wrapping the search around. There's no asymmetry in a circle.? That doesn't fix anything.? ? Why not? The whole argument is about a bias against points towards the end. In a circular list there is no 'end'.? ? No, it was a bias towards moves behind illegal moves.? Those moves are twice as likely to be played than other moves. Consider a list with 5 moves:? ? [Move1] [Move2] [Move3] [Move4] [Move5]? ? You create a random number between 1 and 5. If Move2 is illegeal? for example, then you will play? ?Move1 if random#=1? ?Move3 if random#=2 or 3,? ?Move4 =4? ?Move5 =5? ? Move3 is twice as likely to be played. Even if you make a circular? list.? ? Christoph? ? ___? computer-go mailing list? [EMAIL PROTECTED] http://www.computer-go.org/mailman/listinfo/computer-go/? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] a few more questions
Thanks for all the comments so far. Hopefully you don't mind a few more questions. 1) Do UCT bots check for atari and urgency? my understanding was that first generation Mogo did this to some extent IIRC. I am curious if anyone does this it seems like it might be important but so far I cannot think of a fast detection scheme. 2) When generating random variables for the case where the values of placing a stone on different points on the board are different. Are there good ways to throw and determine which point has been selected for the next move by the random player? Thanks again, Carter. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
[EMAIL PROTECTED] wrote: For those currently coding this up, I think the most important thing about this playout algorithm is that it is *temporary*. You will almost certainly be?replacing it with something different and better just a little bit down the road. Creating an MC-UCT bot has a well worn path and its kind of an ontology recapitulates phylogeny deal. First you implement light playouts (the random non-eyefilling moves people are describing), then you implement UCT, then you throw away the light playouts and replace them?with heavy playouts, then you start extreme modifications to UCT... So you probably don't want to worry about hair-splitting tweaks except as an academic exercise. Yes, I agree. Also my hair brained scheme of pre-generated tables of list traversal orderings was just an academic exercise as you say. - Don - Dave Hillis -Original Message- From: Christoph Birk [EMAIL PROTECTED] To: computer-go computer-go@computer-go.org Sent: Tue, 13 May 2008 3:40 pm Subject: Re: [computer-go] 10k UCT bots On Tue, 13 May 2008, Mark Boon wrote:? If this asymmetry really bothers you, you could very easily fix this by? wrapping the search around. There's no asymmetry in a circle.? That doesn't fix anything.? ? Why not? The whole argument is about a bias against points towards the end. In a circular list there is no 'end'.? ? No, it was a bias towards moves behind illegal moves.? Those moves are twice as likely to be played than other moves. Consider a list with 5 moves:? ? [Move1] [Move2] [Move3] [Move4] [Move5]? ? You create a random number between 1 and 5. If Move2 is illegeal? for example, then you will play? ?Move1 if random#=1? ?Move3 if random#=2 or 3,? ?Move4 =4? ?Move5 =5? ? Move3 is twice as likely to be played. Even if you make a circular? list.? ? Christoph? ? ___? computer-go mailing list? [EMAIL PROTECTED] http://www.computer-go.org/mailman/listinfo/computer-go/? ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
On Tue, May 13, 2008 at 01:34:37PM -0400, Don Dailey wrote: The point after an illegal move is quite a bit more likely to be selected. If the list had just 1 illegal point, then the point after it in the list is twice as likely to be selected as any other point. Perhaps if you added a little noise after each move (perhaps swapping to points at random) you could minimize any seriously bad effects? I think the problem becomes more pronounced near the end of the simulation, when most of the boars is filled with stones or 1-point eyes. If, at that time, there are two legal moves left, one near the beginning of the list, and one near the end, then the second one can end up as much more likely to be chosen. And if the first one is the key point to the position, it may never even be noticed. No idea how likely this scenario is in real games, but at least it is possible. And if your search tree is large enough, once-in-a-thousand situations occur all the time... -H -- Heikki Levanto In Murphy We Turst heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] a few more questions
Álvaro Begué wrote: On Tue, May 13, 2008 at 4:22 PM, Carter Cheng [EMAIL PROTECTED] wrote: 2) When generating random variables for the case where the values of placing a stone on different points on the board are different. Are there good ways to throw and determine which point has been selected for the next move by the random player? I can answer this. The best way I know of getting a random point with different weights (proportional to probabilities) for each point is to keep a sum of weights per row and a total sum of weights for the entire board. These are easy to update dynamically when an individual weight changes. In order to pick a point, start with a random number between 0 and the total sum. Go down row by row subtracting the weight of each row, until you would get a negative number. Then walk the row subtracting the weight of each point until you would get a negative number. The place where you stop is the selected point. This method may not be very robust if you use floating-point numbers to represent weights (because you can't rely on associativity of addition), but it works fine if you use an integer type. If the description is not good enough, I can write some C++ code for you. Or you can read the source code of GNU Go 3.7.12, lines 1629-1658 of engine/montecarlo.c. The only difference is that GNU Go doesn't bother about actual rows but instead makes a two level structure where it uses the N lowest bits of the board coordinate to determine a partition number. N varies from 1 to 4 depending on the maximum board size GNU Go is compiled for (3 for 9x9, 4 for 19x19). And I agree, don't even think of doing this with floating point numbers. /Gunnar ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] a few more questions
This could be extended rather easily to an n-ary tree. With 9x9 a natural choice is 3, but unfortunately 19 is prime. It's basically a tradeoff between how many adds and how many compares you want to do. I suppose you would do one update for every pick (unless you pick an illegal point and want to remain uniformly random, hehe). So it seems that it would be best to make n as small as possible, at least theoretically... ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] a few more questions
On Tue, May 13, 2008 at 7:08 PM, Gunnar Farnebäck [EMAIL PROTECTED] wrote: And I agree, don't even think of doing this with floating point numbers. This is a bit tangential to computer go. But you have piqued my curiosity Offhand, that sounds rather extreme to me. Perhaps I haven't really thought this through completely, but it seems as if there is a simple modification to the stated algorithm, that would ensure that it works fine with floating point numbers. Or at least well enough for any conceivable purpose: Make sure that you select each random number from a slightly larger range than prescribed by the exact totals. (For example, always round up while calculating the totals. If you do not have control over the rounding mode, just add on some small fraction of the largest weight.) Then, in the event that subtracting the weights of the points/rows/buckets never reduced the selected number to a non-positive value, just select a new random number and start over. Would this work fine, or is there some flaw in my thinking that Álvaro and Gunnar have already considered? Weston ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] a few more questions
On Tue, May 13, 2008 at 8:10 PM, Weston Markham [EMAIL PROTECTED] wrote: On Tue, May 13, 2008 at 7:08 PM, Gunnar Farnebäck [EMAIL PROTECTED] wrote: And I agree, don't even think of doing this with floating point numbers. This is a bit tangential to computer go. But you have piqued my curiosity Offhand, that sounds rather extreme to me. Perhaps I haven't really thought this through completely, but it seems as if there is a simple modification to the stated algorithm, that would ensure that it works fine with floating point numbers. Or at least well enough for any conceivable purpose: Make sure that you select each random number from a slightly larger range than prescribed by the exact totals. (For example, always round up while calculating the totals. If you do not have control over the rounding mode, just add on some small fraction of the largest weight.) Then, in the event that subtracting the weights of the points/rows/buckets never reduced the selected number to a non-positive value, just select a new random number and start over. Would this work fine, or is there some flaw in my thinking that Álvaro and Gunnar have already considered? John Tromp and I spent quite a bit of time patching the floating-point implementation and hunting down the subsequent bugs. I am not saying that making it robust is impossible, but I ended up so frustrated that I don't think I will ever try again. Integers are much better behaved and are sufficient for everything we wanted to do. As to the optimal branching factor for the tree (assuming there are many many leaves), I believe it's n=3, if all you care about is picking random points fast. Outline of the proof: On a particular level of the tree you will make about n/2 comparisons before you pick your branch and descend. The number of leaves is l = pow(n,k), so the number of levels is k = log(l)/log(n). The total number of comparisons will be something like (log(l)/2)*(n/log(n)). The first part is fixed, so we need to minimize n/log(n). That function reaches its minimum at n=e, but unfortunately n has to be an integer value. Among the integers, n=3 is the best, with n=2 and n=4 tied at as close seconds. Of course, in practice you will also need to do updates to the weights (actually more often than picking random points), so the number of levels in the tree becomes more relevant, which favors larger values of n. My original implementation had n=2 and the tree was beautifully implemented as an array, in the same style as the usual implementation of a heap. However, Rémi pointer out that cheaper updates were important. I think the two-level solution using rows is easy to implement and probably close to optimal. Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] 10k UCT bots
Álvaro Begué: [EMAIL PROTECTED]: Ooops! I hit sent before I finished writing the pseudo code. Sorry. int pick(Move *empties, int num_empties) { int num_candidates = num_empties; int picked; while(1) { picked = rand()%num_candidates; This code introduces few bias unless num_candidates is a power of two, though. #Assuming rand() returns [0 .. power of two). -Hideki if(!acceptable(empties[picked])) { num_candidates--; swap(empties[picked],empties[num_candidates]); } else break; } return picked; } On Tue, May 13, 2008 at 1:57 PM, Álvaro Begué [EMAIL PROTECTED] wrote: On Tue, May 13, 2008 at 1:51 PM, Mark Boon [EMAIL PROTECTED] wrote: On 13-mei-08, at 14:10, Álvaro Begué wrote: What others do is the right thing to do. Your method will introduce some biases. Could you elaborate what bias it could lead to? I also do the same as Jason. I did consider the possibility of a bias but couldn't immediately think of one. This has been explained already. After the first eye appears on the board, the first empty point after the eye has a probability of being picked that is twice the probability of any other empty point. What good does moving it to the end of the list do? Next time around it's just as likely to be picked as when left in place. Or do you only process the 'end' after the 'front' moves are all tried? You move it to the end and you reduce the number of candidates by one. The code is roughly this: int pick(Move *empties, int num_empties) { int num_candidates = num_empties; int picked; do { picked = rand()%num_candidates; num_candidates--; } while(!acceptable(empties[picked]); return picked; } Álvaro. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- [EMAIL PROTECTED] (Kato) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/