Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
I could be drawing wrong inferences from incomplete information, but as Darren pointed out, this paper does leave the impression that Alpha Zero is not as strong as the real AlphaGo Zero, in which case it would be clearer to say so explicitly. Of course the chess and shogi results are impressive regardless. (In chess, the 28/100 wins figure is good, but 0 losses is even better. Entering a drawn sequence starting from an inferior position -- such as playing black -- is a desirable result even for a perfect program without contempt, so failing to win as black is not a good indicator of strength.)

Comparing the Elo charts in this new paper and the Nature paper on AlphaGo Zero, and assigning AlphaGo Lee a reference rating of 0 Elo, it appears that the order of strength at go is Alpha Zero (~900 Elo), then AlphaGo Master (~1400 Elo), then the full-strength AlphaGo Zero (~1500 Elo). I would also think Alpha Zero's 8 hours of training with the help of an immense network of 5,000 first-generation TPUs is more expensive, and only faster in a strictly chronological sense, than AlphaGo Zero's 20-block, 3-day training with 4 second-generation TPUs.

On Wed, Dec 6, 2017 at 4:29 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote:

> The chess result is 64-36: a 100 rating point edge! I think the Stockfish
> open source project improved Stockfish by ~20 rating points in the last
> year. Given the number of people/computers involved, Stockfish’s annual
> effort level seems comparable to the AZ effort.
>
> Stockfish is really, really tweaked out to do exactly what it does. It is
> very hard to improve anything about Stockfish. To be clear: I am not
> disparaging the code or people or project in any way. The code is great,
> the people are great, the project is great. It is really easy to work on
> Stockfish, but very hard to make progress given the extraordinarily fine
> balance of resources that already exists. I tried hard for about 6 months
> last year without any successes.
> I tried dozens (maybe 100?) of experiments, including several that were
> motivated by automated tuning or automated searching for opportunities.
> No luck.
>
> AZ would dominate the current TCEC. Stockfish didn’t lose a game in the
> semi-final, failing to make the final because of too many draws against
> the weaker players.
>
> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.
>
> I hope this isn’t the last we see of these DeepMind programs.
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Richard Lorentz
> *Sent:* Wednesday, December 6, 2017 12:50 PM
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] Mastering Chess and Shogi by Self-Play with
> a General Reinforcement Learning Algorithm
>
> One chess result stood out for me, namely, just how much easier it was
> for AlphaZero to win with white (25 wins, 25 draws, 0 losses) than with
> black (3 wins, 47 draws, 0 losses).
>
> Maybe we should not give up on the idea of White to play and win in
> chess!
>
> On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:
>
> Hi,
>
> DeepMind makes strongest Chess and Shogi programs with the AlphaGo Zero
> method.
>
> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
>
> AlphaZero (Chess) outperformed Stockfish after 4 hours.
> AlphaZero (Shogi) outperformed elmo after 2 hours.
>
> Search is MCTS.
> AlphaZero (Chess) searches 80,000 positions/sec.
> Stockfish searches 70,000,000 positions/sec.
> AlphaZero (Shogi) searches 40,000 positions/sec.
> elmo searches 35,000,000 positions/sec.
> Thanks,
> Hiroshi Yamashita

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
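A side note on the "64-36: a 100 rating point edge" arithmetic quoted above: under the logistic Elo model, an expected score p corresponds to a rating difference of 400 * log10(p / (1 - p)). A quick sketch (my code, not from the thread):

```python
import math

def elo_diff(score):
    # logistic Elo model: an expected score p corresponds to a
    # rating difference of 400 * log10(p / (1 - p))
    return 400 * math.log10(score / (1 - score))

# AlphaZero's 64-36 match result, counting draws as half a point
print(round(elo_diff(0.64)))  # -> 100
```

log10(64/36) is almost exactly 0.25, so the 64% score works out to essentially a 100-point edge, as Brian says.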
Re: [Computer-go] Learning related stuff
Could you be reading too much into my comment? AlphaGo Zero is an amazing achievement, and I might guess its programmers will succeed in applying their methods to other fields. Nonetheless, I thought it was interesting, and it would appear the programmers did too, that before improving to superhuman level, AlphaGo was temporarily stuck in a rut of playing literally the worst first move on the board (excluding pass). That doesn't mean I think I could do better.

On Tue, Nov 28, 2017 at 4:50 AM, uurtamo . <uurt...@gmail.com> wrote:

> This is starting to feel like asking along the lines of, "how can I
> explain this to myself or improve on what's already been done in a way
> that will make this whole process work faster on my hardware".
>
> It really doesn't look like there are a bunch of obvious shortcuts.
> That's the whole point of the decision-trees imposed by humans for 20+
> years on the game; it wasn't really better.
>
> Probably what would be good to convince oneself of these things would be
> to challenge each assumption in divergent branches (suggested earlier)
> and watch the resulting players' strength over time. Yes, this might
> take a year or more on your hardware.
>
> I feel like maybe a lot of this is sour grapes; let's please again
> acknowledge that the hobbyists aren't there yet without trying to tear
> down the accomplishments of others.
>
> s.
>
> On Nov 27, 2017 7:36 PM, "Eric Boesch" <ericboe...@gmail.com> wrote:
>
>> I imagine implementation determines whether transferred knowledge is
>> helpful. It's like asking whether forgetting is a problem -- it often
>> is, but evidently not for AlphaGo Zero.
>>
>> One crude way to encourage stability is to include an explicit or
>> implicit age parameter that forces the program to perform smaller
>> modifications to its state during later stages.
>> If the parameters you copy from problem A to problem B also include
>> that age parameter, so the network acts old even though it is faced
>> with a new problem, then its initial exploration may be inefficient.
>> For an MCTS-based example, if an MCTS node is initialized to a
>> 10877-6771 win/loss record based on evaluations under slightly
>> different game rules, then with a naive implementation, even if the
>> program discovers the right refutation under the new rules right
>> away, it would still need to revisit that node thousands of times to
>> convince itself the node is now probably a losing position.
>>
>> But unlearning bad plans in a reasonable time frame is already a
>> feature you need from a good learning algorithm. Even AlphaGo almost
>> fell into trap states; from their paper, it appears that it stuck
>> with 1-1 as an opening move for much longer than you would expect
>> from a program probably already much better than 40 kyu. Even if it's
>> unrealistic for Go specifically, you could imagine some other game
>> where, after days of analysis, the program suddenly discovers a
>> reliable trick that adds one point for white to every single game.
>> The effect would be the same as your komi change -- a mature network
>> now needs to adapt to a general shift in the final score. So the task
>> of adapting to handle similar games may be similar to the task of
>> adapting to analysis reversals within a single game, and improvements
>> to one could lead to improvements to the other.
>>
>> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K <stephan.ku...@gmail.com>
>> wrote:
>>
>>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>>> > My understanding is that the AlphaGo hardware is standing
>>> > somewhere in London, idle and waiting for new action...
>>> >
>>> > Ingo.
>>> The announcement at
>>> https://deepmind.com/blog/applying-machine-learning-mammography/
>>> seems to disagree:
>>>
>>> "Our partners in this project wanted researchers at both DeepMind and
>>> Google involved in this research so that the project could take
>>> advantage of the AI expertise in both teams, as well as Google’s
>>> supercomputing infrastructure - widely regarded as one of the best in
>>> the world, and the same global infrastructure that powered DeepMind’s
>>> victory over the world champion at the ancient game of Go."
Re: [Computer-go] Learning related stuff
I imagine implementation determines whether transferred knowledge is helpful. It's like asking whether forgetting is a problem -- it often is, but evidently not for AlphaGo Zero.

One crude way to encourage stability is to include an explicit or implicit age parameter that forces the program to perform smaller modifications to its state during later stages. If the parameters you copy from problem A to problem B also include that age parameter, so the network acts old even though it is faced with a new problem, then its initial exploration may be inefficient. For an MCTS-based example, if an MCTS node is initialized to a 10877-6771 win/loss record based on evaluations under slightly different game rules, then with a naive implementation, even if the program discovers the right refutation under the new rules right away, it would still need to revisit that node thousands of times to convince itself the node is now probably a losing position.

But unlearning bad plans in a reasonable time frame is already a feature you need from a good learning algorithm. Even AlphaGo almost fell into trap states; from their paper, it appears that it stuck with 1-1 as an opening move for much longer than you would expect from a program probably already much better than 40 kyu. Even if it's unrealistic for Go specifically, you could imagine some other game where, after days of analysis, the program suddenly discovers a reliable trick that adds one point for white to every single game. The effect would be the same as your komi change -- a mature network now needs to adapt to a general shift in the final score. So the task of adapting to handle similar games may be similar to the task of adapting to analysis reversals within a single game, and improvements to one could lead to improvements to the other.
On Fri, Nov 24, 2017 at 7:54 AM, Stephan K wrote:

> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>> My understanding is that the AlphaGo hardware is standing
>> somewhere in London, idle and waiting for new action...
>>
>> Ingo.
>
> The announcement at
> https://deepmind.com/blog/applying-machine-learning-mammography/ seems
> to disagree:
>
> "Our partners in this project wanted researchers at both DeepMind and
> Google involved in this research so that the project could take
> advantage of the AI expertise in both teams, as well as Google’s
> supercomputing infrastructure - widely regarded as one of the best in
> the world, and the same global infrastructure that powered DeepMind’s
> victory over the world champion at the ancient game of Go."
Re: [computer-go] Kinds of Zobrist hashes
On Tue, Dec 8, 2009 at 5:49 PM, Petr Baudis <pa...@ucw.cz> wrote:

> Hi!
>
> In most papers I've read, three-valued Zobrist hashes are used - per
> intersection, values for empty, black and white coloring [1]. I'm not
> clear on why the empty value is needed, however; shouldn't black and
> white values alone work just fine? Is the hash behaving better with
> extra values for empty intersections? Has anyone measured it?
>
> [1] In pattern-matching, it is desirable to also use edge coloring.
>
> Thanks,
> Petr "Pasky" Baudis

You can mathematically prove the two systems are almost the same, so there's no need to test. For simplicity, suppose you're doing a 3-color hash of just two intersections, and your Zobrist values are b1, w1, e1, b2, w2, and e2 (where, e.g., e1 means the first intersection is empty and b2 means the second intersection is black). This is equivalent to using the Zobrist values b1' = (b1 xor e1), w1' = (w1 xor e1), e1' = 0, b2' = (b2 xor e2), w2' = (w2 xor e2), e2' = 0, and then, when you're finished, xoring the result with (e1 xor e2). So a 3-color Zobrist hash with one set of constants is equivalent to a 2-color Zobrist hash (with zero for empty) using a different set of constants, xored with a constant. Assuming your Zobrist values are random, a random 3-color Zobrist hash is equivalent to a random 2-color Zobrist hash (one that doesn't hash the empty intersections) xored with a random constant.
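The equivalence argument above is easy to check numerically. A small sketch (my code, not anyone's actual program): build a random 3-color table for two intersections, derive the 2-color table by folding each point's empty value into its black and white values, and verify that every coloring hashes identically up to a fixed constant.

```python
import random

random.seed(1)
EMPTY, BLACK, WHITE = 0, 1, 2

# independent random 64-bit Zobrist values for each (point, color) pair,
# for a toy "board" of just two intersections
z3 = [[random.getrandbits(64) for _ in range(3)] for _ in range(2)]

# derived 2-color table: fold each point's empty value into its black
# and white values; empty itself contributes nothing
z2 = [[0, z3[i][BLACK] ^ z3[i][EMPTY], z3[i][WHITE] ^ z3[i][EMPTY]]
      for i in range(2)]
const = z3[0][EMPTY] ^ z3[1][EMPTY]

def h(table, coloring):
    acc = 0
    for i, c in enumerate(coloring):
        acc ^= table[i][c]
    return acc

# every one of the 9 colorings hashes identically up to the constant
for c0 in (EMPTY, BLACK, WHITE):
    for c1 in (EMPTY, BLACK, WHITE):
        assert h(z3, (c0, c1)) == h(z2, (c0, c1)) ^ const
```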
Re: [computer-go] MPI vs Thread-safe
On Fri, Oct 30, 2009 at 12:50 PM, Brian Sheppard <sheppar...@aol.com> wrote:

>>> Parallelization *cannot* provide super-linear speed-up.
>>
>> I don't see that at all.
>
> This is standard computer science stuff, true of all parallel programs
> and not just Go players. No parallel program can be better than N times
> a serial version.
>
> The result follows from a simulation argument. Suppose that you had a
> parallel process that performed better than N times a serial program.
> Construct a new serial program that simulates the parallel process.
> There is a contradiction.
>
> Technically, this argument only establishes the fact up to a
> multiplicative constant. But in the case of parallel Go players, I
> cannot accept that simulating a parallel process using serial code
> would result in a slowdown. (If anything, serialization would be
> faster.)

At the risk of belaboring the obvious, extra memory associated with each processor or node (cache, main memory, hard disk, whatever) is one reason adding nodes can in practice give a speedup greater than the increase in processing power. It may not be a typical result, but it's not rare, either. It's like when you have two processes that hum along when each is given its own machine, but when you try to run them both on the same machine, the system thrashes and runs ten times slower.
Re: [computer-go] First ever win of a computer against a pro 9P as black (game of Go, 9x9).
Am I remembering correctly (maybe not) that Mogo communicates between nodes three times per second? That isn't a lot of communication opportunities if each turn lasts a few seconds.

Olivier, have you tested parallel Mogo's ability to scale with core count at blitz speeds? I might imagine, for example, playing a series against itself with pondering turned off, one side playing blitz with 100 cores, and the other side playing with 10 cores but given 5 times as much thinking time.

As others have said, congratulations...
Re: [computer-go] Complicated seki with Ko
Not sure this helps, but... I found send-two receive-one, as in your example, to be the most common non-simple ko cycle in my program. The program was heavy, so forbidding send-two added little to the total cost. Specifically, the rule forbade playing on an isolated empty intersection that neighbored a lone friendly stone that had started with two liberties. With light playouts, your mileage will probably vary.

The reason I bothered with the prohibition is that I didn't properly handle graph-history interaction. Forbidding send-two greatly reduced GHI problems (simple ko was handled by counting the ko ban point, if any, as part of the position). In my program, not only did send-two receive-one outnumber all other long cycles combined four to one, but it happened that even when the longer cycles did occur, they rarely led to erroneous output, which was not the case for send-two receive-one.

Double ko and triple ko seem more prominent in real games because they affect best-play lines. Unbalanced ko cycles are usually very stupid plays for one side or the other, so people don't think about them very much. For a program dumb enough to be trapped by it, send-two receive-one can be important.
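For concreteness, the prohibition described above might be implemented along these lines -- a hypothetical sketch with a dict-based board, not the actual program's code:

```python
def is_send_two(board, move, color, size):
    """Sketch of the rule described above: forbid playing on an isolated
    empty intersection that neighbors a lone friendly stone with two
    liberties. `board` maps (x, y) -> color for occupied points only."""
    def neighbors(p):
        x, y = p
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < size and 0 <= y + dy < size]

    # the candidate point must be empty and isolated (no empty neighbors)
    if board.get(move) is not None:
        return False
    if any(board.get(n) is None for n in neighbors(move)):
        return False
    for n in neighbors(move):
        if board.get(n) != color:
            continue
        lone = all(board.get(m) != color for m in neighbors(n))
        libs = sum(1 for m in neighbors(n) if board.get(m) is None)
        if lone and libs == 2:
            return True
    return False


# a 4x4 corner shape where Black playing (0, 0) would "send two"
board = {(0, 1): 'B', (1, 0): 'W', (1, 1): 'W'}
assert is_send_two(board, (0, 0), 'B', 4)
```

After Black plays (0, 0) here, the two-stone chain has a single liberty at (0, 2), so White captures two stones and then recaptures one -- the cycle the filter is meant to break.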
Re: [computer-go] April KGS bot tournament: results
On Mon, Apr 6, 2009 at 11:32 AM, Jason House <jason.james.ho...@gmail.com> wrote:

> I disagree with your comment about AyaMC's move 28 in round 4. The move
> looks to me like it primarily aims to build a large framework along the
> bottom/center. All basic fights seem to favor white. I'm only KGS 3k,
> so take my comments with a grain of salt...

(Idle kibitzing, no real go programming content...)

Yeah, I'm only KGS 1k so I'm not sure either, but I kind of agree. I think that at this point killing the stones would have been smaller than Aya's move 28, but maybe locally someplace else, like the spot immediately down and right of the chosen move, would have been even better -- or not. I'm positive 28 is either a bad half-measure all around (to quote Lessons in the Fundamentals of Go) or a good multi-purpose move; I just don't understand the tactics well enough to confidently say which one it is.

Much easier to say the opening was unusual but decent for kyu level, and to point to the obvious tactical mistakes that came later, like the broken ladder or the Oh move where Aya connected two stones to the killable top group instead of saving the group (both sides could be criticized there, one for bothering to attack two unimportant stones and the other for bothering to defend them, when the life of the whole group was unsettled -- presumably this is the old seki-reading problem that many programs have).
Re: [computer-go] CGT approximating the sum of subgames
On Tue, Feb 17, 2009 at 4:39 PM, dave.de...@planet.nl wrote:

> I've been looking into CGT lately and I stumbled on some articles about
> approximating strategies for determining the sum of subgames
> (Thermostrat, MixedStrat, HotStrat etc.) It is not clear to me why
> approximating strategies are needed. What is the problem? Is Ko the
> problem? Is an exact computation too slow? Can anyone shed some light
> on this?

I'm sure someone else could give a better answer, but it does come down to slowness. It's the same thing as assuming the value of a gote move equals the position's value if black moves first averaged with its value if white moves first, and that the game score equals the original score plus half the value of the largest move on the board -- these assumptions are wrong, and these estimates are not guaranteed to yield either the correct score or the biggest play on the board, but you have to do something if you can't perfectly read out the rest of the game. If CGT values were all nice real numbers that summed normally when combining subgames, then CGT would be a lot simpler.

The possibility of wanting to play a ko threat in another part of the board prevents the two areas from being separate subgames in the technical sense -- the information sets are shared.
Re: [computer-go] Human Learning against MoGo
An amateur 5D also beat Mogo with 3 handicap. I would love to see more serious games between top programs and roughly evenly matched human opponents.

On Sun, Feb 15, 2009 at 8:08 AM, Ingo Althöfer <3-hirn-ver...@gmx.de> wrote:

> Hello,
>
> During the last week (February 10 - 13, 2009) there were several
> exhibition games between the program MoGo and some professional go
> players from Taiwan (Jun-Xun Zhou 9p; Li-Chen Chien (12 years old) 1p;
> Shih Chin 2p).
Re: [computer-go] 7.5-komi for 9x9 in Beijing
On Wed, Oct 8, 2008 at 3:46 PM, Don Dailey [EMAIL PROTECTED] wrote:

> Is there any way to prove that with best play the game cannot end in
> seki?

It seems like most reasonable sequences in Chinese-rules 4x4 go end in a whole-board seki. I would expect that for 19x19 go, some avenues of best play can also lead to seki, because the perfect-play game tree is likely to be immense, and seki is common in 19x19 games (the larger the board, the greater the number of groups that could possibly end in seki). I do not have any proof, and I think a proof either way would be extremely difficult to do.

I think a proof would also be very difficult (nearly as bad as solving the empty board) for 9x9.
Re: [computer-go] Test position set for MC programs
Huh is right. You wrote C2 C1 and so did I, but I somehow did so while looking at C1 C2. I'm not sure how I managed that myself.

On Sat, Apr 26, 2008 at 5:24 PM, Gunnar Farnebäck [EMAIL PROTECTED] wrote:

> Huh? A seki doesn't get much more basic than this.
>
> 4 X X X X X . X . X 4
> 3 . O O O O X X O . 3
> 2 O O O . O O X . . 2
> 1 . X X X O X X . . 1
>   A B C D E F G H J
>
> /Gunnar
>
> Eric Boesch wrote:
>> Black doesn't need the lower-right corner as territory after killing
>> the lower left. After B B1 W C2 B C1 W H3 B G3, white gains about 7
>> points in the lower right but has already lost something like 31 in
>> the lower left, and black wins by 15. Either I'm missing something
>> obvious, or you overlooked that the lower left is not a seki, or
>> something like that.
>>
>> On Sat, Apr 26, 2008 at 2:58 PM, Gunnar Farnebäck [EMAIL PROTECTED]
>> wrote:
>>> Yamato wrote:
>>>> I attached the fixed version to this email.
>>>
>>> Thanks for your help. Another correction. In 119 black has a serious
>>> weakness on the right. Is there any way for black to win after B B1,
>>> W C2, B C1, W H3?
Re: [computer-go] 19x19 MC improvement
On Jan 23, 2008 7:39 PM, Jason House [EMAIL PROTECTED] wrote:

> On Wed, 2008-01-23 at 18:57 -0500, Eric Boesch wrote:
>> I am curious if any of those of you who have heavy-playout programs
>> would find a benefit from the following modification:
>>
>> exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
>> uct = exp_param * sqrt( log(sum of all children playouts)
>>                         * (child-win-rate-2) / (number of child playouts) );
>> uct_value = (child winning rate) + uct;
>>
>> where child-win-rate-2 is defined as (#wins + 1) / (#wins + #losses + 2)
>
> I'm surprised to see that this works as listed, because the math looks
> all wrong to me...

Argh. I have to retract the claim that this helps. I didn't optimize the libego parameters correctly before I tested it. Sorry about that -- I thought I had. There's a lot more I could add, but I thought I'd get that out there before anyone wasted any more time on my error.

By the way, does anybody know of any nifty tools or heuristics for efficient probabilistic multi-parameter optimization? In other words, like multi-dimensional optimization, except that instead of your function returning a deterministic value, it returns the result of a Bernoulli trial, and the heuristic uses those trial results to converge as rapidly as possible to parameter values that roughly maximize the success probability. The obvious approach is to cycle through all the dimensions in sequence, treating each as a one-dimensional optimization problem -- though the best way to optimize in one dimension isn't obvious to me either -- but just as with the deterministic version of optimization, I assume it's possible to do better than that. It might be a fun problem to play with, but if good tools already exist then it wouldn't be very productive for me to waste time reinventing the broken wheel.
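For what it's worth, the "cycle through all the dimensions" approach can be sketched as follows -- a toy illustration where the hidden success probability, the parameter names, and all the numbers are my own made-up assumptions:

```python
import math
import random

random.seed(0)

def true_win_prob(params):
    # hidden objective for the demo: success probability peaks at (0.3, -0.5)
    x, y = params
    return 0.9 * math.exp(-((x - 0.3) ** 2 + (y + 0.5) ** 2))

def trial(params):
    # one Bernoulli trial (think: one game) at this parameter setting
    return random.random() < true_win_prob(params)

def wins_at(params, trials):
    return sum(trial(params) for _ in range(trials))

def coordinate_search(params, step=0.4, trials=2000, sweeps=5):
    # cycle through the dimensions, hill-climbing each one on noisy
    # win counts, and shrink the step size between sweeps
    params = list(params)
    for _ in range(sweeps):
        for d in range(len(params)):
            cur = wins_at(params, trials)
            for direction in (1, -1):
                while True:
                    cand = params[:]
                    cand[d] += direction * step
                    w = wins_at(cand, trials)
                    if w <= cur:
                        break
                    params, cur = cand, w
        step /= 2
    return params

found = coordinate_search([1.5, 1.0])
assert true_win_prob(found) > true_win_prob([1.5, 1.0])
```

This is exactly the "obvious approach" from the post, with no claim of efficiency; smarter methods exist for this noisy setting (e.g. stochastic approximation techniques such as SPSA), which may be closer to what the question is after.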
Re: [computer-go] 19x19 MC improvement
On Jan 23, 2008 3:11 AM, Hiroshi Yamashita [EMAIL PROTECTED] wrote:

> 2. Change UCT exploration parameter
>
> exp_param = sqrt(2.0);
> uct = exp_param * sqrt( log(sum of all children playouts) / (number of child playouts) );
> uct_value = (child winning rate) + uct;
>
> I changed exp_param from sqrt(2.0) to sqrt(0.1).

2.0 to 0.1? That's a pretty big step. I did notice that libego worked better with lower constants than recommended by the formula -- in libego, the original UCB1 formula corresponds to an explore_rate coefficient of 8, but the default explore_rate coefficient of 1 seemed to be about the best.

I am curious if any of those of you who have heavy-playout programs would find a benefit from the following modification:

exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
uct = exp_param * sqrt( log(sum of all children playouts)
                        * (child-win-rate-2) / (number of child playouts) );
uct_value = (child winning rate) + uct;

where child-win-rate-2 is defined as (#wins + 1) / (#wins + #losses + 2).

If it happens that you do the equivalent of initializing #wins to 1 and #losses to 1 (in libego, setting the initial bias to 2), then you can just use your original (child winning rate) value. (W+1)/(W+L+2) is the mean of a beta(W+1, L+1) random variable, which is what you get when you start with a uniform(0,1) distribution and condition it on the observation of a W-L record. The explore parameter inside the square root is doubled so that when you have an average child-win-rate-2 value of 0.5, the formula returns the same value as before. (Using an initial bias of 2.0 seems like a good thing anyhow.)

Adding this extra term seemed to help a bit (57% +/- 4.5% win rate over the unmodified program) when the basic playouts were uniform. This change tends to make the formula stick with proven winners more than before, so you might need to increase the explore rate by a little more than sqrt(2). Your mileage may vary -- I'd also like to know if you try this change and find it unhelpful.
The justification is that the standard deviation of a beta(1,21) variable is lower than the standard deviation of a beta(11,11) variable. The error term of a beta(21,1) is lower too, but applying the (L+1)/(W+L+2) term to the UCB1 formula can yield nonsensical results: when two moves have the same bias, the one with the worse win-loss record would be assigned the higher UCB1 value.
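In code, the modified formula might look like this sketch (my rendering of the formulas above; natural log assumed, as in UCB1):

```python
import math

def uct_value(wins, losses, parent_playouts, exp_param=math.sqrt(0.2)):
    # modified UCB1 from the post: scale the exploration term by
    # child-win-rate-2 = (W+1)/(W+L+2), the mean of a Beta(W+1, L+1)
    # posterior (a uniform prior updated by the observed record)
    n = wins + losses
    win_rate = wins / n
    win_rate2 = (wins + 1) / (n + 2)
    explore = exp_param * math.sqrt(math.log(parent_playouts) * win_rate2 / n)
    return win_rate + explore
```

With an even record, win_rate2 is 1/2, so the sqrt(2) increase in exp_param cancels it and the value matches the unmodified formula with exp_param = sqrt(0.1); children with poor records get smaller exploration bonuses, which is the point of the change.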
Re: [computer-go] Is MC-UCT really scalable against humans?
On Jan 22, 2008 1:43 AM, Petri Pitkanen [EMAIL PROTECTED] wrote:

> Even top MC programs fail to see that a group with 3 liberties with no
> eyes is dead.

A 3-liberty group with no eyes has a 100% chance to die during playouts unless a surrounding group dies first. A 100% chance to die is as good a job of seeing deadness as a generic MC playout can ever do.

MC playouts explicitly understand eyes at the "don't fill a 1-eye in playouts" level and implicitly have a crude understanding of the 2-eye rule. They essentially give you Benson's algorithm -- clearly an eye-counting algorithm -- for free. Benson's-alive chains almost never die in playouts, while chains that lack eyespace and are surrounded by Benson's-alive chains almost always do.

Under-the-stones chains can kill Benson's-alive chains during playouts,

 --------------   --------------   --------------
| . O O # . . .| | . O O # . . .| | . O O # # . .|
| O . O # # # #| | O . O # # # #| | O . O # # # #|
| O O O # # . .| | O O O # # . .| | O O O # # # .|
| # # # O # # #| | # # # . # # #| | # # # O # # #|
| . # . O O # .| | . # # . . # .| | a # # O . # .|
 --------------   --------------   --------------

but I think this scenario is so unlikely that it's not worth a second thought.
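The "don't fill a 1-eye" playout rule mentioned above is typically a cheap static test along these lines -- one common heuristic (my sketch; the details vary from program to program):

```python
def is_one_point_eye(board, pt, color, size):
    """An empty point is treated as an eye for `color` if all orthogonal
    neighbors are friendly and the diagonals don't reveal a false eye:
    no enemy diagonal on the edge/corner, at most one in the center.
    `board` maps (x, y) -> 'B' or 'W' for occupied points only."""
    x, y = pt
    if board.get(pt) is not None:
        return False
    # every on-board orthogonal neighbor must be a friendly stone
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size and board.get((nx, ny)) != color:
            return False
    off_board = enemy = 0
    for dx, dy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        nx, ny = x + dx, y + dy
        if not (0 <= nx < size and 0 <= ny < size):
            off_board += 1
        elif board.get((nx, ny)) not in (None, color):
            enemy += 1
    # edge/corner: no enemy diagonals allowed; center: at most one
    return enemy == 0 if off_board > 0 else enemy <= 1
```

A playout policy then simply never plays on points where this test is true for the side to move, which is what gives playouts their implicit grasp of the two-eye rule.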
Re: [computer-go] Please have your bot resign, for your own good
In chess, playing the game out to the bitter end can be defensible, but in go, it isn't. The meaning of "the end of the game" in go is fluid, but it is not "when it's no longer possible to play a move." In absolute-time-limit games, when significant per-move lag exists (which is true in all human matches and some computer ones), it is impossible to schedule correctly to deal with the possibility that the opponent will continue playing for as long as possible after the game is already over. Either you allocate too little time for the real game, putting yourself at a disadvantage if you and your opponent play reasonably, or you leave yourself without enough time to handle unreasonable play.

All go players know how to keep playing after the game ends, but it's as childish and outside the spirit of the rules as starting a no-time-limit game, realizing you have fallen behind, and never returning to the board. In the eyes of most players, a time win in a lost position well after the game has ended isn't even a win-with-an-asterisk -- it's a dead loss, even more so if the player with the winning position played as quickly as lag would permit. This is why byo yomi can actually shorten games: compared to absolute time, it removes much of the incentive for children to keep playing after the game is over.

All methods of compensating for net lag require some level of trust. Even if a fraud-proof way to detect net lag existed, it would interfere with time controls and scheduling. If it takes a second before program B knows what move program A played, then that's a second during which program A has better information about the game than program B has. You can say that it hopefully averages out in the wash, but the possibility that a program might gain an advantage from its poor network connection is still there.
Any compensation for net lag at all means that a program that ponders gains a greater advantage than the official time controls would suggest: imagine 1 second per move plus 2 seconds of net lag, in which case the programs have 1 second per move to think, plus 2+1+2 seconds between moves to ponder. So I think Peter was right to direct his request to other programmers, instead of asking Don to compensate for net lag.

I was going to suggest copying KGS's "no time cost for a pass within a reasonable number of seconds" rule, but I see Erik just did that. Well, I'll just second the suggestion. Unfortunately, the reasonable number of seconds would probably have to be low, just so it doesn't increase game durations to unreasonable levels. One player passing over and over while the other keeps playing dumb moves is a scenario that is certain to occur over and over, and the only reason it's not a bigger issue is that long delays by the passing player ought to be infrequent.

With a little bit of go knowledge, it is possible for the server to punish programs that play time-wasting moves, but if you can't count on the players to be smart enough to know when they're wasting time, that could mean assigning a loss to programs that are actually winning. A simpler approach would be for the server to just end the game as soon as it became statically solvable -- though exotic sekis aren't statically solvable, so if they appeared then you'd have to fall back on the two-passes-end-the-game rule. (In theory, superko can break static analysis -- superko might forbid capturing a surrounded chain, for example -- but in practice, I bet it never would.)

Finally, I'm still not aware of any go programs that keep playing after they have obtained a dead-lost position because the programmer wants them to do it. It's just easier to write a program that doesn't know when to resign than to write one that does. On this point, I disagree with Erik.
We get enough stupid games by accident that we don't need programs that play asinine games on purpose. Peter already identified some reasons. Also, if programs prolong the game as much as possible, it means that the next round of ALL programs will be delayed, and the more programs you have that deliberately refuse to resign, the greater the probability that you will get two that, because of a failure on the part of the side that is ahead to correctly bring the game to a close, play a match that goes on basically forever. In that case, only a double forfeit is sufficient -- switching to true absolute time limits would not be good enough, because two players in a "who can play stupid moves fastest?" battle could swamp the network and cause lag for programs playing real games. One could even temporarily block the accounts so other programs can continue while the programmers in question debug their broken programs. On Jan 2, 2008 8:22 AM, Don Dailey [EMAIL PROTECTED] wrote: Hi Peter, CGOS doesn't count the first 1/4 second of thinking time and this could help a little. This isn't the same as Fischer time however because you are not given the time if it adds to your surplus. It is designed so that if you
Re: [computer-go] rotate board
Taking the min of the 8 rotated and reflected values is safe enough. Yes, the probability density will be eight times higher at the low end, so you're left with 61 bits and change worth of collision protection instead of 64. If that's not enough, then you can use a bigger hash size, as has been mentioned. And since all practical hash table sizes are far less than 2^61, let alone 2^64, I think that (minimum hash % hash_table_size) should work fine as a key to your hash table, while -- and this may be different from what Jason had in mind -- the formula ((bit-reverse of minimum hash) % hash_table_size) will, if hash_table_size is a multiple of 8, needlessly favor hash values that are even or multiples of 4 or 8.

On Dec 20, 2007 1:33 PM, Don Dailey [EMAIL PROTECTED] wrote: If you are going to compute all 8 hash keys, you can just add them up at the end instead of picking the minimum. Wouldn't that be better? I think that's pretty workable. XOR is definitely wrong here. If you use xor, then the empty board would hash to the same value as the position after a stone (of either color) is placed on e5, as well as any other symmetry like this. I also think symmetries like putting a black stone on 2 points across from each other (such as in diagonal corners) would create a zero hash, because you have 2 sets of 4 hashes that cancel each other. I think addition as Álvaro suggests fixes this.

No, the problem you identified applies to addition too. There is no 100% certainty of collision, but there is a very elevated probability of it. The eight symmetries include reflections and 180 degree rotations, all of which have the property that s(s(p)) = p. Suppose your symmetry transformation exchanges points a and c, and points b and d. How does the sum of the Zobrist hashes compare for the set {a,b} versus the set {a,d}? They will collide if (a XOR b) + (c XOR d) = (a XOR d) + (c XOR b). If (but not only if) ((a XOR c) AND (b XOR d)) == 0, then a collision is guaranteed.
The probability of this is closer to 2^-32 than to 2^-64. I suggest that those who are interested follow Erik's link (http://computer-go.org/pipermail/computer-go/2007-February/thread.html#8653), as this is not the first or second (or probably even third) time this issue has come up, and as people have warned several times before, it is easy to get wrong. I vaguely remember that somebody found a safe set of Zobrist values that allows reflections to be detected without recomputation and without greatly increasing the collision probability, but I don't remember the details. I also vaguely remember hearing that the random-values-with-rotated-nybbles approach is broken too. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
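The min-of-eight scheme from the post above can be sketched in a few lines. This is a minimal illustration, not anyone's actual program: the board encoding, `N`, and all function names are invented for the demo, and the crude `rand64()` stands in for a proper 64-bit generator.

```c
#include <stdint.h>
#include <stdlib.h>

#define N 9

static uint64_t zobrist[2][N][N];   /* one 64-bit value per color/point */

static uint64_t rand64(void) {      /* crude generator, fine for a demo */
    uint64_t x = 0;
    for (int i = 0; i < 4; i++)
        x = (x << 16) ^ (uint64_t)(rand() & 0xffff);
    return x;
}

static void init_zobrist(void) {
    for (int c = 0; c < 2; c++)
        for (int r = 0; r < N; r++)
            for (int f = 0; f < N; f++)
                zobrist[c][r][f] = rand64();
}

/* board[r][c]: 0 empty, 1 black, 2 white.
   sym bits select transpose / row flip / column flip; the 8
   combinations are exactly the 8 board symmetries. */
static uint64_t hash_transformed(const int board[N][N], int sym) {
    uint64_t h = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            int tr = r, tc = c;
            if (sym & 1) { int t = tr; tr = tc; tc = t; }
            if (sym & 2) tr = N - 1 - tr;
            if (sym & 4) tc = N - 1 - tc;
            if (board[r][c])
                h ^= zobrist[board[r][c] - 1][tr][tc];
        }
    return h;
}

/* Canonical hash: the minimum over the 8 symmetries, so symmetric
   positions hash identically (at the cost of ~2.4 bits of range). */
static uint64_t canonical_hash(const int board[N][N]) {
    uint64_t best = hash_transformed(board, 0);
    for (int sym = 1; sym < 8; sym++) {
        uint64_t h = hash_transformed(board, sym);
        if (h < best) best = h;
    }
    return best;
}
```

A real program would compute the eight hashes incrementally rather than rescanning the board, but the canonicalization step is the same.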
Re: [computer-go] rotate board
I wrote: If (but not only if) ((a XOR c) AND (b XOR d)) == 0 then a collision is guaranteed. The probability of this is closer to 2^-32 than to 2^-64. Before anybody else feels the need to correct me here -- to be more precise, the probability of collision is at least E(0.5**binomial_variable(64, 0.5)) ~= 1/100,000,000.
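That expectation has a closed form: with B ~ Binomial(64, 1/2), each of the 64 bit positions independently contributes a factor of (1/2)(1) + (1/2)(1/2) = 3/4, so E(0.5**B) = 0.75^64. A quick numeric check (the function name is mine, invented for the sketch):

```c
#include <math.h>

/* E[0.5^B] with B ~ Binomial(64, 0.5) factors into one 0.75 per bit
   position, giving 0.75^64 -- roughly 1e-8, i.e. ~1/100,000,000. */
double collision_lower_bound(void) {
    return pow(0.75, 64);
}
```

Evaluating it gives about 1.01e-8, matching the 1/100,000,000 figure in the post.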
Re: [computer-go] Hall of fame for CGOS
For what it's worth, Hall of Fame can mean anything from the Baseball Hall of Fame (serious business) to the Mullet Hall of Fame (a total joke). To me this looks like a pretty clear misunderstanding. Don and Hideki have both contributed usefully to the mailing list, and it would be too bad if this incident spoiled that from either end. I think most people who even know what CGOS is probably know the context in which it is used, so they're not going to take poor results too seriously. Nobody would deny that a static rating may not be representative of a program that keeps changing, but if that is an issue, then people can create new IDs for their 100% debugged bots so that static ratings will give them all the credit they deserve. The new bot would not get established immediately, but I doubt there are that many people outside this mailing list who are even aware of the CGOS all-time 9x9 rating list yet. As for any lower-rated versions of the same bot, I think that as long as there exists a version of a given program with a high established rating, people will have enough sense to ignore lower-rated versions that may represent experiments, buggy prototypes, and the like.
Re: [computer-go] How does MC do with ladders?
On 12/11/07, Mark Boon [EMAIL PROTECTED] wrote: Question: how do MC programs perform with a long ladder on the board? My understanding of MC is limited but thinking about it, a crucial long ladder would automatically make the chances of any playout winning 50-50, regardless of the actual outcome of the ladder.

No, 50/50 would make too much sense. It might be higher or it might be lower, depending on whose move it is in the ladder and what style of playouts you use, but exactly 50/50 would be like flipping a coin and having it land on its edge. In test cases, MC-UCT evaluations tend to cluster near 50/50 in any case, because MC-UCT, especially dumb uniform MC-UCT, tends to be conservative about predicting the winner, especially in 19x19, where the opportunities for luck to overwhelm the actual advantage on the board are greater. But if you accept this as a mere scaling issue -- that a clearly lopsided exchange can mean just a 2% increment in winning percentage even if read more or less correctly -- then the numbers may not look so even after all. It's certainly possible for MC-UCT to climb a broken ladder in a winning position (and climbing a broken ladder in an even position is at least half as bad as that anyhow).

I tried testing this on 19x19 using libego at 1 million playouts per move. The behavior was not consistent, but the numbers trended in the defender's favor as the sides played out the ladder. In one bizarre case, the attacker played out the ladder until there were just 17 plies left, and then backed off. Why would the attacker give up a winning ladder? It appears the MC-UCT was never actually reading the ladder to begin with; just four or five plies in, sometimes just a few thousand simulations were still following the key line. 1 million playouts were not nearly enough to read the ladder in this case; maybe 100 million would be enough, but I couldn't test that.
Also, after enough simulations, decisively inferior moves lead to fewer losses than slightly inferior ones. Suppose you have three moves available: one wins 75% of the time, one 50%, and one 25%. In the long run, the 75% move will be simulated almost all the time, but the middle move will be simulated roughly four times as often as the 25% move, which, compared to the best move available, is twice as bad -- and four times the simulations at half the loss per simulation adds up to twice the excess losses. That is apropos here, because giving up on an open-field ladder once it has been played out for a dozen moves is much more painful for the defender than for the attacker. The longer the ladder got, the more the evaluations trended in the defender's favor, and my best explanation is that -- until you actually read the ladder all the way out and find that the defender is dead -- every move except pulling out of atari is so obviously bad that even uniform MC-UCT did a better job of focusing on that one good move.

(Incidentally, the conservative nature of MC-UCT ratings largely explains why maximizing winning probability alone is not a bad strategy, at least in even games. The classic beginner mistake, when you already have a clear lead in theory, is to fail to fight hard to grab still more points as blunder insurance. But an MC-UCT evaluation of 90% typically means a 90% probability of actually winning against even opposition, not just a 90% likelihood of a theoretical win. Assigning a 65% evaluation to an obvious THEORETICAL win allows plenty of room to assign higher evaluations to even more lopsided advantages.
As Don said, when MC-UCT starts blatantly throwing away points for no obvious reason, it's almost certainly because the game is REALLY over, because MC-UCT's errors tend to be probabilistic instead of absolute -- it may in effect evaluate a dead group as 75% alive, but it won't call it 100% alive except in the rare cases when the underlying random playout rules forbid the correct line of play.)
Re: [computer-go] How does MC do with ladders?
Make sure that you use the -19 argument when starting 19x19 Mogo, and restart GoGui (in order to restart Mogo indirectly) after you change the settings. Somewhat confusingly, Mogo does not automatically play 19x19 style just because it receives a request for a 19x19 board. Poor ladder handling and squeezing the toothpaste are both behaviors that Mogo can exhibit when playing 9x9-style on the 19x19 board. If the GTP shell window (assuming you're using GoGui) shows "shishoCheck is called" comments, then you're really playing 19x19 style.
Re: [computer-go] Computer Go tournaments - various
On Nov 27, 2007 8:29 PM, Don Dailey [EMAIL PROTECTED] wrote: Ian Osgood wrote: Checking the participants, I see that MoGo and CrazyStone were specifically invited. Also playing is a version of GNU Go (presumably), as well as veterans Aya and Katsunari, and two dozen others. What boggles my mind is the lack of participation in these events from commercial players like KCC Igo, Haruka, Go4++, Handtalk, and Many Faces. It's because these programs will get killed by the top Monte Carlo programs. It's risky competing when your reputation is involved. In fact, it's better not to compete than to compete and score poorly.

I was about to call you on the Many Faces case if Dave hadn't. He has never hesitated to admit it when other programs were stronger, and Many Faces 11 plays on 19x19 CGOS when the site is up. Also, I don't know if Handtalk is in active development anymore. But you're basically right, and your direct language is justified to cut through the hemming and hawing. Some programs do not compete because they are no longer being maintained -- but these obsolescent programs would lose anyway. Other programs do not compete because they would lose. Self-promotion while ducking stronger competition still works, but hopefully more people are starting to see through that old trick. CrazyStone and Mogo win by winning, not by hiding. I haven't seen any reason to believe that there currently exist any commercial programs with as strong a claim to third place as the latest GnuGo or MonteGnu.

Aside: fair descriptions of your program and how it works, and possibly what didn't work, can be very useful. The Mogo and CrazyStone papers are excellent, and it would be great to see more of those one-sentence program descriptions on Sensei's turned into links to actual web pages. If any of you do this, be sure to inform the mailing list. That kind of information is definitely not the sort of self-promotion I was criticizing.
Re: [computer-go] Tweaking the Pseudo Liberty Code
On Nov 24, 2007 10:38 AM, [EMAIL PROTECTED] wrote: * For liberty values, I use the following equation: values[i] = 25000 + (16 * (i+1)) + ((i+1) * (i+1)); Some bits are constant, some are linear, and some are quadratic, which guarantees that the sum of up to 4 values will only be in the set if they are from the same liberty. I have more confidence in an equation than in random numbers.

It's embarrassing how long it took me to notice that you were exploiting the theorem that if you have n terms that sum to m, then the sum of the squares of the terms will exceed (m**2)/n unless each term equals exactly m/n. (This is essentially the same statistics theorem that says that the expected value of the square equals the square of the expected value plus the variance.) Instead of smushing the three parts of your triple together, why not keep them separate? When you add a 1-dimensional coordinate, you do { libs++; sum += coordinate; sumSquares += coordinate * coordinate; } which is admittedly a little more complicated than a table lookup, but then the atari test becomes (sum * sum == sumSquares * libs). Computing the triple-sum and testing for atari are just a couple of adds and multiplies. The process involves no initialization, lookup tables, 64-bit numbers, loops, branching, divisions, or remainders, and (even though this doesn't matter for normal go, it's still nice) there's almost no limit to how many duplicates of the same value you can detect. I like it.
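The sum/sum-of-squares idea above fits in a handful of lines. A minimal runnable sketch, assuming 1-dimensional liberty coordinates (the struct and function names are invented for the demo):

```c
/* Pseudo-liberty tracking via sum and sum of squares.  A chain is in
   atari exactly when all its pseudo-liberties are the same point,
   i.e. sum^2 == sumSquares * libs (the equality case of the
   sum-of-squares inequality mentioned in the post). */
typedef struct {
    long libs;        /* pseudo-liberty count */
    long sum;         /* sum of liberty coordinates */
    long sumSquares;  /* sum of squared liberty coordinates */
} Chain;

static void add_lib(Chain *c, long coord) {
    c->libs++;
    c->sum += coord;
    c->sumSquares += coord * coord;
}

static void remove_lib(Chain *c, long coord) {
    c->libs--;
    c->sum -= coord;
    c->sumSquares -= coord * coord;
}

static int in_atari(const Chain *c) {
    return c->libs > 0 && c->sum * c->sum == c->sumSquares * c->libs;
}
```

Merging two chains is just adding the three fields componentwise, which is part of the appeal.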
Re: [computer-go] Speed of generating random playouts
On 11/15/07, Chris Fant [EMAIL PROTECTED] wrote: On Nov 15, 2007 3:20 PM, Eric Boesch [EMAIL PROTECTED] wrote: On 11/14/07, Chris Fant [EMAIL PROTECTED] wrote: Based on more recent emails, this may not be useful anymore, but I have a list of 361 32-bit numbers that satisfy these properties in case anyone is interested. I'd be interested in your implementation tricks. Where did the number 17 come from? 17 didn't come from me. Here's how I made my list: Start with 500 random numbers. Throw out the ones that violate the tests. Hope that you are left with enough (361). This actually worked all the way down to 15-bit numbers. Neat. Was the 15-bit version for 81 values or 361? At the risk of putting my foot in my mouth, I don't think there exist 361 15-bit numbers that satisfy minimum requirements (if the floating-point average of any four code values is a code value, then the four code values are identical).
Re: [computer-go] Speed of generating random playouts
On 11/16/07, John Tromp [EMAIL PROTECTED] wrote: On Nov 16, 2007 10:05 AM, Chris Fant [EMAIL PROTECTED] wrote: Neat. Was the 15-bit version for 81 values or 361? At the risk of putting my foot in my mouth, I don't think there exist 361 15-bit numbers that satisfy minimum requirements (if the floating-point average of any four code values is a code value, then the four code values are identical). It was 361 values. So either you are wrong or I have a bug. I probably have a bug. Here's the list. If it violates the rules, please let me know. Yep, I think I had a bug. I just removed an optimization that I thought was valid and now I'm getting smaller lists. So I guess it was not valid. Let me see how small I can get the numbers without that optimization... No, it was far from valid; e.g. 14+14+14+3022 = 4 * 766. I don't think you can get 81 15-bit values either...

I think we are applying different standards. For the minimum requirements I mentioned, 14 bits will do, for a strictly limited definition of "do". 81 equals 1010001 base 2, and 1010001 base 5 is a hair under 2^14. But you'll need to do your arithmetic in base 16, and you'll need some external way (e.g. a separate pseudoliberty count) to distinguish 1+5+6+6 from 6+6+6. So one could argue that these minimum requirements aren't good enough, and really I would agree with that.
Re: [computer-go] Speed of generating random playouts
Sorry, I didn't mean to send that one yet. I pressed tab, meaning to insert a tab character, and return soon after, which caused gmail to, first, change the focus to the send button, and second, send the mail. That last bit was supposed to be if (code_sum < 5 * threshold) { int pseudoliberty_count = code_sum / threshold; if (code_sum % pseudoliberty_count == 0) { int average = code_sum / pseudoliberty_count; if (average is in my original code_value set) then there is just one liberty. } } The "if average is in my original code_value set" test seems like a bottleneck here, requiring about #bits operations (i.e. about 9, since 361 is a 9-bit number) no matter how you do it as far as I can tell (unless you use a stupidly gigantic lookup table), and there's a solution to that, too, if you don't mind wasting a little more space. Use base 8 instead of base 5. Unfortunately, then it is no longer possible (without using a separate pseudoliberty count) to squeeze the result into a 32-bit number and be sure that, for chains with 19 liberties, for example, you don't get overflow. But it does permit using a bitmask to convert the "in my original code_value set" test to constant time: if ((average & 0b110110110110110110110110110) == 0) { then there is just one liberty } Here, 0bfoo is the binary number foo (yes, I know that's not legitimate C code) and I'm supposing that #bits == 9. I hope I got this right -- I'm sort of hurrying to correct it before anybody wastes too much time trying to decode it. (Sorry, Jason :) On 11/14/07, Eric Boesch [EMAIL PROTECTED] wrote: Encode each number by swapping base 2 with base 5, without changing the digits. So binary 11011 (27) is represented as base-5 11011 (756). This allows you to sum up to 4 such numbers without experiencing overflow from one digit to the next (since 4 1's adds up to just 4). Essentially, you are performing b additions in parallel.
If the numbers you are summing are all the same, and the pseudoliberty count (#libs) is no more than 4, then each base-5 digit of the sum will equal 0 or #libs. With a slight modification, you can also eliminate the pseudoliberty count completely and just use a single number. Take the largest code value you will need, add 1, multiply by four, round up to the next power of 5, and add this value to every code value, and now you can just use the test
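A runnable reconstruction of the base-5 scheme as described in the two posts above: binary digits are re-read as base-5 digits so up to four codes add without carries, and the large threshold offset folds the pseudo-liberty count into the sum itself. The function names and the 361-point limit are illustrative choices, not from any actual program.

```c
#include <stdint.h>

#define MAX_COORD 361

static uint64_t spread5(uint64_t x) {   /* binary digits -> base-5 digits */
    uint64_t r = 0, place = 1;
    for (; x; x >>= 1, place *= 5)
        if (x & 1) r += place;
    return r;
}

static uint64_t threshold(void) {       /* next power of 5 >= 4*(max+1) */
    uint64_t need = 4 * (spread5(MAX_COORD) + 1), t = 1;
    while (t < need) t *= 5;
    return t;
}

static uint64_t code(int coord) {       /* offset makes count recoverable */
    return threshold() + spread5((uint64_t)coord);
}

/* If code_sum holds at most four pseudo-liberties that are all the same
   point, return that coordinate; otherwise return -1. */
static int single_liberty(uint64_t code_sum) {
    uint64_t t = threshold();
    if (code_sum < t || code_sum >= 5 * t)
        return -1;
    uint64_t k = code_sum / t;          /* pseudo-liberty count */
    if (code_sum % k)
        return -1;
    uint64_t avg = code_sum / k;
    /* avg is a code value iff avg - t has only 0/1 base-5 digits */
    uint64_t rem = avg - t;
    int coord = 0;
    for (int bit = 0; rem; rem /= 5, bit++) {
        uint64_t d = rem % 5;
        if (d > 1) return -1;
        coord |= (int)d << bit;
    }
    return coord;
}
```

The sums here exceed 32 bits for a 361-point board, which is the trade-off the posts discuss; the point of the sketch is the carry-free addition and the threshold trick, not the storage size.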
Re: [computer-go] Speed of generating random playouts
Encode each number by swapping base 2 with base 5, without changing the digits. So binary 11011 (27) is represented as base-5 11011 (756). This allows you to sum up to 4 such numbers without experiencing overflow from one digit to the next (since 4 1's adds up to just 4). Essentially, you are performing b additions in parallel. If the numbers you are summing are all the same, and the pseudoliberty count (#libs) is no more than 4, then each base-5 digit of the sum will equal 0 or #libs. With a slight modification, you can also eliminate the pseudoliberty count completely and just use a single number. Take the largest code value you will need, add 1, multiply by four, round up to the next power of 5, and add this value to every code value, and now you can just use the test if (code_sum = threshold) { ((code_sum code_sum div threshold ! All you have to do is to add a sufficiently large fixed value to every single representation. Intersperse double-zeroes inside your input value, so binary 11011 is encoded as 100101001. Now summing up to 7 such values is guaranteed not to overflow one triad of bits to another, so essentially you're performing b binary additions in parallel. That's it. Divide the sum by a little bit more than 3b bits. 3b bits seems to be very much This could be made more compact, but then bitmasking wouldn't be as easy. . Any coordinate is just a sequence of bits. Each bit can be encoded separately. So the problem reduces to how to encode a single bit (0 or 1) so that the sum of up to 4 values equals a given number (say 0) only if the bit is equal each time. For bitmasking ease, you can use 3 bits per bit. 
Spread each input number out like this: the Up to 4 times 1 (mod 0) equals 0 only if On 11/14/07, John Tromp [EMAIL PROTECTED] wrote: On Nov 14, 2007 5:03 PM, Imran Hendley [EMAIL PROTECTED] wrote: On Nov 14, 2007 3:19 PM, John Tromp [EMAIL PROTECTED] wrote: On Nov 14, 2007 2:00 PM, John Tromp [EMAIL PROTECTED] wrote: My solution doesn't make use of that, and satisfies the stronger property: 0 <= a_i <= 4 and sum a_i * n_i is in 1*nums union 2*nums union 3*nums union 4*nums => only one a_i is nonzero. that was not quite correct. it should say: let a_i be the number of adjacencies to a liberty at point i. if sum a_i <= 4, and sum (a_i * n_i) is in {1,2,3,4} * nums, then only one a_i is nonzero. I'm really lost now. a_i is the number of stones we have adjacent to a liberty at intersection i? Do we need to know the location of our liberties to update sum a_i? How is this easier than just remembering For every string, you can keep track of this sum incrementally. When the string establishes a new adjacency to an empty point i, you add code[i] into the sum. the locations of all real liberties you saw? How do we know that the stones around i are from the same group? What are the n_i in n_i was the name that Tom gave to my code[i]. sum(a_i*n_i)? is {1,2,3,4}*nums supposed to be a cartesian product of no, it's just my shorthand for the union of the 4 sets nums, 2*nums, 3*nums and 4*nums. regards, -John
Re: [computer-go] Speed of generating random playouts
On 11/12/07, Petr Baudis [EMAIL PROTECTED] wrote: Does any frequently playing real-world bot use libEGO? It seems still order of magnitude faster, but it looks like that is because it simplifies some things too much. For example, its board::merge_chains() does not seem to take account for merging chains that might have had liberties in common. But maybe I just overlooked something? I believe those are pseudo-liberty counts, not actual liberty counts. If either count equals zero, then the other one will too, so pseudo-liberties are good enough for identifying suicide and captures. Check the mailing list archive.
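To illustrate the definition at work here: a pseudo-liberty counts an empty point once per adjacent chain stone, so merging chains is plain addition (no deduplication of shared liberties needed), and the pseudo count is zero exactly when the real liberty count is zero. A self-contained sketch on an invented 5x5 encoding (0 empty, 1 chain under test, 2 other stones):

```c
#define N 5

/* Pseudo-liberty count of the chain marked 1: each empty neighbor is
   counted once per adjacent chain stone, so a liberty shared by two
   stones of the chain is counted twice. */
static int pseudo_libs(const int b[N][N]) {
    static const int dr[4] = { -1, 1, 0, 0 };
    static const int dc[4] = { 0, 0, -1, 1 };
    int count = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (b[r][c] == 1)
                for (int d = 0; d < 4; d++) {
                    int rr = r + dr[d], cc = c + dc[d];
                    if (rr >= 0 && rr < N && cc >= 0 && cc < N
                        && b[rr][cc] == 0)
                        count++;
                }
    return count;
}
```

An L-shaped chain whose two branches share a liberty gets that point counted twice, yet the count still hits zero precisely when the chain is captured, which is all that suicide/capture detection needs.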
Re: [computer-go] Former Deep Blue Research working on Go
On 10/9/07, Andrés Domínguez [EMAIL PROTECTED] wrote: 2007/10/9, Eric Boesch [EMAIL PROTECTED]: Naive null move is unhelpful because throughout much of a go game, almost every move is better than passing, I think this is not the point of null move. Null move is if pass is good enough to an alpha cut, then will be a _better_ move. It is not important if pass is the worse move, is important that there is a better (=) move than pass (not zugzwang). Then you bet searching not so deep. Sorry, that was sloppy writing in several places. I was not trying to argue why null-move pruning (NMP) would give the wrong answer, but why, even if NMP performs as intended and horizon effects don't spoil its evaluation, it might not prune many moves. The hope is to prune variations that are bad and lopsided enough that starting at some point, one side loses the equivalent of a whole move compared to, er, more or less, correct play by both sides, right? The fraction of variations that fit that description will increase with the depth of the tree and the variability of move quality. The depth and variability are both likely to be lower in global go search than in global chess search. (As for local go search, as I already explained, I think that even if NMP is effective when compared with bare-bones alpha-beta, it is still less effective than other approaches like lambda search.) If all moves except pass for both players are better than nothing, then if NMP works as intended, no moves will be pruned in the first two plies of the search (it takes at least two moves by the same player to fall a full move behind). 
If an average move is more than two-thirds as valuable as the best move -- which is usually true in go for, very roughly, the first 20 moves of a typical 19x19 game -- you'll have to go six levels deep before you see many NMP cutoffs (even if white's sequence is below average and cumulatively a move worse than best, it may not lose a full move to black's imperfect responses, so only a minority of 6-ply sequences will be eligible, and then you have to consider how many of those sequences would be cut off by alpha-beta anyhow -- I would assume the sequences that NMP might prune would be cut off by ordinary alpha-beta at a greater rate than more balanced sequences would be). You won't see NMP cutoffs at the bottom of the tree, either, because it's too late to prune then. If NMP doesn't prune much near the root or the deepest nodes, and the tree is not very deep because the branching factor and static evaluation cost are high enough that there isn't time to search very deeply, then NMP isn't doing much, period. I think that is at least part of what has limited the benefits of null move pruning for full-breadth global search in go. Selective global search allows deeper searches, but a good selector should prune away most of the sequences NMP might otherwise have caught. None of this is an argument that NMP would be literally useless, just that it's unlikely to lead to a dramatic strength improvement. Even in chess, Chrilly Donninger said NMP was good for, what, 100 Elo points? The only alpha-beta tweak that can add 400 Elo to a chess program on its own is transposition tables, and everybody already has those. That makes it difficult to understand why non-go-programmers are sometimes so willing to believe that just souping up an alpha-beta search could turn today's top go programs, which I would say are at about the go equivalent of class B at 19x19, into the go equivalent of grandmasters. 
A simple-but-fast full-breadth alpha-beta go solver would have even further to go to reach grandmaster level, because it would first need to become competitive with the top tier of existing programs (which no such program currently is). Either way, in terms of performance measured in human terms, the jump from the state of the art to world-champion-caliber play would be a far bigger leap than Deep Thought and Deep Blue ever made. (The leap to dan level, if gaining just two stones can be called that, surely requires only throwing a little more hardware at existing programs.) Okay, enough of that. If people aren't persuaded by other programmers' experience trying to map computer chess methods to computer go in a straightforward way, then they're not likely to be convinced by my hand-waving arguments either. [Regarding programmers' experience: when a top chess programmer (Chrilly) and a successful go programmer (Peter Woitke) collaborated on a chess-style go program, the result fell -- at last report, anyhow -- about 600 Elo points short of the top tier of programs at 9x9, and presumably much farther short at real go. (The 600 figure is derived from Chrilly's claims of a 60% success record against GnuGo, and GnuGo's placement nearly 700 Elo points behind Mogo on CGOS -- 9x9 is not GnuGo's long suit.) That should dispel any residual hopes that applying state-of-the-art chess-search
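For reference, the null-move pruning being debated above looks like this in negamax form. To keep the sketch self-contained and testable, it plays a toy additive game invented for the demo (each side alternately scores 1, 2, or 3 points for `depth` plies); in that game every real move beats passing, so the null move (score 0, searched R plies shallower) yields a sound bound. None of this is from any actual go or chess program.

```c
#define R 2   /* null-move depth reduction */

/* Value of the toy game for the side to move: my remaining points
   minus the opponent's, under best play by both sides. */
static int negamax(int depth, int alpha, int beta, int allow_null) {
    if (depth <= 0)
        return 0;

    /* Null-move pruning: pretend to pass (scoring nothing) and let the
       opponent reply at reduced depth; if even that fails high, prune.
       This is exactly the step that misfires when "almost every move
       is better than passing" stops holding (zugzwang-like spots). */
    if (allow_null && depth > R + 1) {
        int score = -negamax(depth - 1 - R, -beta, -beta + 1, 0);
        if (score >= beta)
            return beta;
    }

    int best = -1000;
    for (int m = 1; m <= 3; m++) {   /* real moves: score m points now */
        int score = m - negamax(depth - 1, m - beta, m - alpha, 1);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;    /* ordinary alpha-beta cutoff */
    }
    return best;
}
```

With perfect play both sides always take 3, so the game's value alternates between 3 (odd depth) and 0 (even depth), and the pruned search still returns the exact value at the root.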
Re: [computer-go] Former Deep Blue Research working on Go
On 10/8/07, Tapani Raiko [EMAIL PROTECTED] wrote: May sound unpolite. But Deep Blue reached a very important step in IA. They will be known for ever. But, from a research point of view, they didn't much really. It was mainly a technological/technical achivement. Maybe they will reimplement Mogo, try a null-move tweak, use a supercomputer, and claim to have the strongest computer Go player ever. :-)

Naive null move is unhelpful because throughout much of a go game, almost every move is better than passing, but generalizations of null move can help in local fights, where most of the board really doesn't matter. Thomsen calls lambda search an extension of null move. I implemented a local search that involved a "pass to fill outside liberties" move that acted as a stand-in for all nonlocal moves. Maybe Feng-hsiung has something similar in mind. For programs that read out local goals in the first place, it's natural to implement some method -- lambda proof-number search with inversions, as in Thomsen's MadLab, is probably one of the better ones -- to ensure you're not searching the whole board just to solve, say, a lousy crane's nest (http://senseis.xmp.net/?CranesNestTesuji). I think Mogo and CrazyStone do not do this, instead using very good whole-board vision to compensate for relatively weak local tactics.

Even MadLab can be slow to solve the kind of tactical problems you would think it was made for. MadLab's search is admissible (though a bit buggy in case of ko), and it seems that admissible search is often very hard even when making a guess that is probably right is easy. With many harder problems (MadLab did solve some tricky, let's say 3 dan level, problems very quickly, when the key variation stayed reasonably narrow all the way to the end) I concluded that MadLab was finding the tesujis that you would normally call the solution, but then getting bogged down in the easier (to human eyes) life and death problem of mopping up cut-off chains.
There are endless practical examples of easy to guess, hard to prove positions, with wide-branching (even after narrowing the search region down to intersections that really matter), deep, boring, straightforward grinds towards inevitable victory, where a glance or 100 Monte Carlo simulations might reveal the correct answer. For example, can a black stone in the center of an empty 19x19 board live? Of course the answer is yes. Okay, now try to prove it -- or don't, because it's my bet that even with computer help, no one will succeed in doing so in the next five years. In running battles with sketchy boundaries and nothing resembling an eye yet, you can usually forget about trying to prove who will win. (If the aforementioned stone in the center of the board had the 17x17 region above the first line all to itself, it might still be dead -- strong players say that if just the border of the 19x19 board is filled with stones of one color, then with correct play by both sides, the other player cannot live anywhere inside.) Even in the closed and semi-closed go problems the Smart Tools team examined in their paper, they said (I'm paraphrasing from memory, but I hope I get the gist right) that often, proving the correct answer with their tsume-go solver took far longer than just being 95% sure. Similar issues also arise in chess, but are easier to handle within a classic alpha-beta framework -- if proving checkmate is hard but recognizing the sure win is easy, it's usually because one side forces a material advantage, which even the crudest static evaluator can recognize. 
If you're writing a general go-playing program, there's plenty of opportunity to admissibly optimize tactical searches, but don't expect tweaking the admissible elements of your search to the limit to adequately compensate for a lack of guessing skill when proof is not practical, even if the search is meant only for clearly tactical problems and not for direct application to opening play, strategic decisions, or loose positions.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Heuristics for MC/UCT with all-or-nothing payouts
The UCT heuristic of trying every child of a node once before trying any child twice is reasonable when the payoff distribution is unknown. Why try the lever that paid $5 a second time if there might be another lever that pays $1,000,000? But when the set of possible payoffs is known to be {1, -1} -- you either win or you lose -- it makes no sense to abandon a move with a perfect winning record. Suppose there are 1000 candidate moves: if the first move you try happens to win, are you going to waste the next 999 playouts on moves that (because you only try them once) can't possibly end up with better records than the one you just found? The repeat-winners heuristic (keep trying a move until it loses) lets you find exceptionally promising moves without having to identify 500 mildly promising ones first.

It may make sense to revisit promising children that have both won and lost at least once before visiting some other children at all. In a game with an infinite branching factor, this is obviously the case -- otherwise the tree would never grow past the first two levels, so you'd never advance past naive play -- and the same may hold if the branching factor is merely very high, even if you have no a priori reason to prefer one child to another. (I *think* that's a difference from iterative/progressive widening and progressive unpruning. Heuristics are good, and it's no surprise that a good pattern database leads to a much stronger MC player -- but using a progressive approach without heuristics requires an expansion strategy that won't just end up trying new children all the time, as the repeat-winners heuristic does when the children's pre-evaluations are all equal, or, even worse, retry random old children without regard to their win/loss record.)

If you have already examined 100 child nodes, then you'd expect to have to examine about 100 more before finding a more promising candidate.
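A minimal sketch of the repeat-winners selection rule described above (Python; the node representation and the tie-breaking are my own choices for illustration, not anyone's published algorithm):

```python
import random

def select_child(children, rng=random):
    """Pick a child to simulate under the repeat-winners heuristic:
    a move with a perfect winning record keeps getting retried;
    otherwise fall back to an untried move; otherwise take the
    best win rate among already-explored children."""
    # children: list of dicts with 'wins' and 'visits' counts
    perfect = [c for c in children if c['visits'] > 0 and c['wins'] == c['visits']]
    if perfect:
        return rng.choice(perfect)
    untried = [c for c in children if c['visits'] == 0]
    if untried:
        return rng.choice(untried)
    # Every child has lost at least once; hand back the best record.
    return max(children, key=lambda c: c['wins'] / c['visits'])

# Tiny example: the second child is 3-0, so it is retried even though
# two children have never been visited at all.
kids = [{'wins': 1, 'visits': 2}, {'wins': 3, 'visits': 3},
        {'wins': 0, 'visits': 0}, {'wins': 0, 'visits': 0}]
assert select_child(kids) is kids[1]
```

Once kids[1] finally loses, the rule falls through to the untried children, which is where the pathology discussed below (never getting past trying new children) can begin.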
Before doing that, it might make sense to expend some effort on reexamining the most promising candidates so far, to sort the really good candidates from the ones that started out on a lucky streak, and to sort the robust moves from the ones that only work against inferior replies. As a dead-simple heuristic, one might, while some children remain unexpanded, allocate half of the effort to reexamining the most promising previously expanded children (not counting children with perfect winning records, which would be immediately reexamined anyhow). This approach only doubles the number of playouts needed before all children are explored (to repeat myself, I don't consider the children properly explored until each child has lost at least once), but it can lead to a less trivial and hopefully smarter tree topology -- in short, it is more resistant to pathological behavior (such as never expanding any part of the tree past depth 2 because you're too busy trying new children) than the repeat-winners heuristic alone. The extra effort is not wasted once all children are explored, either, because the extra playouts are mostly performed on nodes that would probably soon be reexamined anyhow.

It is also easy to see why MC programmers in games with {1, -1} payoff sets would discover that the choose-best-average-payoff heuristic is not as good as the choose-the-move-with-the-most-wins heuristic when selecting the best move to play. The older Mogo paper mentioned some plausible reasons, and it's easy to imagine others. For typical distributions, a 10-2 record is better than a 6-1 record, which is better than a 1-0 record.
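One quick way to see why 10-2 should outrank 6-1 and 1-0 even though its raw win rate is the lowest of the three: penalize the average by a simple uncertainty term. (The particular penalty below is my own choice for illustration; it is not the rule from the Mogo paper, just a stand-in for any estimate that distrusts small samples.)

```python
import math

def mean(wins, visits):
    return wins / visits

def pessimistic(wins, visits, c=1.0):
    # win rate minus a crude uncertainty penalty of c / sqrt(visits)
    return wins / visits - c / math.sqrt(visits)

records = [(10, 12), (6, 7), (1, 1)]  # 10-2, 6-1, 1-0 as (wins, visits)

# Raw win rates rank them backwards: the 1-0 move looks best.
assert max(records, key=lambda r: mean(*r)) == (1, 1)

# The pessimistic estimate ranks 10-2 first, 6-1 second, 1-0 last.
ranked = sorted(records, key=lambda r: pessimistic(*r), reverse=True)
assert ranked == [(10, 12), (6, 7), (1, 1)]
```

Choosing the move with the most wins gives the same ordering here, which is one reason it tends to beat choosing the best raw average at the root.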
Finally, with the repeat-winners or progressive-widening-type heuristics (I'm not sure what to call the latter; a quick glance at the papers suggested to me -- perhaps wrongly -- that it's not precisely either of the heuristics called progressive widening or progressive unpruning, but a more obvious and naive cousin of both), the odd/even ply bias that Jason brought up is not likely to be much of an issue (whether or not it is ever a big issue for the try-every-child-once heuristic), because the search tree becomes imbalanced so quickly. My guess is that a roughly 50/50 mix of tree leaves at even and odd depths develops very rapidly, provided wins and losses are roughly equally frequent even after minimaxing.

So the normal UCT protocol makes sense with unknown sets of possible payout values, but it also makes sense to use slightly different choice heuristics when the payout value set is {1, -1}. Of course most of you have probably done that already, along with possibly more radical changes -- I know some of these heuristic suggestions are not new (because I copied them from this mailing list), and I would bet that none of them are.
Re: [computer-go] Paper presents results on proximity heuristic
On 2/7/07, David Doshay [EMAIL PROTECTED] wrote:

> On 7 Feb 2007, at 1:35 PM, Don Dailey wrote:
>
>> Although suicide can occasionally be the best move in some rule-sets, I think it weakens your program to include it. At best you are going to get a ko threat, so it requires a pretty sophisticated program to know how and when to use it.

There are weird cases where suicide is more than just a ko threat. http://www.goban.demon.co.uk/go/bestiary/rule_challenge.html (Just picking nits.)
Re: [computer-go] Effective Go Library v0.101
On 2/7/07, steve uurtamo [EMAIL PROTECTED] wrote:

> And of course don't forget about this no_variable_in_for thing... i'll have to read up on what you're describing.

The following pseudocode block

    for (int i = 0; i < 10; ++i) {
        ... code ...
    }   // i's lifetime ends after the for loop does

is valid in just about any version of C++ and in the 1999 ISO C standard (also known as C99), but it is not valid in most older C standards. (Some older versions of gcc would accept this code but assign the wrong lifetime (according to the standard) to the variable i. If you want to test this code with gcc, use the -std=c99 flag, which, as of quite recently at least, was not enabled by default.)

There are at least a couple of other C++-isms -- offhand, the // single-line comment form and variable declarations in the middle of code blocks come to mind -- that are also valid in C99 but invalid in at least some of the older C standards.

I'm not trying to run off on a language-standards tangent! I just feared that we might be headed towards one anyway, and I wanted to make sure the information in it was not 8 years out of date. Sorry, now back to go (I hope)...