Re: [computer-go] Dynamic komi at high handicaps
> Maybe I should ask first, for clarity's sake: is MCTS performance in handicap games currently a problem?
>
> Mark

Yes, it's a big problem. And that's not a matter of opinion. MC bots that lead a game by a large margin will lightly give away their advantage, all except for the last half point. Even on a 9x9 board, even if the bot wins more games on even with 7.5 komi, that doesn't mean that it's impossible for the human to win while giving a 2-stone handicap. All it takes is a single misjudgement by the bot after the game has become close. Granted, bots are really excellent at defending the last half-point advantage tooth and claw. I'm just saying that it should be impossible for the human to win on 2 stones, and it isn't.

If they are behind by a large margin, they will play either random or ko-threat-type moves. So there is a kind of symmetry here: being too far ahead or too far behind ruins the bot's play.

The biggest practical problem right now is poor play against pros on a 19x19 board, taking a large handicap. Special fuseki patterns are only a patch. When, after a decent opening, the regular patterns take over, they usually start to work against the bot's own previous moves immediately.

Looking into the horse's mouth, instead of invoking Aristotle, is really the only way to find out. I had hoped that programmers would find the idea interesting enough to try it out. Instead, I found myself in a hand-waving contest. Granted, I started it, so I can't complain.

Thanks to Ingo for simulating dynamic komi by hand to give programmers something less speculative. Btw, I played 2 games (as gogonuts) on KGS against goIngo (really ManyFaces). I won both on 5 stones. But in the first one, with komi adjusted by Ingo, I had to make a very critical invasion that should not really have worked. In the second game I won without problems. At the time, Ingo adjusted the komi so that the win rate for W was 50%. Since then, with his limited trials, Ingo has found that adjusting the komi to give each side a 50% win rate isn't optimal. His current rule is to adjust to 42% for W. This is of course only a crude start, but sophistication can only be introduced by programmers.

Stefan

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
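Ingo's rule of adjusting the komi until W's estimated win rate sits at 42% can be sketched as a one-dimensional search. The following is only a minimal sketch under stated assumptions: `toy_white_winrate` is a hypothetical logistic stand-in for a real search, and the names, the 50-point lead and the bisection itself are illustrative choices, not anything Ingo or ManyFaces actually uses.

```python
import math

def toy_white_winrate(extra_komi, black_lead=50.0, scale=8.0):
    """Hypothetical stand-in for a search: W's win rate given extra komi.

    A real program would run playouts here; a logistic curve is assumed
    purely for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(extra_komi - black_lead) / scale))

def komi_for_target(winrate_fn, target=0.42, lo=0.0, hi=200.0, tol=0.25):
    """Bisect for the extra komi at which winrate_fn is close to target.

    Assumes winrate_fn is non-decreasing in the extra komi given to W.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if winrate_fn(mid) < target:
            lo = mid          # W still too pessimistic: give W more komi
        else:
            hi = mid          # W already at or above target: give less
    return (lo + hi) / 2.0

komi = komi_for_target(toy_white_winrate)
print(round(komi, 2), round(toy_white_winrate(komi), 3))
```

In a real bot the win-rate estimate is noisy, so a plain bisection would probably need averaging or a more cautious step rule.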
Re: [computer-go] Monte-Carlo Simulation Balancing
I admit I had trouble understanding the details of the paper. I think the biggest problem in applying this to bigger boards (up to 19x19) is that you somehow need access to the true value of a move, i.e. whether it is a win or a loss. On the 5x5 board they used, this might be approximated pretty well, but there's no chance of doing so on 19x19.

On 13.08.2009 at 05:14, Michael Williams wrote:
> After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.
Re: [computer-go] Dynamic komi at high handicaps
I don't think the komi should be adjusted. Instead: wouldn't random passing by black during the playouts model black's mistakes much more accurately? The number of random passes should be adjusted so that the playouts are close to 50/50.

Adjusting the komi would make black play greedily, while random passing during playouts would make black play safe (rich men don't pick fights).

Tapani Raiko

Christoph Birk wrote:
> I think you got it the wrong way round. Without dynamic komi (in high handicap games) even trillions of simulations will _not_ find a move that creates a winning line, because there is none if the opponent has the same strength as you. WHITE has to assume that BLACK will make mistakes, otherwise there would be no handicap.
>
> Christoph

--
Tapani Raiko, tapani.ra...@tkk.fi, +358 50 5225750
http://www.iki.fi/raiko/
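Tapani's proposal, tuning black's random pass rate until the playouts come out near 50/50, could be prototyped roughly as follows. This is a toy sketch: the playout model (every move worth one point, a pass worth nothing, a 9-point head start for black) and all function names are assumptions made for illustration, not part of any existing program.

```python
import random

def playout_white_wins(pass_prob, rng, moves_per_side=60, black_lead=9):
    """One toy playout; returns True if white wins.

    Assumption: every move is worth one point and a pass is worth nothing.
    """
    black = black_lead
    white = 0
    for _ in range(moves_per_side):
        if rng.random() >= pass_prob:   # black plays unless it randomly passes
            black += 1
        white += 1                      # white never passes in this toy model
    return white >= black               # ties go to white, as with a 0.5 komi

def white_winrate(pass_prob, playouts=1000, seed=0):
    """Estimate white's win rate; a fixed seed keeps the estimate repeatable."""
    rng = random.Random(seed)
    wins = sum(playout_white_wins(pass_prob, rng) for _ in range(playouts))
    return wins / playouts

def calibrate_pass_prob(candidates, target=0.5):
    """Pick the candidate pass probability whose win rate is nearest target."""
    return min(candidates, key=lambda p: abs(white_winrate(p) - target))

candidates = [i / 100 for i in range(0, 31)]
best = calibrate_pass_prob(candidates)
print(best, white_winrate(best))
```

A grid scan is used instead of anything cleverer because the win-rate estimate is noisy; in a real engine the calibration would presumably be repeated as the game progresses.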
Re: [computer-go] Monte-Carlo Simulation Balancing
In future papers they should avoid using a strong authority like Fuego for the training and instead force it to learn from a naive uniform random playout policy (with 100x or 1000x more playouts) and then build on that with an iterative approach (as was suggested in the paper).

I also had another thought. Since they are training the policy to maximize the balance and not the win rate, wouldn't you be able to extract more information from each trial by using the score instead of the game result? The normal pitfalls of doing so do not apply here.

Isaac Deutsch wrote:
> I admit I had trouble understanding the details of the paper. What I think is the biggest problem for applying this to bigger (up to 19x19) games is that you somehow need access to the true value of a move, i.e. it's a win or a loss. On the 5x5 board they used, this might be approximated pretty well, but there's no chance on 19x19 to do so.
>
> Am 13.08.2009 um 05:14 schrieb Michael Williams:
>> After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.
Re: [computer-go] Dynamic komi at high handicaps
I'd like to say that the problem comes from the fact that the model of the opponent in the simulations is not accurate enough in the MCTS framework. So the solution would be to make the model more precise, but this is hardly practical.

What is komi or handicap? Since W is stronger than B, W must gain some points in the future from any position in a game. Handicap can be thought of as those points at the initial position (assuming handicap stones can be converted to handicap points). Hence, handicap points could be used to correct the model of the opponent. For example, assuming a 7-stone handicap is equivalent to 70 points, we can use 70 for the hidden komi at the beginning and decrease it towards the end of the game.

Does this make sense?

Hideki

Don Dailey: 5212e61a0908121411y3198e9d9m55441378fa01...@mail.gmail.com:
> The problem with MCTS programs is that they like to consolidate. You set the komi and thereby give them a goal, and they very quickly make moves which commit to that specific goal. Committing to less than you need to actually win will often involve sacrificing chances to win. Sometimes it won't, but you cannot have a scalable algorithm which is this arbitrary.
>
> However, if the handicap is too high, the program thinks every line is a loss and it plays randomly. That's why we even consider doing this. Dynamically changing komi could be of some benefit in that situation if there is no alternative reasonable strategy, but it does not address the real problem - which is what I call the committal consolidation problem. You are giving the program an arbitrary short term goal which may, or may not, be compatible with the long term goal of winning the game. Whether it's compatible or not is based on your own credulity - not anything predictable or that you can scale. And as the base program gets stronger this aspect of the program becomes more and more of a wart.
>
> If this can be made to work in the short term, it should be considered a temporary hack which should be fixed as soon as possible. We have to think about this anyway sooner or later, because if programs continue to develop and the predictive ability of the playouts and tree search gets several hundred ELO better, these programs may start to see more and more positions as either dead won or dead lost. I'm sure we will want some kind of robust mechanism for dealing with this which is better at estimating chances that the opponent will go wrong, as opposed to doing something that is a random benefit or hindrance.
>
> - Don
>
> 2009/8/12 terry mcintyre terrymcint...@yahoo.com
>> Ingo suggested something interesting - instead of changing the komi according to the move number, or some other fixed schedule, it varies according to the estimated winrate. It also, implicitly, depends on one's guess of the ability of the opponent.
>>
>> An interesting test would be to take an opponent known to be weaker, offer it a handicap, and tweak the dynamic komi per Ingo's suggestion. At what handicap does the ratio balance at 50:50? Can the number of handicap stones be increased with such an adaptive algorithm? Even better, play against a stronger opponent; can one increase the win rate versus strong opponents?
>>
>> The usual range of computer opponents is fairly narrow. None approach high-dan levels on 19x19 boards - yet.
>>
>> Terry McIntyre terrymcint...@yahoo.com
>> We hang the petty thieves and appoint the great ones to public office. -- Aesop
>>
>> *From:* Brian Sheppard sheppar...@aol.com
>> *To:* computer-go@computer-go.org
>> *Sent:* Wednesday, August 12, 2009 12:33:13 PM
>> *Subject:* [computer-go] Dynamic komi at high handicaps
>>
>>> The small samples is probably the least of the problems with this. Do you actually believe that you can play games against it and not be subjective in your observations or how you play against it?
>>>
>>> These are computer-vs-computer games. Ingo is manually transferring moves between two computer opponents. The result does support Ingo's belief that dynamic komi will help programs play high handicap games. Due to small sample size it isn't very strong evidence. But maybe it is enough to induce a programmer who actually plays in such games to create a more exhaustive test.

--
g...@nue.ci.i.u-tokyo.ac.jp (Kato)
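Hideki's suggestion of a hidden komi that starts at the handicap's point value and shrinks towards the end of the game could look like the sketch below. The linear schedule, the assumed game length of 250 moves and the function name are illustrative assumptions; only the "7 stones is roughly 70 points" conversion comes from the message.

```python
def hidden_komi(move_number, handicap_stones,
                points_per_stone=10.0, expected_game_length=250):
    """Hidden komi that decays linearly from the handicap's value to zero.

    points_per_stone follows the 7 stones ~ 70 points example; the linear
    decay and expected_game_length are assumptions for illustration.
    """
    initial = handicap_stones * points_per_stone
    remaining = max(0.0, 1.0 - move_number / expected_game_length)
    return initial * remaining

for move in (0, 125, 250):
    print(move, hidden_komi(move, handicap_stones=7))
```

Other schedules (for example one driven by the estimated win rate, as Ingo proposed) would slot into the same place.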
Re: [computer-go] Monte-Carlo Simulation Balancing
A web search turned up a 2-page and an 8-page version. I read the short one. I agree that it's promising work that requires some follow-up research.

Now that you've read it so many times, what excites you about it? Can you envision a way to scale it to larger patterns and boards on modern hardware?

Sent from my iPhone

On Aug 12, 2009, at 11:14 PM, Michael Williams michaelwilliam...@gmail.com wrote:
> After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.
>
> David Silver wrote:
>> Hi everyone,
>>
>> Please find attached my ICML paper with Gerry Tesauro on automatically learning a simulation policy for Monte-Carlo Go. Our preliminary results show a 200+ Elo improvement over previous approaches, although our experiments were restricted to simple Monte-Carlo search with no tree on small boards.
>>
>> Abstract
>>
>> In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy. We develop two algorithms for balancing a simulation policy by gradient descent. The first algorithm optimises the balance of complete simulations, using a policy gradient algorithm; whereas the second algorithm optimises the balance over every two steps of simulation. We compare our algorithms to reinforcement learning and supervised learning algorithms for maximising the strength of the simulation policy. We test each algorithm in the domain of 5x5 and 6x6 Computer Go, using a softmax policy that is parameterised by weights for a hundred simple patterns. When used in a simple Monte-Carlo search, the policies learnt by simulation balancing achieved significantly better performance, with half the mean squared error of a uniform random policy, and equal overall performance to a sophisticated Go engine.
>>
>> -Dave
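For readers unfamiliar with the "softmax policy parameterised by weights for patterns" mentioned in the abstract, here is a minimal sketch of such a policy. It shows only how move probabilities follow from pattern weights; it does not implement the balancing algorithms of the paper, and the pattern names and data layout are hypothetical.

```python
import math

def softmax_policy(move_features, weights):
    """Turn pattern weights into move probabilities.

    move_features maps each legal move to the list of pattern ids that match
    it (hypothetical ids); weights maps pattern id -> learned weight.
    """
    prefs = {m: sum(weights.get(f, 0.0) for f in feats)
             for m, feats in move_features.items()}
    biggest = max(prefs.values())            # subtract max for numerical stability
    exps = {m: math.exp(v - biggest) for m, v in prefs.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

probs = softmax_policy(
    {"a": ["hane"], "b": ["cut"], "c": []},
    {"hane": 1.0, "cut": 0.0},
)
print(probs)
```

Learning then amounts to adjusting `weights`; the paper's contribution is the choice of objective (balance rather than strength) for that adjustment.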
[computer-go] Monte-Carlo Simulation Balancing
> Is anyone (besides the authors) doing research based on this?

Well, Pebbles does apply reinforcement learning (RL) to improve its playout policy. But not in the manner described in that paper.

There are practical obstacles to directly applying that paper. To directly apply that paper, you must have a CrazyStone playout design, wherein you maintain 3x3 neighborhoods around each point. Pebbles has a Mogo playout design, where you check for patterns only around the last move (or two).

To directly pursue this would require a rewrite. Right now, there is no published evidence that the Mogo design is inferior. In fact, two of the world's best programs use the Mogo design (Mogo, Fuego). So I am unwilling to make that commitment.

I would also have to research how to scale that paper to realistic conditions, including

1) 9x9 boards at a minimum.
2) Self-play, instead of assuming an oracle.
3) Playout after a UCT/RAVE search rather than pure MC.
4) Pattern sets that have ~1 million parameters.
5) Pattern sets that have more general geometry than 3x3, perhaps.

My guess is that all of these research problems are solvable. But that's a lot of work to do. If I had to face this task list, I would put it off until later, because there is always an easier way to make progress.
Re: [computer-go] Dynamic komi at high handicaps
This idea makes much more sense to me than adjusting komi does. At least it's an attempt at opponent modeling, which is the actual problem that should be addressed. Whether it will actually work is something that could be tested.

Another similar idea is not to pass but to play some percentage of random moves - which probably would work in programs with strong playout strategies. Of course this would be meaningless for bots that have weak (and already random) playout strategies.

- Don

On Thu, Aug 13, 2009 at 6:17 AM, Tapani Raiko pra...@cis.hut.fi wrote:
> I don't think the komi should be adjusted. Instead: Wouldn't random passing by black during the playouts model black making mistakes much more accurately? The number of random passes should be adjusted such that the playouts are close to 50/50.
>
> Adjusting the komi would make black play greedily, while random passing during playouts would make black play safe (rich men don't pick fights).
>
> Tapani Raiko
Re: [computer-go] Dynamic komi at high handicaps
On Thu, Aug 13, 2009 at 1:39 AM, Christoph Birk b...@ociw.edu wrote:
> On Aug 12, 2009, at 3:43 PM, Don Dailey wrote:
>> I believe the only thing wrong with the current MCTS strategy is that you cannot get a statistically meaningful number of samples when almost all games are won or lost. You can get a more meaningful NUMBER of samples by adjusting komi, but unfortunately you are sampling the wrong thing - an approximation of the actual goal. Since the approximation may be wrong or right, your algorithm is not scalable. You could run on a billion processors sampling billions of nodes per second, and with no flaw in the search or the playouts still play a move that gives you no chances of winning.
>
> I think you got it the wrong way round. Without dynamic komi (in high handicap games) even trillions of simulations will _not_ find a move that creates a winning line, because there is none if the opponent has the same strength as you. WHITE has to assume that BLACK will make mistakes, otherwise there would be no handicap.
>
> Christoph

I'm not trying to define the problem - that has already been done and I agree with you - if the situation is hopeless the computer will play randomly regardless of the number of playouts.

I'm explaining why this solution is imperfect and not scalable. I did not say it would make it play worse than nothing at all.

- Don
Re: [computer-go] Dynamic komi at high handicaps
One reason dynamic komi seems a bit odd is that the numbers are pulled out of thin air. Why should the komi be X instead of Y? When should the value be changed? Going back to the original thought experiment: the komi at the start of the game should reflect the expert assessment of how far ahead black is compared to white. A rational program should periodically change that assessment, as black blunders through the game. So, my next question is, what sort of experimentation has been done to assess the likely score at various parts of the game? Any results? It seems natural that a strong player, looking at a lot of handicap stones, will recognize that the position against an equally strong player would entail a loss - but of about n*10 stones, not of n*20 stones - and set an interim goal to acquire that much, while leaving opportunities for black to fumble. As such fumbles occur, white opportunistically consolidates more territory, and expectations are adjusted upwards -- only a 40 point loss now ... only 10 points now ... striking distance ... black is torpedoed now!
Re: [computer-go] Dynamic komi at high handicaps
With CrazyStone-like playouts, you can just put noise over the possibilities. The more noise, the more random the playout is, and the weaker it plays. The best move in the tree is then the one that requires the least amount of noise for the other player to reach a 50% win chance if I am behind, or the one that requires the largest amount of noise for me if I am ahead. Would that work?

On 13.08.2009 at 16:02, Don Dailey wrote:
> This idea makes much more sense to me than adjusting komi does. At least it's an attempt at opponent modeling, which is the actual problem that should be addressed. Whether it will actually work is something that could be tested.
>
> Another similar idea is not to pass but to play some percentage of random moves - which probably would work in programs with strong playout strategies. Of course this would be meaningless for bots that have weak (and already random) playout strategies.
>
> - Don
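Isaac's noise idea can be made concrete in a small sketch. The assumptions here are mine, not his: "noise" is modelled as blending the playout move distribution with the uniform distribution, and each candidate move comes with a hypothetical curve giving my win rate as a function of the noise applied to the opponent's playout moves.

```python
def add_noise(probs, noise):
    """Blend a playout move distribution with the uniform distribution.

    noise=0 keeps the policy as is; noise=1 makes the playout fully random.
    """
    n = len(probs)
    return [(1.0 - noise) * p + noise / n for p in probs]

def noise_needed(winrate_at, target=0.5, steps=100):
    """Smallest opponent noise at which my win rate reaches target, else None."""
    for i in range(steps + 1):
        noise = i / steps
        if winrate_at(noise) >= target:
            return noise
    return None

def pick_move_when_behind(curves):
    """Choose the move that needs the least opponent noise to reach 50%.

    curves maps move -> function(noise) -> my win rate (hypothetical inputs;
    a real engine would have to estimate these by search).
    """
    needed = {m: noise_needed(f) for m, f in curves.items()}
    reachable = {m: n for m, n in needed.items() if n is not None}
    return min(reachable, key=reachable.get) if reachable else None

toy_curves = {
    "A": lambda noise: 0.2 + 0.5 * noise,   # better now, gains little from errors
    "B": lambda noise: 0.1 + 0.9 * noise,   # worse now, punishes errors harder
}
choice = pick_move_when_behind(toy_curves)
print(choice)
```

The expensive part in practice would be estimating each curve, since it requires searches at several noise levels per candidate move.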
Re: [computer-go] Dynamic komi at high handicaps
Modeling the opponent's mistakes is indeed an alternative to introducing komi. But it would have to be a lot more exact than simply rolling the dice or skipping a move here and there. Successful opponent modeling would implement the overplay school of thought - playing tactically refutable combinations that are beyond the opponent's skill to punish.

Introducing komi at the 50% win rate level would implement the honte school of thought - play as if against yourself. At a win rate of less than 50% it implements the almost honte school of thought. :-)

I'm not trying to moralize. In love and go anything is fair. I'm just saying that while both approaches are legitimate, adjusting the komi is much easier to do.

Different subject, suggestion for a komi adjustment scheme:

1. Make a regular evaluation (no extra komi).
2. If the win rate of the best move is within certain bounds, you're done. (Say between 30 and 70 percent. Just a guess of course. Also, this might shift as the game progresses.)
3. If not, make a komi adjustment dependent on how far out of bounds the win rate is. (No numerical suggestion here. Please experiment.)
4. Make a new search with this komi.
5. If the new result is in bounds, calculate winrate_nokomi * factor + winrate_komi for each candidate and choose the highest one. (factor around 10 maybe)
6. If not, go back to 3.

The idea is to choose a move that doesn't contradict the long term goal (no komi search) while trying for a short term goal (komi search) if no long term goal is available. (Or if every move satisfies the long term goal, in the case of taking handicap.)

Stefan

- Original Message -
From: Don Dailey
To: tapani.ra...@tkk.fi ; computer-go
Sent: Thursday, August 13, 2009 4:02 PM
Subject: Re: [computer-go] Dynamic komi at high handicaps

> This idea makes much more sense to me than adjusting komi does. At least it's an attempt at opponent modeling, which is the actual problem that should be addressed. Whether it will actually work is something that could be tested.
>
> Another similar idea is not to pass but to play some percentage of random moves - which probably would work in programs with strong playout strategies. Of course this would be meaningless for bots that have weak (and already random) playout strategies.
>
> - Don
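Stefan's six-step komi adjustment scheme can be sketched as a small loop. Everything concrete below is an assumption for illustration: `toy_search` is a hypothetical stand-in for a real search (logistic win rates from an assumed margin and spread per move), positive komi is credited to the side to move, and `gain` is an arbitrary step size, since the message deliberately gives no numbers for step 3.

```python
import math

def toy_search(komi):
    """Hypothetical stand-in for a search run with extra komi.

    Returns {move: win rate} for the side to move; each move has an assumed
    expected margin in points and an assumed spread.
    """
    moves = {"solid": (-44.0, 4.0), "sharp": (-52.0, 12.0)}
    return {m: 1.0 / (1.0 + math.exp(-(margin + komi) / spread))
            for m, (margin, spread) in moves.items()}

def choose_move(search, lo=0.30, hi=0.70, factor=10.0, gain=40.0, max_rounds=20):
    base = search(0.0)                                   # step 1
    best = max(base, key=base.get)
    if lo <= base[best] <= hi:                           # step 2
        return best, 0.0
    komi, adjusted = 0.0, base
    for _ in range(max_rounds):                          # steps 3, 4 and 6
        top = max(adjusted.values())
        if lo <= top <= hi:
            break
        out_of_bounds = (lo - top) if top < lo else (hi - top)
        komi += gain * out_of_bounds                     # bigger miss, bigger step
        adjusted = search(komi)
    # Step 5 (also used as a fallback if max_rounds runs out).
    combined = {m: base[m] * factor + adjusted[m] for m in base}
    return max(combined, key=combined.get), komi

move, komi = choose_move(toy_search)
print(move, round(komi, 1))
```

In this toy position the komi search alone slightly prefers "solid", but the no-komi term weighted by `factor` tips the choice to "sharp", which is the behaviour step 5 is meant to produce. Setting `factor=0.0` shows the difference.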
Re: [computer-go] Dynamic komi at high handicaps
There is one crude way to measure goal compatibility. See if you can make the same move work with different komi. If I'm on the east coast of the US traveling to the west coast, I will probably start off on the same road regardless of whether I'm going to Seattle or San Diego. If the same road does not work, then I'm facing a critical decision point.

So it's probably safe to search for a move that works reasonably well with different komi. If you cannot do this, you probably have goals that are not compatible. But if you find a move that works well when the score is 50-50 (by manipulating the komi), then you should see if it's compatible with a tougher goal. This will at least be some evidence that you are looking at a common sense move and not a move that commits you to the wrong plan.

But if you have a move that returns a really high score with one komi, but raising it up just a bit makes it drop to zero, you are in trouble with that move. Try to find a move that may not be quite as good in the first case, but is much better in the second case.

Unfortunately, I don't think there is a simple way to implement this.

Has anyone tried scoring where the total area was folded into the main score, perhaps as much less significant bits of the score? This makes winning big a secondary goal.

- Don
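Both of Don's ideas in this message can be illustrated with a short sketch. The first function picks the move whose worst win rate across several komi values is best, which is one crude reading of "works reasonably well with different komi". The second folds the final margin into the playout reward as a small secondary term. The function names, the max-min criterion, the 0.01 weight and the toy numbers are all assumptions for illustration.

```python
def most_compatible_move(winrates_by_komi):
    """Pick the move with the best worst-case win rate over the tested komi.

    winrates_by_komi maps komi -> {move: win rate}; the tables would come
    from separate searches in a real program.
    """
    tables = list(winrates_by_komi.values())
    moves = set.intersection(*(set(t) for t in tables))
    return max(sorted(moves), key=lambda m: min(t[m] for t in tables))

def playout_reward(margin, board_area=361, weight=0.01):
    """Win/loss as the main signal, final margin as 'less significant bits'.

    weight < 1 keeps the margin term from ever outweighing the win itself.
    """
    win = 1.0 if margin > 0 else 0.0
    return win + weight * margin / board_area

tables = {
    0.0: {"a": 0.90, "b": 0.80},   # at the real komi "a" looks best
    5.0: {"a": 0.05, "b": 0.60},   # but "a" collapses under a tougher goal
}
robust = most_compatible_move(tables)
print(robust, playout_reward(0.5), playout_reward(30.5))
```

Here "b" is chosen because "a" drops to almost zero as soon as the komi is raised, which is the warning sign described above.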
Re: [computer-go] Dynamic komi at high handicaps
2009/8/13 Stefan Kaitschick stefan.kaitsch...@hamburg.de
> Modeling the opponent's mistakes is indeed an alternative to introducing komi. But it would have to be a lot more exact than simply rolling the dice or skipping a move here and there. Successful opponent modeling would implement the overplay school of thought - playing tactically refutable combinations that are beyond the opponent's skill to punish.

I cannot believe you are being so technically precise about doing this correctly while advocating something on the other hand which is so obviously incorrect.

You probably have something here though. I think the play-out policy is a more fruitful area to explore than dynamically changing komi. I would start simple, trying the simplest approach first and then gradually refining it. Random occasional pass moves are certainly easy to implement as a first step.

- Don
Re: [computer-go] Dynamic komi at high handicaps
The dynamic komi is perhaps a misnomer; it's by accident that changing komi reflects something which we do want to measure, namely the predicted score. An algorithm which does not make use of the predicted score would not make use of all available information.

On a 19x19 board, it is common for some areas to become settled, whether unconditionally alive or (more likely) alive under the assumption of alternating play. Many moves trade the prospect of territory here versus there. Bad moves give up too much for too little. Good moves exploit bad or slack moves, and provide an equitable balance against good play.

Terry McIntyre terrymcint...@yahoo.com
“We hang the petty thieves and appoint the great ones to public office.” -- Aesop

From: Don Dailey dailey@gmail.com
To: computer-go computer-go@computer-go.org
Sent: Thursday, August 13, 2009 9:27:11 AM
Subject: Re: [computer-go] Dynamic komi at high handicaps

2009/8/13 Stefan Kaitschick stefan.kaitsch...@hamburg.de

Modeling the opponent's mistakes is indeed an alternative to introducing komi. But it would have to be a lot more exact than simply rolling the dice or skipping a move here and there. Successful opponent modeling would implement the overplay school of thought: playing tactically refutable combinations that are beyond the opponent's skill to punish.

I cannot believe you are being so technically precise about doing this correctly while advocating something on the other hand which is so obviously incorrect. You probably have something here though. I think the play-out policy is a more fruitful area to explore than dynamically changing komi. I would start simple, just trying the simplest approach first, then gradually refining it. Random occasional pass moves are certainly easy to implement as a first step.

- Don

Introducing komi at the 50% win rate level would implement the honte school of thought: play as if against yourself. At a win rate of less than 50% it implements the almost honte school of thought. :-)

I'm not trying to moralize. In love and go anything is fair. I'm just saying that while both approaches are legitimate, adjusting the komi is much easier to do.

Different subject, suggestion for a komi adjustment scheme:

1. Make a regular evaluation (no extra komi).
2. If the win rate of the best move is within certain bounds, you're done. (Say between 30 and 70 percent. Just a guess, of course. Also, this might shift as the game progresses.)
3. If not, make a komi adjustment dependent on how far out of bounds the win rate is. (No numerical suggestion here. Please experiment.)
4. Make a new search with this komi.
5. If the new result is in bounds, calculate winrate_nokomi * factor + winrate_komi for each candidate and choose the highest one. (factor around 10 maybe)
6. If not, go back to 3.

The idea is to choose a move that doesn't contradict the long-term goal (no-komi search) while trying for a short-term goal (komi search) if no long-term goal is available. (Or if every move satisfies the long-term goal, in the case of taking handicap.)

Stefan

- Original Message -
From: Don Dailey
To: tapani.ra...@tkk.fi ; computer-go
Sent: Thursday, August 13, 2009 4:02 PM
Subject: Re: [computer-go] Dynamic komi at high handicaps

This idea makes much more sense to me than adjusting komi does. At least it's an attempt at opponent modeling, which is the actual problem that should be addressed. Whether it will actually work is something that could be tested.

Another similar idea is not to pass but to play some percentage of random moves, which probably would work in programs with strong playout strategies. Of course this would be meaningless for bots that have weak (and already random) playout strategies.

- Don

On Thu, Aug 13, 2009 at 6:17 AM, Tapani Raiko pra...@cis.hut.fi wrote:

I don't think the komi should be adjusted. Instead: wouldn't random passing by black during the playouts model black making mistakes much more accurately? The number of random passes should be adjusted such that the playouts are close to 50/50. Adjusting the komi would make black play greedily, while random passing during playouts would make black play safe (rich men don't pick fights).

Tapani Raiko

Christoph Birk wrote:

I think you got it the wrong way round. Without dynamic komi (in high handicap games) even trillions of simulations will _not_ find a move that creates a winning line, because there is none if the opponent has the same strength as you. WHITE has to assume that BLACK will make mistakes, otherwise there would be no handicap.

Christoph

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/

-- Tapani Raiko,
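Stefan's six-step komi adjustment scheme quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: `search` is a hypothetical callback standing in for a full MCTS run that returns a win rate per candidate move at a given komi offset, and `gain` and `max_rounds` are invented here, because the message deliberately gives no numbers for the size of the adjustment and does not bound the loop.

```python
from typing import Callable, Dict, Hashable, Tuple

Move = Hashable
# Hypothetical interface: search(komi_offset) -> {move: win rate in [0, 1]}.
# A positive offset gives extra points to the opponent.
Search = Callable[[float], Dict[Move, float]]


def choose_move(
    search: Search,
    low: float = 0.30,      # bounds from step 2 ("just a guess")
    high: float = 0.70,
    factor: float = 10.0,   # weight from step 5 ("around 10 maybe")
    gain: float = 100.0,    # assumption: komi points per unit of win rate out of bounds
    max_rounds: int = 8,    # assumption: cap on the step 3-6 loop
) -> Tuple[Move, float]:
    """Return (chosen move, komi offset that was used)."""
    base = search(0.0)                          # step 1: regular evaluation
    top = max(base.values())
    if low <= top <= high:                      # step 2: in bounds, done
        return max(base, key=base.get), 0.0

    offset = 0.0
    for _ in range(max_rounds):
        # Step 3: shift komi in proportion to how far out of bounds we are.
        if top > high:
            offset += gain * (top - high)
        else:
            offset -= gain * (low - top)
        adjusted = search(offset)               # step 4: new search with this komi
        top = max(adjusted.values())
        if low <= top <= high:                  # step 5: combine both searches
            combined = {
                m: base[m] * factor + adjusted.get(m, 0.0) for m in base
            }
            return max(combined, key=combined.get), offset
        # Step 6: still out of bounds, go around again.

    # Assumption: fall back to the plain search if no offset lands in bounds.
    return max(base, key=base.get), 0.0
```

With `factor` around 10, the no-komi win rate dominates the sum, so the komi search mostly acts as a tie-breaker among moves that the regular search rates about equally.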
Re: [computer-go] Dynamic komi at high handicaps
2009/8/13 terry mcintyre terrymcint...@yahoo.com

The dynamic komi is perhaps a misnomer; it's by accident that changing komi reflects something which we do want to measure, namely the predicted score. An algorithm which does not make use of the predicted score would not make use of all available information.

You imply that it's some kind of travesty not to make use of available information, but the information is only available because you did extra work to generate it. And even if the information were free, it matters that you use it correctly. Just implying that it's wrong not to do this because you are throwing away available information is not going to cut it.

On a 19x19 board, it is common for some areas to become settled, whether unconditionally alive or (more likely) alive under the assumption of alternating play. Many moves trade the prospect of territory here versus there. Bad moves give up too much for too little. Good moves exploit bad or slack moves, and provide an equitable balance against good play.

I think you have described very well why dynamic komi is so hard. You take ALL these factors and ignore them all except for a single number. You don't know anything about the composition of that number. If your concern is about throwing away information, then dynamic komi should be a big problem for you.

- Don
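Tapani Raiko's suggestion from earlier in the thread (random passes by Black during playouts, with the pass rate adjusted so that playouts come out close to 50/50) might be sketched like this. `black_winrate` is a hypothetical callback, not part of any real engine, that runs a batch of playouts at a given pass probability and reports Black's win rate. The bisection, the tolerance and the iteration cap are all assumptions, and the sketch presumes the win rate does not rise as Black passes more often.

```python
from typing import Callable


def tune_pass_probability(
    black_winrate: Callable[[float], float],  # hypothetical: pass prob -> Black win rate
    target: float = 0.5,                      # "close to 50/50"
    tolerance: float = 0.02,                  # assumption
    max_iters: int = 20,                      # assumption
) -> float:
    """Find a pass probability for Black that brings playouts near the target."""
    if black_winrate(0.0) <= target:
        return 0.0  # Black is not ahead in the playouts; no passes needed.

    lo, hi = 0.0, 1.0
    p = 0.0
    for _ in range(max_iters):
        p = (lo + hi) / 2.0
        rate = black_winrate(p)
        if abs(rate - target) <= tolerance:
            return p
        if rate > target:
            lo = p  # Black still wins too often: pass more.
        else:
            hi = p  # Black now loses too often: pass less.
    return p
```

Real playout win rates are noisy, so a tolerance tighter than the sampling error of one batch would not be meaningful.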
Re: [computer-go] Dynamic komi at high handicaps
I have never heard a pro say "I estimate my chances of winning this game to be 50.3%", but you will hear "black is ahead by 3 points" or "white wins by 1/2 point"; they'll make this evaluation based on the alternation of equally competent play.

Terry McIntyre terrymcint...@yahoo.com
Re: [computer-go] Monte-Carlo Simulation Balancing
Pebbles has a Mogo playout design, where you check for patterns only around the last move (or two).

In MoGo, it's not only around the last move (at least with some probability and when there are empty spaces on the board); this is the "fill board" modification. (This provides a big improvement in 19x19 with large numbers of simulations, see http://www.lri.fr/~rimmel/publi/EK_explo.pdf , Fig 3 page 8: not only a quantitative improvement, but also, according to players, a qualitative improvement in the way MoGo plays.)

Best regards,
Olivier
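One possible reading of the fill board idea, as a rough sketch only: before falling back to patterns around the last move, try a few random points and play one if it sits in an entirely empty neighbourhood. The 3x3 emptiness test, the number of tries, the board representation and the name `fill_board_move` are all assumptions made for illustration; the message itself only says the rule fires "with some probability and when there are empty spaces".

```python
import random
from typing import List, Optional, Tuple

EMPTY = 0  # assumption: 0 marks an empty point, anything else is a stone


def fill_board_move(
    board: List[List[int]],
    rng: random.Random,
    tries: int = 4,  # assumption: how many random points to sample
) -> Optional[Tuple[int, int]]:
    """Return a point in an empty 3x3 area, or None to fall back to patterns."""
    size = len(board)
    for _ in range(tries):
        r, c = rng.randrange(size), rng.randrange(size)
        # Check the point and its on-board neighbours (edges have fewer).
        area_empty = all(
            board[rr][cc] == EMPTY
            for rr in range(max(0, r - 1), min(size, r + 2))
            for cc in range(max(0, c - 1), min(size, c + 2))
        )
        if area_empty:
            return (r, c)
    return None
```

A playout policy would call this first and use its usual last-move pattern matching only when it returns `None`, which becomes the common case as the board fills up.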
[computer-go] Monte-Carlo Simulation Balancing
Pebbles has a Mogo playout design, where you check for patterns only around the last move (or two).

In MoGo, it's not only around the last move (at least with some probability and when there are empty spaces in the board); this is the fill board modification.

Just to clarify: I was not saying that Mogo's policy consisted *solely* of looking for patterns around the last move. Merely that it does not look for patterns around *every* point, which other playout policies (e.g., CrazyStone, if I understand Remi's papers correctly) appear to do. The RL paper seems to require that playout design. If you want to prioritize patterns around every point, then you need a more sophisticated board representation than Pebbles uses.

BTW, FillBoard seems to help Pebbles, too. A few percent better on 9x9 games. No testing on larger boards. YMMV, and like everything about computer go: all predictions are guaranteed to be wrong, or your money back.
Re: [computer-go] Monte-Carlo Simulation Balancing
Just to clarify: I was not saying that Mogo's policy consisted *solely* of looking for patterns around the last move. Merely that it does not look for patterns around *every* point, which other playout policies (e.g., CrazyStone, if I understand Remi's papers correctly) appear to do. The RL paper seems to require that playout design.

Fine!

BTW, FillBoard seems to help Pebbles, too. A few percent better on 9x9 games. No testing on larger boards. YMMV, and like everything about computer go: all predictions are guaranteed to be wrong, or your money back.

For us the improvement is essential in 19x19; it is good for the generality of fillboard if it also helps for you :-) The loss of diversity due to patterns is really clear in some situations, so the problem solved by fillboard is understandable, and I believe it should work for you as well. But, as you say, all predictions in computer-go are almost guaranteed to be wrong :-)

Olivier
[computer-go] Heavier playouts
A couple of weeks ago I made the playouts slightly heavier by adding a few 2-liberty local rules. It made a big difference in the program's strength (from strong 3 kyu to weak 1 kyu). www.gokgs.com/servlet/graph/ManyFaces-en_US.png

David
Re: [computer-go] Heavier playouts
David Fotland wrote:

made the playouts slightly heavier by adding a few 2-liberty local rules.

What does "heavier" mean here, and could you please give an example of such a rule? Do you have an understanding of why they make your program stronger?

-- robert jasiek
RE: [computer-go] Heavier playouts
Heavier means more analysis in the playouts about what move to make, less pure random. I don't understand why it's stronger, but I'm happy with the result. Playouts are pretty much "try something and test it".

David
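To make "heavier" concrete, here is a generic sketch of a playout move chooser that consults a list of local rules before falling back to a uniformly random legal move. It is not Many Faces' code: the actual 2-liberty rules are not described in this thread, so `Rule` is a hypothetical callback type and no specific rule is implemented.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
# Hypothetical rule signature: (position, last move) -> suggested point or None.
Rule = Callable[[object, Optional[Point]], Optional[Point]]


def heavy_playout_move(
    position: object,
    last_move: Optional[Point],
    legal_moves: Sequence[Point],
    rules: Sequence[Rule],
    rng: random.Random,
) -> Optional[Point]:
    """Pick a playout move: first rule that fires wins, else pure random."""
    for rule in rules:
        move = rule(position, last_move)
        if move is not None and move in legal_moves:
            return move
    if not legal_moves:
        return None  # nothing to play; the caller would pass
    return rng.choice(list(legal_moves))
```

With an empty `rules` list this is a light (purely random) playout; each rule added makes the playout heavier and slower, which is why such changes have to be tested by playing games.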
[computer-go] Re: Heavier playouts
David Fotland: 091c01ca1c4f$9dea69e0$d9bf3d...@com:

A couple of weeks ago I made the playouts slightly heavier by adding a few 2-liberty local rules. It made a big difference in the program's strength (from strong 3 kyu to weak 1 kyu). www.gokgs.com/servlet/graph/ManyFaces-en_US.png

Is this URL correct? I can't see that picture (IE8.0/WindowsXP). Is that the rank graph of MFG?

Hideki
-- g...@nue.ci.i.u-tokyo.ac.jp (Kato)
RE: [computer-go] Re: Heavier playouts
Works for me. It's the rank graph. You can also get it on KGS, user info for ManyFaces.