Re: [computer-go] The effect of the UCT-constant on Valkyria
David Fotland wrote:
> So I'm curious then. With simple UCT (no rave, no priors, no progressive
> widening), many people said the best constant was about 0.45. What are the
> new concepts that let you avoid the constant?

Actually it's closer to 0.46. Just kidding, I have no idea. But great questions. Looking forward to the answers.

> Is it RAVE, because the information gathered during the search lets you
> focus the search accurately without the UCT term? Many people have said
> that RAVE has no benefit for them. Do most of the strongest programs use
> RAVE? I think from Crazystone's papers, that it does not use RAVE. Gnugomc
> does not use rave.
>
> Is it the prior values from go knowledge, like opening books, reading
> tactics before the search etc? Do all of the top programs have opening
> books now? I know mogo does. Do most of the top programs read tactics
> before the search? I know Aya does.
>
> Does it matter how prior values are used to guide the search? I think mogo
> uses prior knowledge to initialize the RAVE values. Do other programs
> include it some other way, by initializing the FPU value, or by
> initializing the UCT visits and confidence, or some extra, prior term in
> the equation?
>
> Are there other techniques (not RAVE) that people are using to get
> information from the search to guide the move ordering? I think crazystone
> estimates ownership of each point and uses it to set prior values in some
> way.
>
> Regards,
> David

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The effect of the UCT-constant on Valkyria
David Fotland wrote:
> So I'm curious then. With simple UCT (no rave, no priors, no progressive
> widening), many people said the best constant was about 0.45. What are the
> new concepts that let you avoid the constant?

Whatever concepts are used, it must indirectly be a question of improved move ordering. The better the move ordering, the smaller the need for exploration.

> Is it RAVE, because the information gathered during the search lets you
> focus the search accurately without the UCT term? Many people have said
> that RAVE has no benefit for them. Do most of the strongest programs use
> RAVE? I think from Crazystone's papers, that it does not use RAVE. Gnugomc
> does not use rave.

I've never had success with RAVE but I might make a new attempt for GNU Go some time.

> Is it the prior values from go knowledge, like opening books, reading
> tactics before the search etc? Do all of the top programs have opening
> books now? I know mogo does.

The MonteGNU account on CGOS (9x9) has a self-learnt opening book with currently slightly more than 16000 moves. Over the last 1000 games it has played on average 4 moves per game from the book (own moves, that is; opponent moves not counted). The record is 22 moves from book.

> Do most of the top programs read tactics before the search? I know Aya
> does.

GNU Go in Monte Carlo mode reads lots of tactics before the MC search. But it doesn't use the tactics for the MC search. :-/

> Does it matter how prior values are used to guide the search? I think mogo
> uses prior knowledge to initialize the RAVE values. Do other programs
> include it some other way, by initializing the FPU value, or by
> initializing the UCT visits and confidence, or some extra, prior term in
> the equation? Are there other techniques (not RAVE) that people are using
> to get information from the search to guide the move ordering? I think
> crazystone estimates ownership of each point and uses it to set prior
> values in some way.
GNU Go uses a global move ordering shared by all nodes in the tree and initialized from the results of the normal move generation.

/Gunnar
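A global move ordering of the kind Gunnar describes can be sketched roughly as follows (a minimal illustration with hypothetical names, not GNU Go's actual code): the classical engine's move valuations yield one shared ranking, and every node in the tree orders its own legal moves by it.

```python
def make_global_ordering(move_values):
    """Build one global move ordering from a classical engine's move
    valuations (a dict: move -> score), best-valued moves first."""
    return sorted(move_values, key=move_values.get, reverse=True)

def ordered_candidates(legal_moves, global_ordering):
    """Order a node's legal moves by the shared global ordering;
    moves the static evaluation knows nothing about go last."""
    rank = {m: i for i, m in enumerate(global_ordering)}
    return sorted(legal_moves, key=lambda m: rank.get(m, len(rank)))
```

The point of sharing one ordering is that it costs a single run of the (expensive) classical move generation per search, rather than one per tree node.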
[computer-go] The effect of the UCT-constant on Valkyria
I have already posted the following results. The results show the winrates of Valkyria 3.2.0 against gnugo at default strength.

512 simulations per move:

UCT_K              Winrate  SERR
0 (winrate only)   58.8     2.1
0.01               56.8     2.2
0.1                60.9     2.2
0.5                54.2     2.2
1                  50.6     2.2

With 512 simulations there is not much work done in the tree, so I extended the test to 2048 simulations and also added the parameter value 2 to see what happens when the search gets really wide.

2048 simulations per move:

UCT_K              Winrate  SERR
0 (winrate only)   80.7     2.3
0.01               83.3     2.2
0.1                83.7     2.1
0.5                77.3     2.4
1                  71.3     3.0
2                  62.0     4.9

The number of games is 300 for parameter values 0 to 0.5, and a little less for parameter values 1 and 2.

The results confirm that Valkyria still benefits from using confidence bounds with UCT, although the effect is really small. Also, the effect of the constant might be a little greater with 2048 simulations than with 512. Still, the curves look more or less the same.

Does anyone have experience with tests at different numbers of simulations where the best parameter value depends on the number of simulations? I prefer to use a low number of simulations since it is simply faster, and also because if the winrate of Valkyria gets too close to 100% it becomes harder to measure the effect of different parameter settings. Maybe I should quit testing against gnugo and try something stronger.

-Magnus
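For readers following along, UCT_K here plays the role of the exploration constant in the standard UCB1 selection rule; a minimal sketch (hypothetical names, not Valkyria's actual code) makes clear why UCT_K = 0 means pure greedy winrate selection:

```python
import math

def uct_value(wins, visits, parent_visits, uct_k):
    """UCB1 score of one child: mean winrate plus an exploration bonus
    scaled by uct_k. With uct_k = 0 this is pure greedy winrate selection."""
    if visits == 0:
        return float("inf")  # unvisited moves are tried first
    winrate = wins / visits
    return winrate + uct_k * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits, uct_k):
    """Descend to the child maximizing the UCB1 score."""
    return max(children, key=lambda c: uct_value(c["wins"], c["visits"],
                                                 parent_visits, uct_k))
```

With uct_k = 0 the search keeps re-sampling whichever child currently has the best winrate; a larger constant diverts visits toward less-sampled children.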
Re: [computer-go] The effect of the UCT-constant on Valkyria
> The results confirm that Valkyria still benefits from using confidence
> bounds with UCT, although the effect is really small.

The standard deviation is a bit too large to draw conclusions. I'll try to get similar numbers for mogo. For the moment everything points to 0 as the best constant, but perhaps it will be different with larger numbers of sims/second.

Olivier
RE: [computer-go] The effect of the UCT-constant on Valkyria
So I'm curious then. With simple UCT (no rave, no priors, no progressive widening), many people said the best constant was about 0.45. What are the new concepts that let you avoid the constant?

Is it RAVE, because the information gathered during the search lets you focus the search accurately without the UCT term? Many people have said that RAVE has no benefit for them. Do most of the strongest programs use RAVE? I think from Crazystone's papers, that it does not use RAVE. Gnugomc does not use rave.

Is it the prior values from go knowledge, like opening books, reading tactics before the search etc? Do all of the top programs have opening books now? I know mogo does. Do most of the top programs read tactics before the search? I know Aya does.

Does it matter how prior values are used to guide the search? I think mogo uses prior knowledge to initialize the RAVE values. Do other programs include it some other way, by initializing the FPU value, or by initializing the UCT visits and confidence, or some extra, prior term in the equation?

Are there other techniques (not RAVE) that people are using to get information from the search to guide the move ordering? I think crazystone estimates ownership of each point and uses it to set prior values in some way.

Regards,
David

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:computer-go-[EMAIL PROTECTED]] On Behalf Of Olivier Teytaud
Sent: Saturday, May 03, 2008 3:10 AM
To: computer-go
Subject: Re: [computer-go] The effect of the UCT-constant on Valkyria

> The results confirm that Valkyria still benefits from using confidence
> bounds with UCT, although the effect is really small.

The standard deviation is a bit too large to draw conclusions. I'll try to get similar numbers for mogo. For the moment everything points to 0 as the best constant, but perhaps it will be different with larger numbers of sims/second.

Olivier
RE: [computer-go] The effect of the UCT-constant on Valkyria
Quoting David Fotland [EMAIL PROTECTED]:
> So I'm curious then. With simple UCT (no rave, no priors, no progressive
> widening), many people said the best constant was about 0.45. What are the
> new concepts that let you avoid the constant? Is it RAVE, because the
> information gathered during the search lets you focus the search
> accurately without the UCT term? Many people have said that RAVE has no
> benefit for them.

Yes, it is RAVE, and more specifically RAVE as it was presented here recently on the mailing list by the Mogo team, and not how it was originally presented in the Mogo paper. Also there may be several minor details that are peculiar to my implementation. Actually I did not understand some aspects of the Mogo method mailed here and just guessed some details. It suddenly worked and I could feel that the search was unusually strong and selective, and since then I have just adjusted some parameters. I used to do progressive widening but that is now turned off. RAVE is free to pick any move that is not pruned right away.

Currently I believe that RAVE is only effective if one gets other parameters right. For me it meant changing the UCT parameter from 0.8 to 0.1. I also know of many pathological situations where Valkyria currently will not find the best move, but rather the second best. It is possible that other programs suffer even more than Valkyria from similar problems, and that this to some extent has to do with the fact that the nature of the playouts may interfere with AMAF. For example, Valkyria either plays forced moves or plays uniformly at random among moves that are not pruned. Other programs may rely on patterns to pick all moves in the playouts, and this might be bad for AMAF (this is wild speculation).

> Do most of the strongest programs use RAVE? I think from Crazystone's
> papers, that it does not use RAVE. Gnugomc does not use rave.

You might not need it if you have strong pattern-matching priors for the tree part, similar to Crazystone.
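For context, one commonly published way of blending RAVE/AMAF statistics with the true node statistics is an "equivalence parameter" schedule; the sketch below is that generic scheme (hypothetical names and parameter values), not necessarily what Valkyria or Mogo actually does:

```python
import math

def rave_score(wins, visits, amaf_wins, amaf_visits, equivalence=1000):
    """Blend a move's true winrate with its AMAF (RAVE) winrate.
    beta starts near 1 (AMAF dominates a fresh node) and decays toward 0
    as real visits accumulate, so the true statistics take over."""
    beta = math.sqrt(equivalence / (3 * visits + equivalence)) if visits else 1.0
    mc = wins / visits if visits else 0.0
    amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
    return beta * amaf + (1.0 - beta) * mc
```

Because beta shrinks with the visit count, a good AMAF score can focus the search early without permanently overriding the measured winrate, which is the sense in which RAVE can substitute for the UCT exploration term.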
RAVE makes it possible to ignore most bad moves in a given position. The weakness is that often some good moves (with a chance of being the best possible move) are also ignored completely.

> Is it the prior values from go knowledge, like opening books, reading
> tactics before the search etc? Do all of the top programs have opening
> books now? I know mogo does.

Valkyria has just 4 moves in a hardcoded opening book. Previous versions used a book with several thousand positions that was both self-learned and modified by hand, but as long as the program changes, the book tends to become inaccurate, so right now I do not use it and am planning to write something more efficient than the old one, which kept each position as a file on the hard drive.

> Do most of the top programs read tactics before the search? I know Aya
> does.

Valkyria only does some simple tactics in the playouts. It is stronger than anything I ever programmed (on 9x9 at least), so currently I cannot see how to integrate precomputed tactical results into the later search. I think Aya is special because it was very strong at search before it went MC.

> Does it matter how prior values are used to guide the search? I think mogo
> uses prior knowledge to initialize the RAVE values. Do other programs
> include it some other way, by initializing the FPU value, or by
> initializing the UCT visits and confidence, or some extra, prior term in
> the equation?

Right now Valkyria sets priors for AMAF so that moves that are a good local response to the last move have a prior 100% winrate with 20-100 visits, depending on the priority of the triggered pattern. I think Mogo has a fixed number of visits for the priors but modifies the winrate, but I never saw this described in a way that made it clear. Previously I biased the UCT values after everything else was computed, but found that this led to some bad behavior. By biasing the AMAF values instead, these biases become less influential as the true winrate gets more weight than the AMAF scores.
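The prior scheme described above (a 100% prior winrate carried by 20-100 virtual AMAF visits) can be sketched like this; the function and field names are hypothetical, and only the numbers come from the description:

```python
def apply_amaf_prior(node_stats, move, prior_visits):
    """Seed a move's AMAF statistics with a 100% prior winrate carried by
    prior_visits virtual visits (20-100 above, chosen by pattern priority).
    Real AMAF updates then gradually dilute the prior."""
    s = node_stats.setdefault(move, {"amaf_wins": 0, "amaf_visits": 0})
    s["amaf_wins"] += prior_visits    # wins == visits -> 100% prior winrate
    s["amaf_visits"] += prior_visits
    return s

def amaf_winrate(node_stats, move):
    """Current AMAF winrate of a move (0 if never seen)."""
    s = node_stats.get(move, {"amaf_wins": 0, "amaf_visits": 0})
    return s["amaf_wins"] / s["amaf_visits"] if s["amaf_visits"] else 0.0
```

Putting the prior on the AMAF side rather than on the UCT value means its weight fades automatically as the blended score shifts toward the true winrate.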
> Are there other techniques (not RAVE) that people are using to get
> information from the search to guide the move ordering? I think crazystone
> estimates ownership of each point and uses it to set prior values in some
> way.

I used to do that a long time ago in Viking (the precursor to Valkyria), which used alpha-beta + MC evaluation. As I remember it, it had a great impact on move ordering, which was otherwise quite bad (or even nonexistent) in Viking. I have tried it in Valkyria but was never able to see an improvement. But I did not try hard enough to tell for sure. Both ownership and AMAF use the same information (the playouts), so trying to use it twice is perhaps partially a waste of effort.

-Magnus
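The ownership idea discussed in this exchange amounts to tallying, over all playouts, who ends up owning each point; Crazystone's actual scheme is not described here, so the following is only a generic sketch with hypothetical names:

```python
def record_ownership(counts, final_position):
    """After each playout, tally which color owns each point of the
    terminal position; counts[point] accumulates Black's ownership."""
    for point, owner in final_position.items():
        if owner == "black":
            counts[point] = counts.get(point, 0) + 1

def ownership_prior(counts, n_playouts, point, to_move="black"):
    """Estimated probability that to_move ends up owning the point;
    one of several conceivable ways to turn ownership into a prior."""
    p_black = counts.get(point, 0) / n_playouts
    return p_black if to_move == "black" else 1.0 - p_black
```

As the post notes, these counts are derived from the same playouts that feed AMAF, so the two sources of move-ordering information are strongly correlated.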