Re: [computer-go] Optiizing combinations of flags
Brian Sheppard wrote: In this strategy, one chooses a random number p, and then select the strategy with highest historical mean if p epsilon, and the strategy taken least often otherwise. If epsilon = C*log(n)/n, where n is the number of experiments so far, then the strategy has zero asymptotic regret. Can you explain this in more detail? From what I understand, for each parameter you take with some high probability the best so far, and with some lower probability the least tried one. This requires (manually) enumerating all parameters on some integer scale, if I got it correctly. I'm interested in the following general problem: Given a set of parameters each with a range of values, and a set of game results between different parameter sets (or a score average for each set versus a reference), how to determine: a) the most likely optimal set of parameters b) the most interesting parameter set to test next Remi Coulom has done some work in this area: http://remi.coulom.free.fr/QLR/ It sounds very interesting (v-optimal sampling). But I don't understand it enough to implement it. Your idea sounds simpler, but the enumeration would be a problem, for parameters with wide ranges where we don't know where to start. -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optiizing combinations of flags
Gian-Carlo Pascutto wrote: Remi Coulom has done some work in this area: http://remi.coulom.free.fr/QLR/ It sounds very interesting (v-optimal sampling). But I don't understand it enough to implement it. Your idea sounds simpler, but the enumeration would be a problem, for parameters with wide ranges where we don't know where to start. I am working on a tool for automatic parameter optimization. Right now it is in a very experimental state. You can find it there: http://remi.coulom.free.fr/QLR/QLR.tar.bz2 Right now only BAST is a practical method to optimize parameters with my tool. It is UCT over a recursive binary partitioning of the parameter space. It does not work well for high-dimensional optimization. Even for 1-dimensional problems, I expect QLR will outperform it significantly. If you have access to Springer publications, that paper by Ken Chen describes another method: http://www.springerlink.com/content/759660u175x64663/ It combines UCB ideas with a complex genetic algorithm. It seems very complicated for my taste. It is probably more convenient to wait a little that I improve my parameter-optimization system, and you'll have a ready-made tool. My tool also allows to easily implement new parameter-optimization algorithms, test them on artificial problems, before applying them to a Go program. Right now, only binary outcomes are supported. I will add win/loss/draw results later. Rémi ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Optiizing combinations of flags
On that topic, I have around 17 flag who enable or not features in my pure playouts bots, and I want to search the best combinations of them. I known this is almost a dream but does anyone know the best way to approximate this. Pebbles randomly chooses (using a zero asymptotic regret strategy) parameter values before each game. I literally never manually tune parameters for Pebbles. I just set up experiments, and put them on a parameter for my optimizer to manage. After a few hundred games it is clear what the right choices are. My favorite exploration strategy is a declining epsilon greedy strategy. I like it because it is a randomized strategy, so I can optimize all parameters concurrently using a single stream of games. In this strategy, one chooses a random number p, and then select the strategy with highest historical mean if p epsilon, and the strategy taken least often otherwise. If epsilon = C*log(n)/n, where n is the number of experiments so far, then the strategy has zero asymptotic regret. Pebbles has about 50 parameters right now. Most are pretty settled because they have thousands of games of experience. All are potentially modified before each game. Brian ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/