Re: [computer-go] Optimizing combinations of flags
> That doesn't seem to directly support deriving information from random
> trials. For computer go tuning, would you play multiple games with each
> parameter set in order to get a meaningful figure? That seems likely to be
> less efficient than treating it as a bandit problem.

you'd decide how many experiments you wanted to run, take a stab at what you thought the interactions between parameters were (i.e. independent between #2 and #3, and no worse than quadratic between #1 and #4), generate an optimal design, run enough experiments at each setting of the parameters (as specified by the design) to keep the error low, then fit the specified model with the resulting data.

the way to think about the model is that you want to model winrate versus whomever (self-play, on kgs, whatever you want), and you have some idea about the interactions between parameters (i think this cutoff is only appropriate between these two ranges, and i have reason to believe it has only a very weak interaction with this other parameter, whereas these 12 parameters might all be related in some horrible quadratic way), but you don't want to blindly run thousands of random tests. you'd rather run thousands of tests that were specifically designed to maximize the amount of information you can get about the model that you're trying to fit.

alternatively, it does sphere packing over the direct product of open or closed (but bounded) intervals and discrete sets, so you can get a set of points that is slightly better than a random set of experiments (i.e. guaranteed to cover the space well).

s.

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimizing combinations of flags
sorry to self-reply, but:

> alternatively, it does sphere packing over the direct product of open or
> closed (but bounded) intervals and discrete sets, so you can get a set of
> points that is slightly better than a random set of experiments (i.e.
> guaranteed to cover the space well).

arguably it is not slightly better, but much, much better. especially when the dimensionality (number of parameters) is high.

s.
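Gosset computes such space-filling (sphere-packing) designs properly; purely as an illustration of the idea, here is a crude greedy maximin sketch in Python. The function name, the candidate-sampling scheme, and the greedy heuristic are my own assumptions, not anything from the tool:

```python
import math
import random

def greedy_maximin(n_points, dim, n_candidates=2000, seed=0):
    """Greedy space-filling design on the unit cube [0,1]^dim:
    repeatedly add the candidate point whose nearest neighbour
    among the already-chosen points is farthest away."""
    rng = random.Random(seed)
    candidates = [[rng.random() for _ in range(dim)]
                  for _ in range(n_candidates)]
    chosen = [candidates.pop()]
    while len(chosen) < n_points:
        # maximin criterion: maximize the minimum distance to the design
        best = max(candidates,
                   key=lambda c: min(math.dist(c, p) for p in chosen))
        candidates.remove(best)
        chosen.append(best)
    return chosen

design = greedy_maximin(20, 4)
```

Unlike i.i.d. random points, the chosen set avoids near-duplicate experiments, which is the "guaranteed to cover the space well" property; a real optimal-design tool does much better still, especially in high dimension.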
Re: [computer-go] Optimizing combinations of flags
A few months ago there was a post in the computer chess forums about optimizing combinations of features, called orthogonal multi-testing. Did I mention that on this forum already? If not, here is a brief on how it works:

Suppose you have 1 feature you want to test - you might normally play a 1,000 game match between 2 versions of the program, one version with the feature on and one with the feature off. However, if you have 6 features, there are 64 combinations. What you can do is play every combination of features in a round robin tournament. For every feature, half the games played have that feature on, and the other half have it off. So you can get a separate score for each feature by considering only the games played between opponents with that feature ON and OFF.

This idea of course assumes that each feature is independent (there is no interaction between features). This is probably not true, but it's amazing how often it works anyway - we find that methods such as naive Bayes classifiers work surprisingly well even when the attributes of the thing being classified are not independent. It reminds me a lot of all-moves-as-first, which assumes the order of the moves is not relevant but is a wonderful cheat because you multiply your sample size by an order of magnitude.

So you can play a few thousand games with a huge number of features and get a huge amount of data by reusing the same samples for different things. You could of course just play games where you choose each player's settings randomly. If you have 256 features you have a ridiculous number of combinations, more than you could possibly test, but before each test game you just pick a combination of features randomly for each player. In this way about 1/4 of the games will be relevant for each feature, regardless of how many features you are testing (1/2 will have the feature on, and half of those games will be against opponents who have the feature off).
Even if the lack of independence bothers you, one might use this as an approximation or a way to get a little closer to the appropriate feature set, in the same way all-moves-as-first nicely approximates the value of much larger samples.

- Don
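The random-assignment variant described above is easy to simulate. This is a sketch under my own assumptions (a hidden additive win-probability model with a logistic link; the function name and record layout are hypothetical), showing that per-feature scores recover each feature's sign even though every game mixes all the features:

```python
import random

def simulate_omt(true_effects, n_games=20000, seed=1):
    """Orthogonal multi-testing sketch: each game, both players get an
    independent random on/off setting for every feature; each feature is
    then scored using only the games where the two sides differ on it."""
    rng = random.Random(seed)
    k = len(true_effects)
    wins = [0] * k    # games won by the side with feature i ON
    games = [0] * k   # games where the two sides differ on feature i
    for _ in range(n_games):
        a = [rng.random() < 0.5 for _ in range(k)]
        b = [rng.random() < 0.5 for _ in range(k)]
        # hidden model: each enabled feature adds its effect to A's margin
        margin = sum(e for e, on in zip(true_effects, a) if on) \
               - sum(e for e, on in zip(true_effects, b) if on)
        a_wins = rng.random() < 1.0 / (1.0 + 10.0 ** -margin)
        for i in range(k):
            if a[i] != b[i]:
                games[i] += 1
                wins[i] += (a[i] == a_wins)
    return [w / g for w, g in zip(wins, games)]

scores = simulate_omt([0.5, 0.0, -0.5])
```

With effects of +0.5, 0 and -0.5, the per-feature scores land clearly above, near, and clearly below 50% respectively; the other features just add symmetric noise that averages out, which is the point of reusing every game for every feature.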
Re: [computer-go] Optimizing combinations of flags
the way to do all of this exactly is with experimental design. to design experiments correctly that handle inter-term interactions of moderate degree, this tool is quite useful:

http://www2.research.att.com/~njas/gosset/index.html

s.
Re: [computer-go] Optimizing combinations of flags
I know there are heuristics for trying to understand the interactions, and without looking too hard I assume this package is just a more comprehensive version of this.

On Wed, Nov 25, 2009 at 9:11 AM, steve uurtamo uurt...@gmail.com wrote:
> the way to do all of this exactly is with experimental design. to design
> experiments correctly that handle inter-term interactions of moderate
> degree, this tool is quite useful:
> http://www2.research.att.com/~njas/gosset/index.html
> s.
Re: [computer-go] Optimizing combinations of flags
On Wed, Nov 25, 2009 at 09:01:22AM -0500, Don Dailey wrote:
> You could of course just play games where you choose each player's
> settings randomly. If you have 256 features you have a ridiculous number
> of combinations, more than you could possibly test, but before each test
> game you just pick a combination of features randomly for each player. In
> this way about 1/4 of the games will be relevant for each feature,
> regardless of how many features you are testing.

Wouldn't it be more effective to choose one player randomly, and make the other a mirror image of it? That way, every game would give some information about the relevance of one setting. At least in the very beginning...

-- Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
Re: [computer-go] Optimizing combinations of flags
On Wed, Nov 25, 2009 at 10:44 AM, Heikki Levanto hei...@lsd.dk wrote:
> Wouldn't it be more effective to choose one player randomly, and make the
> other a mirror image of it? That way, every game would give some
> information about the relevance of one setting. At least in the very
> beginning...

That would not be effective at all. With 256 features you are (for all practical purposes) never going to see that exact combination of features again.

In very general terms, you are probably going to be more interested in how a few terms interact than how many interact. Of course this method only tries to understand each feature independently, but I think that has some validity. I don't claim it will cover all the bases, but it might be a good place to start. Of course there is nothing that prevents you from observing in the data the interaction of any specific combination of features, but the amount of data you will get for specific combinations is going to drop drastically with the number of features you wish to look at. I will assert that looking at each feature independently is half the battle, that looking at combinations of 2 features is going to cover a higher percentage of the remaining cases, and so on. And if you find interesting interactions, you can COMBINE them in a separate test - by basically considering features x and y together as a single compound feature.

You would only do this when you have a high degree of confidence that these 2 features definitely have some kind of synergy. You might even do this with features that have a negative interaction, perhaps taking care that they are never tested together.

- Don
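Observing a specific pair from the same data, as suggested above, can be sketched like this. The record format, function name, and the synthetic game generator (in which two features only help when both are on) are my own assumptions for illustration; note how few of the games qualify (roughly 1/8 for a pair, versus 1/2 for a single feature):

```python
import random

def pair_score(records, i, j):
    """Score the compound feature "i AND j" from randomly-assigned game
    records: only games where one side has both features on and the other
    has both off count.  Hypothetical record format: (a_flags, b_flags, a_won)."""
    wins = games = 0
    for a, b, a_won in records:
        both_on_a, both_on_b = a[i] and a[j], b[i] and b[j]
        both_off_a, both_off_b = not (a[i] or a[j]), not (b[i] or b[j])
        if both_on_a and both_off_b:
            games += 1
            wins += a_won          # the "both on" side is A
        elif both_on_b and both_off_a:
            games += 1
            wins += 1 - a_won      # the "both on" side is B
    return wins / games if games else None

# synthetic records where features 0 and 1 only help when BOTH are on
rng = random.Random(3)
records = []
for _ in range(40000):
    a = [rng.random() < 0.5 for _ in range(4)]
    b = [rng.random() < 0.5 for _ in range(4)]
    p_a = 0.5 + (0.2 if a[0] and a[1] else 0.0) - (0.2 if b[0] and b[1] else 0.0)
    records.append((a, b, rng.random() < p_a))
```

Here `pair_score(records, 0, 1)` comes out well above 50% while an unrelated pair scores near 50%, even though the single-feature scores for 0 and 1 alone would understate the synergy.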
Re: [computer-go] Optimizing combinations of flags
steve uurtamo wrote:
> the way to do all of this exactly is with experimental design. to design
> experiments correctly that handle inter-term interactions of moderate
> degree, this tool is quite useful:
> http://www2.research.att.com/~njas/gosset/index.html

That doesn't seem to directly support deriving information from random trials. For computer go tuning, would you play multiple games with each parameter set in order to get a meaningful figure? That seems likely to be less efficient than treating it as a bandit problem.

-M-
Re: [computer-go] Optimizing combinations of flags
On Wed, Nov 25, 2009 at 2:00 PM, Matthew Woodcraft matt...@woodcraft.me.uk wrote:
> That doesn't seem to directly support deriving information from random
> trials. For computer go tuning, would you play multiple games with each
> parameter set in order to get a meaningful figure? That seems likely to be
> less efficient than treating it as a bandit problem.

This does not replace the bandit approach; it's a way to tune parameters. You might have 50 parameters, so you play a few thousand games using random combinations of these parameters, for instance. Then you have data based on the win/loss records of the different parameter settings, and you use this to settle on a good set of parameters.

- Don
Re: [computer-go] Optimizing combinations of flags
Brian Sheppard wrote:
> I think that I am assuming only that the objective function is convex.
> The parameters in Go programs are always inter-dependent.

What do you do when you add a new parameter? Do you retain your existing 'history', considering each game to have been played with the value of the new parameter set to zero?

If you have 50 parameters already, doesn't adding a new parameter create a rather large number of new parameter sets, most of which there will never be time to investigate?

I have been using UCB and UCT to tune engine settings, but I don't think these methods work well to tune more than a handful of parameters at the same time.

-M-
[computer-go] Optimizing combinations of flags
> What do you do when you add a new parameter? Do you retain your existing
> 'history', considering each game to have been played with the value of the
> new parameter set to zero?

Yes, exactly.

> If you have 50 parameters already, doesn't adding a new parameter create a
> rather large number of new parameter sets, most of which there will never
> be time to investigate?

Yes. So the new parameter will drift to its optimal value against the existing parameter values. But here's the thing: declining epsilon greedy policies are zero regret starting from any initial state. So if the setting of the new parameter affects old parameter settings, then the established parameters will start to move as well. If the objective function is a convex function of the parameters (which is generally the case, based on the curves that I have seen) then the whole system will drift to a global optimum.

> I have been using UCB and UCT to tune engine settings, but I don't think
> these methods work well to tune more than a handful of parameters at the
> same time.

Such systems have trouble because their exploration is a *deterministic* function of the sequence of wins. That is, all parameters will lock into the same set of feedback. If you use UCT, then you have to optimize *combinations* of parameters, which is unwieldy. Declining epsilon greedy is a randomized exploration strategy, but still zero-regret. Now the same sequence of wins/losses can be used to tune all parameters concurrently.
[computer-go] Optimizing combinations of flags
> From what I understand, for each parameter you take with some high
> probability the best so far, and with some lower probability the least
> tried one. This requires (manually) enumerating all parameters on some
> integer scale, if I got it correctly.

Yes, for each parameter you make a range of values that you believe brackets the optimal value. Then the optimizer takes over and (eventually) concentrates attention on the value that works best. For instance, if your program includes a parameter for the number of RAVE wins to initially assign to an atari move, then you might test 0, 1, 2, ... 10. (Note that parameters do not have to be integers. Just an integer number of choices.)

Note that a value of 0 sometimes turns off a feature, in effect. When a parameter is turned off I will put the entire code within an if-statement, to approximate the CPU gain from omitting the code entirely:

    if (theParameter != 0) {
        evaluation += theParameter * someFunction();
    }

For the domain of Go, I typically find a broad range of near-optimal values. In fact, there is often insignificant variation over the entire range: perhaps a 1% difference. But I do see 10% gains for certain features, and I never know what will be important. I have tested systems that automatically create new values by interpolation, but I haven't seen any benefit.

> Given a set of parameters each with a range of values, and a set of game
> results between different parameter sets (or a score average for each set
> versus a reference), how to determine:
> a) the most likely optimal set of parameters
> b) the most interesting parameter set to test next

The system above will choose the most interesting parameter set to test next. Now, whether the policy is zero-regret is more complicated. In the case of a single parameter, the method is zero regret. If you want to automatically introduce new values to test, then you can make a zero-regret policy if the objective function is bounded and continuous.

But the more interesting case is where you have multiple parameters. In that case, I think that my policy will find the global optimum if the objective function is convex. Convexity is not guaranteed in our domain, but is often the case. That is, the benefit is small if a parameter is either too small or too large, and the optimum is somewhere in between.

I could be wrong about some of these statements, because I haven't paused to try to prove anything from a mathematical perspective. Instead, I am convinced by experimental observation of some pragmatic outcomes:

1) does the optimizer tweak parameters towards optimality? (yes)
2) does the optimizer work without needing my attention? (yes)
3) if I change the program in other ways, will it adapt? (yes)
4) if I code a bad feature, will the optimizer stop using it? (yes)

All together, this is not bad value.

I should mention one tweak, relating to the fact that your program might sometimes face strong players, and other times weak players. I keep a record of the average rating of the opponents faced by each parameter value. Then when I explore I can choose either the least-taken parameter value, or the parameter value whose average opponent rating would be most affected by playing this game. This policy keeps average opponent ratings in line.

Brian
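The concurrent per-parameter scheme described in this thread might be sketched as follows. This is my own reading of it, not Brian's code: the epsilon schedule, the tie-breaking rules, and the synthetic `win_prob` stand-in (which replaces actually playing a game) are all assumptions:

```python
import random

def tune(param_values, win_prob, n_games=20000, seed=2):
    """Concurrent declining-epsilon-greedy tuning: every parameter is tuned
    from the same stream of win/loss results.  win_prob(settings) stands in
    for playing one game with the chosen settings."""
    rng = random.Random(seed)
    # per parameter, per value: [wins, games]
    stats = [[[0, 0] for _ in vals] for vals in param_values]
    for t in range(1, n_games + 1):
        eps = t ** -0.5  # declining exploration rate (assumed schedule)
        choice = []
        for vals, st in zip(param_values, stats):
            if rng.random() < eps:
                # explore: the least-tried value for this parameter
                i = min(range(len(vals)), key=lambda j: st[j][1])
            else:
                # exploit: best observed winrate (0.5 prior for untried values)
                i = max(range(len(vals)),
                        key=lambda j: st[j][0] / st[j][1] if st[j][1] else 0.5)
            choice.append(i)
        won = rng.random() < win_prob([vals[i] for vals, i in zip(param_values, choice)])
        # the single win/loss result updates every parameter's statistics
        for st, i in zip(stats, choice):
            st[i][1] += 1
            st[i][0] += won
    # report the value with the best observed winrate for each parameter
    return [vals[max(range(len(vals)),
                     key=lambda j: st[j][0] / st[j][1] if st[j][1] else 0)]
            for vals, st in zip(param_values, stats)]
```

On an additive synthetic objective such as `lambda s: 0.5 + 0.2 * (s[0] == 1) + 0.2 * (s[1] == 1)` with values [0, 1, 2] for each parameter, both parameters drift to 1 concurrently from the same game stream, which is the behaviour claimed above; nothing here proves the zero-regret or convexity claims.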
Re: [computer-go] Optimizing combinations of flags
Your system seems very interesting, but it seems to me that you assume that the parameters are independent. What happens if, for example, two parameters work well when only one of them is active, and badly if the two are active at the same time?

Tom

-- Thomas Lavergne   "Entia non sunt multiplicanda praeter necessitatem." (Guillaume d'Ockham)   thomas.laver...@reveurs.org   http://oniros.org
[computer-go] Optimizing combinations of flags
> Your system seems very interesting, but it seems to me that you assume
> that the parameters are independent. What happens if, for example, two
> parameters work well when only one of them is active, and badly if the
> two are active at the same time?

I think that I am assuming only that the objective function is convex. The parameters in Go programs are always inter-dependent.