Re: [computer-go] Optimizing combinations of flags

2009-11-26 Thread steve uurtamo
 That doesn't seem to directly support deriving information from random
 trials. For computer go tuning, would you play multiple games with each
 parameter set in order to get a meaningful figure? That seems likely to
 be less efficient than treating it as a bandit problem.

you'd decide how many experiments you wanted to run, take a stab at
what you thought the interactions between parameters were (i.e.
independent between #2 and #3, and no worse than quadratic between #1
and #4), generate an optimal design, run enough experiments at each
setting of the parameters (as specified by the design) to keep error
low, then fit the specified model with the resulting data.

the way to think about the model is that you want to model winrate against
some opponent (self-play, on kgs, whatever you want) as a function of the
parameters.  you have some idea about the interactions between parameters
(i think this cutoff is only appropriate between these two ranges, i have
reason to believe it has only a very weak interaction with this other
parameter, whereas these 12 parameters might all be related in some horrible
quadratic way), but you don't want to blindly run thousands of random
tests.  you'd rather run thousands of tests that were specifically designed
to maximize the amount of information that you can get about the model that
you're trying to fit.
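
as a concrete sketch of that last fitting step, here's roughly what it could
look like in python with numpy (my assumption -- the parameters, the
interaction structure, and the winrates below are all invented for
illustration):

import numpy as np

rng = np.random.default_rng(0)

# 40 trial settings of four parameters in [0, 1]; in a real run these
# come from the design, and y from measured winrates at each setting.
X = rng.random((40, 4))
p1, p2, p3, p4 = X.T
y = (0.50 + 0.05 * p1 - 0.04 * p4 + 0.08 * p1 * p4   # synthetic "truth"
     + rng.normal(0.0, 0.01, size=40))               # sampling noise

# the model: main effects for every parameter, p2 and p3 independent,
# and no worse than quadratic between p1 and p4
terms = np.column_stack([np.ones_like(p1), p1, p2, p3, p4,
                         p1 * p4, p1 ** 2, p4 ** 2])
coef, *_ = np.linalg.lstsq(terms, y, rcond=None)
for name, c in zip(["const", "p1", "p2", "p3", "p4",
                    "p1*p4", "p1^2", "p4^2"], coef):
    print("%-6s %+.4f" % (name, c))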

alternatively, it does sphere packing over the direct product of open
or closed (but bounded) intervals and discrete sets, so you can get a
set of points that is slightly better than a random set of experiments
(i.e. guaranteed to cover the space well).

s.


Re: [computer-go] Optimizing combinations of flags

2009-11-26 Thread steve uurtamo
sorry to self-reply, but:

 alternatively, it does sphere packing over the direct product of open
 or closed (but bounded) intervals and discrete sets, so you can get a
 set of points that is slightly better than a random set of experiments
 (i.e. guaranteed to cover the space well).

arguably it is not slightly better, but much, much better, especially
when the dimensionality (number of parameters) is high.
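
here's a toy illustration of that (python with numpy, my assumption; the
"maximin" design below is just the best of 200 random designs, a crude
stand-in for real sphere packing, which does better still):

import numpy as np

rng = np.random.default_rng(1)

def min_pairwise_dist(pts):
    # smallest distance between any two design points; larger means the
    # points are spread more evenly over the space
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return d[np.triu_indices(len(pts), k=1)].min()

dim, n = 10, 50                      # 10 parameters, 50 experiments
plain = rng.random((n, dim))
best = max((rng.random((n, dim)) for _ in range(200)),
           key=min_pairwise_dist)
print("one random design:", round(min_pairwise_dist(plain), 3))
print("best of 200      :", round(min_pairwise_dist(best), 3))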

s.


Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Don Dailey
A few months ago there was a post in the computer chess forums about
optimizing combinations of features. It was called orthogonal
multi-testing.

Did I mention that on this forum already? If not, here is a brief summary of
how it works:

Suppose you have one feature you want to test. You might normally play a
1,000-game match between two versions of the program, one with the feature
on and one with the feature off.

However, if you have 6 features, there are 64 combinations. What you can do
is play every combination of features in a round-robin tournament. For every
feature, half of the games are played with that feature on and the other
half with it off, so you can get a separate score for each feature by
considering only the games played between opponents where one has the
feature ON and the other has it OFF.

This idea of course assumes that each feature is independent (that there is
no interaction between features). That is probably not true, but it's
amazing how often the approach works anyway; we find that methods such as
naive Bayes classifiers work surprisingly well even when the attributes of
the thing being classified are not independent.

It reminds me a lot of all-moves-as-first, which assumes the order of the
moves is not relevant, but is a wonderful cheat because you multiply your
sample size by an order of magnitude. So you can play a few thousand games
with a huge number of features and get a huge amount of data by reusing the
same samples for different things.

You could of course just play games where you choose each player's features
randomly. If you have 256 features, you have a ridiculous number of
combinations, far more than you could possibly test, but before each test
game you just pick a combination of features randomly for each player. In
this way about 1/4 of the games will be relevant for each feature,
regardless of how many features you are testing. (Half of the games will
have the feature on, and half of those games will be against opponents who
have the feature off.)

Even if the lack of independence bothers you, you might use this as an
approximation, or as a way to get a little closer to the appropriate
feature set, in the same way that all-moves-as-first nicely approximates
the value of much larger samples.
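
To make the random version concrete, here is a sketch in Python. The engine
is replaced by a synthetic stand-in (the feature effects are invented), but
the bookkeeping is the scheme described above: every game is reused for
every feature on which the two players differ.

import random

random.seed(0)
N_FEATURES, N_GAMES = 6, 10000

def play_game(black_flags, white_flags):
    # stand-in for a real engine-vs-engine game; returns True if black
    # wins.  Invented truth: feature 0 helps a little, feature 3 hurts.
    def strength(flags):
        return 0.02 * flags[0] - 0.015 * flags[3]
    return random.random() < 0.5 + strength(black_flags) - strength(white_flags)

wins = [0] * N_FEATURES    # wins by the side that had the feature ON
games = [0] * N_FEATURES   # games where exactly one side had it ON

for _ in range(N_GAMES):
    black = [random.random() < 0.5 for _ in range(N_FEATURES)]
    white = [random.random() < 0.5 for _ in range(N_FEATURES)]
    black_won = play_game(black, white)
    for i in range(N_FEATURES):
        if black[i] != white[i]:        # game is relevant to feature i
            games[i] += 1
            if black[i] == black_won:   # the ON side won
                wins[i] += 1

for i in range(N_FEATURES):
    print("feature %d: ON side won %.3f of %d games"
          % (i, wins[i] / max(games[i], 1), games[i]))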




Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread steve uurtamo
the way to do all of this exactly is with experimental design.

to design experiments correctly that handle inter-term interactions of
moderate degree, this tool is quite useful:

http://www2.research.att.com/~njas/gosset/index.html

s.


Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Don Dailey
I know there are heuristics for trying to understand the interactions, and
without looking too hard I assume this package is a more comprehensive
version of that idea.


On Wed, Nov 25, 2009 at 9:11 AM, steve uurtamo uurt...@gmail.com wrote:

 the way to do all of this exactly is with experimental design.

 to design experiments correctly that handle inter-term interactions of
 moderate degree, this tool is quite useful:

 http://www2.research.att.com/~njas/gosset/index.htmlhttp://www2.research.att.com/%7Enjas/gosset/index.html

 s.
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Heikki Levanto
On Wed, Nov 25, 2009 at 09:01:22AM -0500, Don Dailey wrote:
 You could of course just play games where you choose each player randomly.
 If you have 256 features you have a ridiculous number of combinations, more
 than you could possibly test but before each test game you just pick a
 combination of features randomly for each player.In this way about 1/4
 of the games will be relevant for each feature, regardless of how many
 features you are testing.(1/2 will have the feature on and half of those
 games will be against opponents who have the feature off.)

Wouldn't it be more effective to choose one player randomly, and make the
other a mirror image of it? That way, every game would give some
information about the relevance of one setting. At least in the very
beginning...

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk



Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Don Dailey
On Wed, Nov 25, 2009 at 10:44 AM, Heikki Levanto hei...@lsd.dk wrote:

 On Wed, Nov 25, 2009 at 09:01:22AM -0500, Don Dailey wrote:
  You could of course just play games where you choose each player randomly.
  If you have 256 features you have a ridiculous number of combinations, more
  than you could possibly test but before each test game you just pick a
  combination of features randomly for each player.  In this way about 1/4
  of the games will be relevant for each feature, regardless of how many
  features you are testing.  (1/2 will have the feature on and half of those
  games will be against opponents who have the feature off.)

 Wouldn't it be more effective to choose one player randomly, and make the
 other a mirror image of it? That way, every game would give some
 information about the relevance of one setting. At least in the very
 beginning...


That would not be effective at all.  With 256 features you are, for all
practical purposes, never going to see that exact combination of features
again.  In very general terms, you are probably more interested in how a
few features interact than in interactions among many features at once.  Of
course this method only tries to understand each feature independently, but
I think that has some validity.  I don't claim it will cover all the bases,
but it might be a good place to start.

Of course there is nothing that prevents you from observing from the data
the interaction of any specific combination of features, but the amount of
data you will get for specific combinations drops drastically as the number
of features you look at together grows.

I will assert that looking at each feature independently is half the
battle, that looking at combinations of 2 features covers a higher
percentage of the remaining cases, and so on.  If you find interesting
interactions, you can COMBINE them in a separate test, basically by
considering features x and y together as a single compound feature.  You
would only do this when you have a high degree of confidence that these 2
features definitely have some kind of synergy.  You might even do this with
features that have a negative interaction, perhaps taking care that they
are never tested together.
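
For instance, here is a sketch of that reuse (Python; the log format is
invented) scoring features x and y as a single compound feature:

# each record: (black's feature flags, white's feature flags, black won?)
records = [
    ([True, True, False], [False, False, True], True),
    ([False, False, True], [True, True, False], False),
]

def compound_score(records, x, y):
    # count only games where one side had BOTH features on and the other
    # side had both off; far fewer games qualify than for one feature,
    # which is the reduction in data described above
    wins = games = 0
    for black, white, black_won in records:
        b_on = black[x] and black[y]
        w_on = white[x] and white[y]
        b_off = not (black[x] or black[y])
        w_off = not (white[x] or white[y])
        if (b_on and w_off) or (w_on and b_off):
            games += 1
            if (b_on and black_won) or (w_on and not black_won):
                wins += 1
    return wins, games

print(compound_score(records, 0, 1))   # -> (2, 2) on this toy log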

- Don







Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Matthew Woodcraft
steve uurtamo wrote:
 the way to do all of this exactly is with experimental design.

 to design experiments correctly that handle inter-term interactions of
 moderate degree, this tool is quite useful:

 http://www2.research.att.com/~njas/gosset/index.html

That doesn't seem to directly support deriving information from random
trials. For computer go tuning, would you play multiple games with each
parameter set in order to get a meaningful figure? That seems likely to
be less efficient than treating it as a bandit problem.

-M-


Re: [computer-go] Optimizing combinations of flags

2009-11-25 Thread Don Dailey
On Wed, Nov 25, 2009 at 2:00 PM, Matthew Woodcraft
matt...@woodcraft.me.uk wrote:

 steve uurtamo wrote:
  the way to do all of this exactly is with experimental design.
 
  to design experiments correctly that handle inter-term interactions of
  moderate degree, this tool is quite useful:
 
  http://www2.research.att.com/~njas/gosset/index.html

 That doesn't seem to directly support deriving information from random
 trials. For computer go tuning, would you play multiple games with each
 parameter set in order to get a meaningful figure? That seems likely to
 be less efficient than treating it as a bandit problem.


This does not replace the bandit approach; it's a way to tune parameters.
You might have 50 parameters, so you play a few thousand games using random
combinations of those parameters, for instance.  Then you have data based
on the win/loss records of the different parameter settings, and you use it
to settle on a good set of parameters.
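
A sketch of that last step (Python; the parameter names and the log are
invented): tally the win/loss records per parameter value, then take the
best-scoring value of each parameter independently.

from collections import defaultdict

# invented log: each entry is ({parameter: value}, won?)
log = [
    ({"komi_bias": 0, "rave_weight": 2}, True),
    ({"komi_bias": 1, "rave_weight": 2}, False),
    ({"komi_bias": 0, "rave_weight": 5}, True),
]

wins = defaultdict(int)
games = defaultdict(int)
for setting, won in log:
    for param, value in setting.items():
        games[(param, value)] += 1
        wins[(param, value)] += won

best = {}   # best value of each parameter, judged independently
for (param, value), n in games.items():
    rate = wins[(param, value)] / n
    if param not in best or rate > best[param][1]:
        best[param] = (value, rate)
print(best)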

- Don








Re: [computer-go] Optimizing combinations of flags

2009-11-24 Thread Matthew Woodcraft
Brian Sheppard wrote:
 I think that I am assuming only that the objective function is convex. The
 parameters in Go programs are always inter-dependent.

What do you do when you add a new parameter? Do you retain your existing
'history', considering each game to have been played with the value of
the new parameter set to zero?

If you have 50 parameters already, doesn't adding a new parameter create
a rather large number of new parameter sets, most of which there will
never be time to investigate?


I have been using UCB and UCT to tune engine settings, but I don't think
these methods work well to tune more than a handful of parameters at the
same time.

-M-


[computer-go] Optimizing combinations of flags

2009-11-24 Thread Brian Sheppard
What do you do when you add a new parameter? Do you retain your existing
'history', considering each game to have been played with the value of
the new parameter set to zero?

Yes, exactly.

If you have 50 parameters already, doesn't adding a new parameter create
a rather large number of new parameter sets, most of which there will
never be time to investigate?

Yes. So the new parameter will drift to its optimal value against the
existing parameter values.

But here's the thing: declining epsilon-greedy policies are zero-regret
starting from any initial state. So if the setting of the new parameter
affects old parameter settings, then the established parameters will start
to move as well.

If the objective function is a convex function of the parameters (which is
generally the case, based on the curves that I have seen) then the whole
system will drift to a global optimum.

I have been using UCB and UCT to tune engine settings, but I don't think
these methods work well to tune more than a handful of parameters at the
same time.

Such systems have trouble because their exploration is a *deterministic*
function of the sequence of wins. That is, all parameters will lock into the
same set of feedback. If you use UCT, then you have to optimize
*combinations* of parameters, which is unwieldy.

Declining epsilon-greedy is a randomized exploration strategy, but still
zero-regret. Now the same sequence of wins/losses can be used to tune all
parameters concurrently.
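
A minimal sketch of such a policy (Python; the parameter names are
hypothetical and the game result is a random stand-in): one declining
exploration rate, and every parameter is updated from the same game.

import random

random.seed(0)

# candidate values per parameter (names are hypothetical)
params = {
    "rave_atari_wins": [0, 1, 2, 5, 10],
    "expansion_threshold": [1, 2, 4, 8],
}
wins = {p: [0] * len(v) for p, v in params.items()}
tries = {p: [0] * len(v) for p, v in params.items()}

def choose(game_number):
    eps = 1.0 / (1.0 + game_number / 100.0)   # declining exploration rate
    setting = {}
    for p, values in params.items():
        if random.random() < eps:
            # explore: take the least-tried value
            i = min(range(len(values)), key=lambda j: tries[p][j])
        else:
            # exploit: take the best winrate so far
            i = max(range(len(values)),
                    key=lambda j: wins[p][j] / max(tries[p][j], 1))
        setting[p] = i
    return setting

def record(setting, won):
    # one game's result updates every parameter concurrently
    for p, i in setting.items():
        tries[p][i] += 1
        wins[p][i] += won

for game in range(1000):
    setting = choose(game)
    won = int(random.random() < 0.5)   # stand-in for playing a real game
    record(setting, won)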




[computer-go] Optimizing combinations of flags

2009-11-23 Thread Brian Sheppard
From what I understand, for each parameter you take with some high
probability the best so far, and with some lower probability the least
tried one. This requires (manually) enumerating all parameters on some
integer scale, if I got it correctly.

Yes, for each parameter you make a range of values that you believe brackets
the optimal value. Then the optimizer takes over and (eventually)
concentrates attention on the value that works best.

For instance, if your program includes a parameter for the number of RAVE
wins to initially assign to an atari move, then you might test 0, 1, 2, ...
10. (Note that parameter values do not have to be integers; you just need an
integer number of choices.)

Note that a value of 0 sometimes turns off a feature, in effect. When a
parameter can turn a feature off, I put the feature's entire code within an
if-statement, to approximate the CPU gain from omitting the code entirely:

if (theParameter != 0) {
    evaluation += theParameter * someFunction();
}

For the domain of Go, I typically find a broad range of near-optimal values.
In fact, there is often insignificant variation over the entire range:
perhaps a 1% difference. But I do see 10% gains for certain features, and I
never know what will be important.

I have tested systems that automatically create new values by interpolation.
But I haven't seen any benefit.


Given a set of parameters each with a range of values, and a set of game
results between different parameter sets (or a score average for each
set versus a reference), how to determine:

a) the most likely optimal set of parameters
b) the most interesting parameter set to test next

The system above will choose the most interesting parameter set to test
next. Now, whether the policy is zero-regret is more complicated.

In the case of a single parameter, the method is zero regret. If you want to
automatically introduce new values to test, then you can make a zero-regret
policy if the objective function is bounded and continuous.

But the more interesting case is where you have multiple parameters. In that
case, I think that my policy will find the global optimum if the objective
function is convex.

Convexity is not guaranteed in our domain, but is often the case. That is,
the benefit is small if a parameter is either too small or too large, and
the optimum is somewhere in between.

I could be wrong about some of these statements, because I haven't paused to
try to prove anything from a mathematical perspective. Instead, I am
convinced by experimental observation of some pragmatic outcomes:

  1) does the optimizer tweak parameters towards optimality? (yes)

  2) does the optimizer work without needing my attention? (yes)

  3) if I change the program in other ways, will it adapt? (yes)

  4) if I code a bad feature, will the optimizer stop using it? (yes)

Altogether, this is not bad value.

I should mention one tweak, relating to the fact that your program might
sometimes face strong players, and other times weak players. I keep a record
of the average rating of the opponents faced by each parameter value. Then
when I explore, I can choose either the least-taken parameter value or the
parameter value whose average opponent rating would be most affected by
playing this game. This policy keeps average opponent ratings in line.
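
A sketch of one reading of that tweak (Python; the names are invented): on
an exploration step, prefer an untried value, and otherwise pick the value
whose average opponent rating would move the most by playing this opponent.

def explore_value(tries, rating_sums, opponent_rating):
    n = len(tries)
    # average opponent rating seen so far by each candidate value
    avgs = [rating_sums[i] / max(tries[i], 1) for i in range(n)]
    # how far each value's average would move if it played this game
    shift = [abs(opponent_rating - avgs[i]) / (tries[i] + 1) for i in range(n)]
    least_taken = min(range(n), key=lambda i: tries[i])
    most_affected = max(range(n), key=lambda i: shift[i])
    return least_taken if tries[least_taken] == 0 else most_affected

# usage: per-value game counts, summed opponent ratings, current opponent
print(explore_value([3, 1, 4], [5400.0, 1900.0, 7300.0], 2000.0))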

Brian



Re: [computer-go] Optimizing combinations of flags

2009-11-23 Thread Thomas Lavergne
Your system seems very interesting, but it seems to me that you assume
that the parameters are independent.
What happens if, for example, two parameters work well when only one of
them is active, and badly if the two are active at the same time?

Tom

-- 
Thomas Lavergne    Entia non sunt multiplicanda praeter
                   necessitatem. (Guillaume d'Ockham)
thomas.laver...@reveurs.org    http://oniros.org


[computer-go] Optimizing combinations of flags

2009-11-23 Thread Brian Sheppard
Your system seems very interesting, but it seems to me that you assume
that the parameters are independent.
What happens if, for example, two parameters work well when only one of
them is active, and badly if the two are active at the same time?

I think that I am assuming only that the objective function is convex. The
parameters in Go programs are always inter-dependent.
