subject:"\[computer\-go\] Optiizing combinations of flags"

Re: [computer-go] Optiizing combinations of flags

2009-11-23 Thread Gian-Carlo Pascutto

Brian Sheppard wrote:

 In this strategy, one chooses a random number p, and then select the 
 strategy with highest historical mean if p  epsilon, and the 
 strategy taken least often otherwise. If epsilon = C*log(n)/n, where 
 n is the number of experiments so far, then the strategy has zero
 asymptotic regret.

Can you explain this in more detail?

From what I understand, for each parameter you take with some high
probability the best so far, and with some lower probability the least
tried one. This requires (manually) enumerating all parameters on some
integer scale, if I got it correctly.

I'm interested in the following general problem:

Given a set of parameters each with a range of values, and a set of game
results between different parameter sets (or a score average for each
set versus a reference), how to determine:

a) the most likely optimal set of parameters
b) the most interesting parameter set to test next

Remi Coulom has done some work in this area:

http://remi.coulom.free.fr/QLR/

It sounds very interesting (v-optimal sampling). But I don't understand
it enough to implement it. Your idea sounds simpler, but the enumeration
would be a problem, for parameters with wide ranges where we don't know
where to start.

-- 
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Optiizing combinations of flags

2009-11-23 Thread Rémi Coulom


Gian-Carlo Pascutto wrote:

Remi Coulom has done some work in this area:

http://remi.coulom.free.fr/QLR/

It sounds very interesting (v-optimal sampling). But I don't understand
it enough to implement it. Your idea sounds simpler, but the enumeration
would be a problem, for parameters with wide ranges where we don't know
where to start.
  


I am working on a tool for automatic parameter optimization. Right now 
it is in a very experimental state. You can find it there:

http://remi.coulom.free.fr/QLR/QLR.tar.bz2

Right now only BAST is a practical method to optimize parameters with my 
tool. It is UCT over a recursive binary partitioning of the parameter 
space. It does not work well for high-dimensional optimization. Even for 
1-dimensional problems, I expect QLR will outperform it significantly.


If you have access to Springer publications, that paper by Ken Chen 
describes another method:

http://www.springerlink.com/content/759660u175x64663/
It combines UCB ideas with a complex genetic algorithm. It seems very 
complicated for my taste.


It is probably more convenient to wait a little that I improve my 
parameter-optimization system, and you'll have a ready-made tool.


My tool also allows to easily implement new parameter-optimization 
algorithms, test them on artificial problems, before applying them to a 
Go program.


Right now, only binary outcomes are supported. I will add win/loss/draw 
results later.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Optiizing combinations of flags

2009-11-20 Thread Brian Sheppard

On that topic, I have around 17 flag who enable or not features in my
pure playouts bots, and I want to search the best combinations of them.
I known this is almost a dream but does anyone know the best way to
approximate this.

Pebbles randomly chooses (using a zero asymptotic regret strategy) parameter
values before each game. I literally never manually tune parameters for
Pebbles. I just set up experiments, and put them on a parameter for my
optimizer to manage. After a few hundred games it is clear what the right
choices are.

My favorite exploration strategy is a declining epsilon greedy strategy. I
like it because it is a randomized strategy, so I can optimize all
parameters concurrently using a single stream of games. In this strategy,
one chooses a random number p, and then select the strategy with highest
historical mean if p  epsilon, and the strategy taken least often
otherwise. If epsilon = C*log(n)/n, where n is the number of experiments so
far, then the strategy has zero asymptotic regret.

Pebbles has about 50 parameters right now. Most are pretty settled because
they have thousands of games of experience. All are potentially modified
before each game.

Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Optiizing combinations of flags

Re: [computer-go] Optiizing combinations of flags

[computer-go] Optiizing combinations of flags

3 matches

Site Navigation

Mail list logo

Footer information