Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Magnus Persson

Quoting Don Dailey [EMAIL PROTECTED]:


> When the child nodes are allocated, they are done all at once with
> this code - where cc is the number of fully legal child nodes:


In Valkyria3 I have supernodes that contain an array of moveinfo for
all possible moves. In the moveinfo I also store win/visit counts and
end-position ownership statistics, so my data structures are memory
intensive. As a consequence I expand each move individually, and my
threshold seems to be best at 7-10 visits in tests against Gnugo. 40
visits could be possible, but at 100 there is a major loss in playing
strength.


Valkyria3 is also superselective, using my implementation of mixing
AMAF with UCT as the MoGo team recently described. The UCT constant is
0.01 (outside of the square root).
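
For reference, this is where the constant sits in a generic UCT
selection rule (a minimal sketch, not Valkyria's actual code; the AMAF
mixing is omitted, and the function name and unvisited-child handling
are my own assumptions):

    #include <math.h>

    /* Generic UCT score of a child; c is the exploration constant
     * (here 0.01), applied outside the square root. */
    double uct_value(int wins, int visits, int parent_visits, double c)
    {
        if (visits == 0)
            return 1e9;   /* try unvisited children first */
        double mean = (double) wins / visits;
        return mean + c * sqrt(log((double) parent_visits) / visits);
    }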


When it comes to parameters, please remember that they may not have
independent effects on playing strength. If one parameter is changed a
lot, then the best values for other parameters may also change. What
makes things worse is that the best parameters probably change as a
function of the number of playouts. I believe that, ideally, the better
the MC evaluation is, the more selectively one can expand the tree, for
example.


-Magnus


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Jonas Kahn
On Tue, Mar 11, 2008 at 09:05:01AM +0100, Magnus Persson wrote:
> [...]
>
> When it comes to parameters, please remember that they may not have
> independent effects on playing strength. If one parameter is changed a
> lot, then the best values for other parameters may also change. What
> makes things worse is that the best parameters probably change as a
> function of the number of playouts. I believe that, ideally, the better
> the MC evaluation is, the more selectively one can expand the tree, for
> example.

Typically, how many parameters do you have to tune? Real-valued or two-level?

If you consider a reasonably small subset of parameters, an efficient
way to estimate them is to use a fractional factorial design for the
linear part, and a central composite design for the quadratic part
(once you know you are already in the right area). You get much more
precision than with change-one-at-a-time strategies when there is no
interaction between parameters, and you can detect interactions.
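
For concreteness, here is a minimal sketch of such a design
(illustrative only: the choice of 4 factors and the generator D = ABC
are my own assumptions, not a recommendation for any particular
engine):

    #include <stdio.h>

    /* A 2^(4-1) fractional factorial design with generator D = ABC:
     * 8 runs instead of 16, each factor at levels -1/+1.  Play a
     * batch of games at each run and fit the main effects. */
    int main(void)
    {
        for (int run = 0; run < 8; run++) {
            int a = (run & 1) ? 1 : -1;
            int b = (run & 2) ? 1 : -1;
            int c = (run & 4) ? 1 : -1;
            int d = a * b * c;        /* generator D = ABC */
            printf("run %d: A=%+d B=%+d C=%+d D=%+d\n", run, a, b, c, d);
        }
        return 0;
    }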

Since computers are used anyhow, it might be possible to choose new
parameter values sequentially and automatically, so as to optimize your
efficiency. That's a very interesting problem, with much work on it in
the statistical community, but I do not know it very well (nor the
designs above, but those are easy).

Alternatively, especially with a very high number of real parameters,
derivatives of MC techniques can be efficient and easy to implement:
particle filtering or swarm optimization in particular.
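
To illustrate the swarm idea, here is a generic particle swarm
optimizer (a sketch under my own assumptions, not anyone's actual
tuner; win_rate is a dummy stand-in for the noisy, game-based
measurement a real harness would provide):

    #include <stdio.h>
    #include <stdlib.h>

    #define DIM    10     /* number of real-valued engine parameters */
    #define SWARM  20
    #define STEPS  50

    /* Stand-in for the real measurement: the engine's win rate with
     * parameters x, estimated from a batch of games.  Replace with a
     * call into your test harness. */
    static double win_rate(const double x[DIM])
    {
        double s = 0.0;
        for (int d = 0; d < DIM; d++)     /* dummy objective: peak at 0 */
            s -= x[d] * x[d];
        return s;
    }

    static double frand(void) { return (double) rand() / RAND_MAX; }

    int main(void)
    {
        static double x[SWARM][DIM], v[SWARM][DIM], pbest[SWARM][DIM];
        double pbest_f[SWARM], gbest[DIM], gbest_f = -1e30;
        const double w = 0.7, c1 = 1.4, c2 = 1.4;  /* common PSO weights */

        for (int i = 0; i < SWARM; i++) {          /* random init in [-1,1] */
            for (int d = 0; d < DIM; d++) {
                x[i][d] = pbest[i][d] = 2 * frand() - 1;
                v[i][d] = 0;
            }
            pbest_f[i] = win_rate(x[i]);
            if (pbest_f[i] > gbest_f) {
                gbest_f = pbest_f[i];
                for (int d = 0; d < DIM; d++) gbest[d] = x[i][d];
            }
        }
        for (int step = 0; step < STEPS; step++) {
            for (int i = 0; i < SWARM; i++) {
                for (int d = 0; d < DIM; d++) {    /* velocity update */
                    v[i][d] = w * v[i][d]
                            + c1 * frand() * (pbest[i][d] - x[i][d])
                            + c2 * frand() * (gbest[d] - x[i][d]);
                    x[i][d] += v[i][d];
                }
                double f = win_rate(x[i]);
                if (f > pbest_f[i]) {
                    pbest_f[i] = f;
                    for (int d = 0; d < DIM; d++) pbest[i][d] = x[i][d];
                }
                if (f > gbest_f) {
                    gbest_f = f;
                    for (int d = 0; d < DIM; d++) gbest[d] = x[i][d];
                }
            }
        }
        printf("best objective found: %g\n", gbest_f);
        return 0;
    }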

Jonas


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Magnus Persson

Quoting Jonas Kahn [EMAIL PROTECTED]:


> [...]
>
> Typically, how many parameters do you have to tune? Real-valued or
> two-level?


I guess I have 10 real-valued and 10 binary ones. There is probably a
lot of stuff that is hard-coded and could be parameterized.


Here I am also completely ignoring the playouts, which have hundreds of
hand-tuned parameters.




> If you consider a reasonably small subset of parameters, an efficient
> way to estimate them is to use a fractional factorial design for the
> linear part, and a central composite design for the quadratic part
> (once you know you are already in the right area). You get much more
> precision than with change-one-at-a-time strategies when there is no
> interaction between parameters, and you can detect interactions.


I once met this guy:

http://meche.mit.edu/people/faculty/index.html?id=27

His research is a mix of testing how well formal methods work in
practice and studying what engineers (who often do not use these
methods) actually do in practice.


He seemed to argue that doing parameter optimization intuitively by
hand is not as bad as one might think compared to fractional factorial
design. So I use that as an excuse for just doing it the way I always
did. For me it is important to keep a careful record of what I do and
to plot the results with confidence intervals, to avoid tricking
myself.
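
A minimal sketch of that kind of check (a 95% Wilson score interval on
a measured win rate; the function name and the example numbers are made
up):

    #include <math.h>
    #include <stdio.h>

    /* 95% Wilson score interval for a measured win rate: a quick check
     * of whether a parameter change is distinguishable from noise. */
    void win_rate_ci(int wins, int games, double *lo, double *hi)
    {
        const double z = 1.96;            /* 95% two-sided */
        double p = (double) wins / games;
        double denom = 1.0 + z * z / games;
        double center = (p + z * z / (2.0 * games)) / denom;
        double half = z * sqrt(p * (1 - p) / games
                               + z * z / (4.0 * games * games)) / denom;
        *lo = center - half;
        *hi = center + half;
    }

    int main(void)
    {
        double lo, hi;
        win_rate_ci(540, 1000, &lo, &hi);  /* e.g. 540 wins in 1000 games */
        printf("win rate 0.540, 95%% CI [%.3f, %.3f]\n", lo, hi);
        return 0;
    }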



> Alternatively, especially with a very high number of real parameters,
> derivatives of MC techniques can be efficient and easy to implement:
> particle filtering or swarm optimization in particular.


That would be tempting (I once implemented a fitting method inspired by
simulated annealing and it was very efficient), but it would require a
completely different test setup than the one I use right now.


It is also a matter of time and patience. I want new results every day.
If I tested all parameters at once using formal methods, I would still
have to wait for weeks.


-Magnus


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Don Dailey


Don Dailey wrote:
> Petr Baudis wrote:
>> On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:
>>> I think you may still have a bug.  You should get well over 1700 with
>>> 110,000 playouts, even if they are light playouts.
>>
>> Hmmm... That is going to be some tough debugging I suspect.
Perhaps I spoke too early.  I am preparing a very pure version of UCT
which I will test.  In fact, it would be good if anyone else who wants
to compare numbers tests it too.  It will be uniform random play-outs
and no special techniques in the tree, such as AMAF, RAVE or anything
else like that.

Since 110,000 play-outs may be too many for some programs, I suggest we
standardize on a more modest value that will be convenient for modest
computers and easy on resources (so that you may be able to run 2 or
more tests on better hardware).  I suggest we standardize on exactly
25,000 play-outs.  50,000 would tax my spare computer, which I like to
use for modest CGOS tests.

If it is agreed, I will start a 25k test.  My prediction is that this
will finish around 1600 ELO on CGOS.

I can also run a special all-time rating list if you participate, which
I believe returns more accurate ratings - you need at least 200 games
to get on the list, and I believe 500 is far better for an accurate
rating.  I'm going to try to get at least 500.

- Don


P.S.

I discovered that you can set the level substantially higher (and get
away with it) if you have code to cut down the number of playouts a lot
in endings which are nearly won or lost.  Even better is to
progressively cut down the play-outs as a function of distance from the
opening - but then we would all have to standardize on this too, and it
would probably be difficult to agree on how we should do that.  But the
issue is what rating you can expect from a relatively generic UCT-like
MC implementation given some number of play-outs.
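
One simple form of such a schedule, purely as an illustration (the
function name and constants are my own assumptions, not FatMan's
settings):

    /* Taper the playout budget with the move number, from the full
     * budget in the opening down to a floor in the endgame. */
    int playouts_for_move(int move_number, int base_playouts)
    {
        int floor_budget = base_playouts / 5;   /* never below 20% */
        int budget = base_playouts - move_number * (base_playouts / 100);
        return budget > floor_budget ? budget : floor_budget;
    }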




   
 
> I'm working on a new program, starting from scratch.  When it gets far
> enough along I will compare my numbers (and ELO) to yours.
>
>> I'm pretty sure my code is fairly well debugged now, but of course
>> there may still be bugs lurking; when I put my bots on CGOS for the
>> first time they were awfully bug-ridden (and about 800 ELO worse ;-).
>> What ELO rating did pure UCT bots get historically, with how many
>> playouts?
>
> FatMan does 20k playouts and has heavy play-outs, very similar to the
> first paper where MoGo described its play-out strategy - basically
> playing a random out-of-atari move or a local move that fits one of
> their patterns.  It is rated 1800 on CGOS.  The tree expansion policy
> for nodes is based on the parent count, not the child itself.  So once
> the parent has 100 play-outs, children are expanded regardless of the
> number of games they have seen.  (Near the end of the game in 9x9 go
> some moves could be tried a few times before being expanded.)
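
(A minimal sketch of that parent-count policy, purely illustrative and
not FatMan's code: node_t is the structure shown below, and
expand_all_children() is a hypothetical helper that allocates all legal
children at once.)

    #define EXPAND_THRESHOLD 100

    /* Expand a node's children when the *parent's* game count crosses
     * the threshold, regardless of each child's own visit count. */
    void maybe_expand(node_t *n)
    {
        if (n->child_count == 0 && n->games >= EXPAND_THRESHOLD)
            expand_all_children(n);
    }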
 
   
>> Oh, interesting! I must have misread the libEGO code, which seems to
>> use similar thresholds.
>>
>> What is the justification for using the parent playout count instead
>> of the node playout count itself?
>
> I don't know if it makes much difference how this is done, and I don't
> know how everybody else is doing it.  I allocate all the children at
> once and do not have to store pointers to each child, just one pointer
> to the first child and a counter so that I know how many children
> there are.  On average I'm probably expanding every other time in the
> middle of the game.
>
> I preallocate all the memory for the nodes when the program starts,
> instead of using malloc and free, and I assume most others are doing
> something similar.
>
> Here is my basic data structure:
>
>     typedef struct ntag
>     {
>         int    wins;
>         int    games;
>         int    child_count;    // zero until parent has 100 games
>         struct ntag *children; // a pointer to a place in the big node pool
>         mv_t   mv;             // move that got us here.
>     } node_t;
>
> When the child nodes are allocated, they are done all at once with
> this code - where cc is the number of fully legal child nodes:
>
>     // reserve space for cc entries
>     n->children = &(pool[pool_count]);
>     pool_count += cc;
>
> Overflow checking is done outside of the search routine.
>
> There are almost certainly better schemes; I just used what occurred
> to me to be the easiest and quickest to implement, without working too
> hard at it.
>
> Some programs hash each position and the tree is more abstract: no
> pointers, just positions leading to other positions by Zobrist hash
> keys in a hash table.
>
> My scheme probably wastes a lot of space on nodes that are left
> unvisited at the leaves of the tree.  But I don't waste much on
> storing pointers since I keep them in an array.
>
> What is the state of the art on this?  How is everyone else doing it?
>
> - Don

Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Christoph Birk

On Tue, 11 Mar 2008, Don Dailey wrote:
> If it is agreed, I will start a 25k test.  My prediction is that this
> will finish around 1600 ELO on CGOS.


I have long-term ratings for simple random playouts:

myCtest-10k and myCtest-50k.

I have kept them active since Sept 2006. Please don't use 25k.

Christoph


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Jason House
If the speed was lowered to 10k, I'd also participate. One of these  
days, I'll speed up my engine...


Sent from my iPhone

On Mar 11, 2008, at 11:18 AM, Don Dailey [EMAIL PROTECTED] wrote:




> [...]

Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Don Dailey
This isn't simple random play-outs.  It's Monte Carlo with UCT tree
search.

Ok, I will use 50k to match your test.  It means I probably cannot run
2 tests on that machine, which is why I hoped for minimal resource
usage, but since you have already started I will restart my test.

- Don




Christoph Birk wrote:
> On Tue, 11 Mar 2008, Don Dailey wrote:
>> If it is agreed, I will start a 25k test.  My prediction is that this
>> will finish around 1600 ELO on CGOS.
>
> I have long-term ratings for simple random playouts:
>
> myCtest-10k and myCtest-50k.
>
> I have kept them active since Sept 2006. Please don't use 25k.
>
> Christoph


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Christoph Birk

On Tue, 11 Mar 2008, Don Dailey wrote:
> This isn't simple random play-outs.  It's Monte Carlo with UCT tree
> search.
>
> Ok, I will use 50k to match your test.  It means I probably cannot run
> 2 tests on that machine, which is why I hoped for minimal resource
> usage, but since you have already started I will restart my test.


You can also use 10k, as Jason suggested.

myCtest-10k-UCT uses 10k (random/light) playouts with UCT.

Christoph



Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Don Dailey
I am going to keep the 25k playouts running and add a 10k play-out
version of UCT.  I want to establish a standard testing size so that I
can watch the evolution of the program, and 50k is too much if my
program triples in run time as I introduce very heavy play-outs.  (I
don't want to count on getting a faster computer, because they are not
getting much faster these days - they are adding processing cores
instead - although I still expect gradual speedups.)

- Don



Christoph Birk wrote:
> On Tue, 11 Mar 2008, Don Dailey wrote:
>> If it is agreed, I will start a 25k test.  My prediction is that this
>> will finish around 1600 ELO on CGOS.
>
> I have long-term ratings for simple random playouts:
>
> myCtest-10k and myCtest-50k.
>
> I have kept them active since Sept 2006. Please don't use 25k.
>
> Christoph


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Christoph Birk

On Tue, 11 Mar 2008, Don Dailey wrote:
> I am going to keep the 25k playouts running and add a 10k play-out
> version of UCT.  I want to establish a standard testing size so that
> [...]


Great! That way Jason can also participate.

myCtest-10k-UCT has a long-term rating of about 1250.
For the 50k version I have just started a test series that experiments
with various thresholds before creating a new node.

Christoph



Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Petr Baudis
On Mon, Mar 10, 2008 at 10:04:18PM -0400, Don Dailey wrote:
> Your method is to allocate 1 node when it's been visited once or twice -
> very natural I agree.  My method is to allocate all the children at
> once, and wait until the parent has been visited some number of times
> (currently 100).  If there are 50 legal moves, that gives on average
> about 1 node allocation every 2 visits, which is what you said you do.

I _expand_ 1 node when it's been visited twice, not allocate.

To clarify further:

...
if (tree_leaf_node(n)) {
        if (n->playouts >= u->expand_p /* 2 */)
                tree_expand_node(t, n, b2);
        /* Just now occurred to me that I should probably
         * descend to one of the children now yet. */

        result = play_random_game(b2, stone_other(color), u->gamelen);
        break;
}
...

void
tree_expand_node(struct tree *t, struct tree_node *node, struct board *b)
{
        tree_add_child(node, tree_init_node(pass));
        for (int i = 1; i < t->board->size; i++)
                for (int j = 1; j < t->board->size; j++)
                        if (board_atxy(b, i, j) == S_NONE)
                                tree_add_child(node, tree_init_node(coord(i, j)));
}

--
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.  -- J. W. von Goethe


Re: [computer-go] Hybrid theory

2008-03-11 Thread Alain Baeckeroot
On Friday 1 February 2008, David Doshay wrote:
> This is the direction in which we are moving with SlugGo. We also
> expect it to be difficult to integrate different approaches, but this
> has always been our research direction: when there are multiple
> codes which will each give an evaluation of a situation, how does
> one design an arbitrator that makes the final decision?

I asked how this is done in my lab, in speech recognition. They use a
home-made, very simple method, but one deeply tied to the internals of
our tools. The good news is that the PhD student is going to study
voting methods and the like a little before the end of his thesis!
Maybe in some months I'll have more info :)

Alain



Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Jonas Kahn
>> Typically, how many parameters do you have to tune? Real-valued or
>> two-level?
>
> I guess I have 10 real-valued and 10 binary ones. There is probably a
> lot of stuff that is hard-coded and could be parameterized.
>
> Here I am also completely ignoring the playouts, which have hundreds
> of hand-tuned parameters.

There, automatic techniques like those described by Rémi Coulom are
probably useful.

>> If you consider a reasonably small subset of parameters, an efficient
>> way to estimate them is to use a fractional factorial design for the
>> linear part, and a central composite design for the quadratic part
>> (once you know you are already in the right area). [...]
>
> I once met this guy:
>
> http://meche.mit.edu/people/faculty/index.html?id=27
>
> His research is a mix of testing how well formal methods work in
> practice and studying what engineers (who often do not use these
> methods) actually do in practice.
>
> He seemed to argue that doing parameter optimization intuitively by
> hand is not as bad as one might think compared to fractional factorial
> design. So I use that as an excuse for just doing it the way I always
> did. For me it is important to keep a careful record of what I do and
> to plot the results with confidence intervals, to avoid tricking
> myself.

Anyhow, it's always wiser to use a method one understands; so best to
keep it simple if you do not take (or have) the time to learn more
sophisticated techniques.

>> Alternatively, especially with a very high number of real parameters,
>> derivatives of MC techniques can be efficient and easy to implement:
>> particle filtering or swarm optimization in particular.
>
> That would be tempting (I once implemented a fitting method inspired
> by simulated annealing and it was very efficient), but it would
> require a completely different test setup than the one I use right now.

Thinking more about it, that's not completely obvious, and hence
interesting (for me): usually those methods are tailored for functions
where you get the exact value, not a Bernoulli trial.

> It is also a matter of time and patience. I want new results every
> day. If I tested all parameters at once using formal methods, I would
> still have to wait for weeks.

Well, once you are in the right zone, if you want to check a change,
you can look at that change only.
Note that you can also try to get information on other real parameters
even while focusing on a particular one: you make them vary only very
little. Then those supplementary parameters have almost no impact on
the results for your main parameter, and you can still extract
information about them by taking means over a longer time...

Jonas


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-11 Thread Petr Baudis
On Tue, Mar 11, 2008 at 11:41:41AM -0700, Christoph Birk wrote:
> On Tue, 11 Mar 2008, Don Dailey wrote:
>> I am going to keep the 25k playouts running and add a 10k play-out
>> version of UCT.  I want to establish a standard testing size so that
>> [...]
>
> Great! That way Jason can also participate.
>
> myCtest-10k-UCT has a long-term rating of about 1250.
> For the 50k version I have just started a test series that experiments
> with various thresholds before creating a new node.

My engine is now playing as pachi1-p0.25-li10k and pachi1-p0.25-li50k.

--
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.  -- J. W. von Goethe


Re: [computer-go] Hybrid theory

2008-03-11 Thread David Doshay

We are still bringing up our 2nd method, so we are not yet as far
as choosing a voting method.

Cheers,
David



On Mar 11, 2008, at 12:18 PM, Alain Baeckeroot wrote:


> On Friday 1 February 2008, David Doshay wrote:
>> This is the direction in which we are moving with SlugGo. [...]
>
> I asked how this is done in my lab, in speech recognition. [...]
>
> Alain
