Re: [computer-go] Hash tables

2009-07-07 Thread Peter Drake

On Jul 6, 2009, at 10:09 PM, Peter Drake wrote:

I suppose I could also include first-child/next-sibling pointers.
I wouldn't actually use them when performing playouts, but they
would greatly speed up the mark phase of mark-and-sweep collection.



Hmm. That would work for a tree, but this is a DAG.

Consider this situation:

  A
 / \
B   C
 \ / \
  D   E

(For simplicity, ignore the fact that an actual transposition must be  
at least three moves long.) Here D can be reached from either B or C,  
but E can only be reached from C (perhaps due to ko).


With first-child/next-sibling pointers, that might look like this:

A
|
B-C
|/
D-E

If we then actually move to B and try to keep the tree, we'll end up  
keeping E, even though it's really an unreachable node. That's  
probably acceptable: we can keep a few unreachable nodes, as long as  
we keep all of the real ones.


There might be another problem, though. Suppose, from the situation  
above, I add F and G as children of B:


A
|
B-C
|  \
F-G-D-E

Later, I discover that F is also a child of C. I look through C's  
apparent children (D and E) and discover that F is not among them. How  
do I add F to the list of C's children without some serious list  
traversal (and surgery)?


Perhaps a better solution is for each node to point to a linked list  
of its children. The nodes in these lists (allocated from a pool at  
load time, naturally) would each point to a search tree node and the  
next list node.
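
For concreteness, here is a minimal C++ sketch of what such a child-list
scheme might look like; the names and fields are illustrative only, not
Peter's actual implementation. Each node holds a singly linked list of
cells drawn from a pool allocated up front, so adding F as a child of C is
a constant-time push onto the front of C's list, with no traversal or surgery:

    #include <cstddef>
    #include <vector>

    struct SearchNode;                 // transposition-table entry; fields omitted

    struct ChildCell {                 // one entry in a node's child list
        int         move;              // move leading to the child
        SearchNode* child;             // DAG node reached by that move
        ChildCell*  next;              // next cell in this node's list
    };

    struct SearchNode {
        int        runs     = 0;
        int        wins     = 0;
        ChildCell* children = nullptr; // head of the child list
    };

    struct CellPool {                  // all cells allocated at load time
        std::vector<ChildCell> cells;
        std::size_t            used = 0;
        explicit CellPool(std::size_t n) : cells(n) {}
        ChildCell* allocate() { return used < cells.size() ? &cells[used++] : nullptr; }
    };

    // O(1): push a new cell onto the front of the parent's child list.
    bool addChild(SearchNode& parent, int move, SearchNode* child, CellPool& pool) {
        ChildCell* cell = pool.allocate();
        if (!cell) return false;       // pool exhausted
        cell->move      = move;
        cell->child     = child;
        cell->next      = parent.children;
        parent.children = cell;
        return true;
    }

    int main() {
        CellPool pool(1 << 20);
        SearchNode a, b;
        addChild(a, 42, &b, pool);     // link B under A via move 42
    }

A mark pass can then follow the child lists without touching the data
structures used during playouts, which is in line with the mark-phase use
mentioned above.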


Peter Drake
http://www.lclark.edu/~drake/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Really basic question

2009-07-07 Thread Oliver Lewis
Although the number of games explored is very limited relative to the total
number of possible games, those games are in some sense representative of
what happens if you start with a particular move.  That's why they can help
to create a ranking that tells you something about which moves are better
than others.  The move to heavy playouts is about making the sample games
even more representative so that they yield more useful information.

Others on this list have reported in the past that the randomness is
actually very important.  Playouts that are very heavy, no matter how
clever they are, actually reduce the performance because they narrow the
number of games too much.

You should also read up on the all moves as first (AMAF) technique.  This
is even more surprising because it attributes the outcome of a random game
to every move of that colour during the random game, as if that was the move
that had been played first.  This generates information to help rank the
moves even more quickly.
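
As a rough illustration of the bookkeeping AMAF implies (a sketch only;
real implementations add refinements such as counting only the first play
at a point), the per-point statistics might be updated like this in C++:

    #include <vector>

    struct AmafStats {
        std::vector<int> wins, runs;   // indexed by board point
        explicit AmafStats(int points) : wins(points, 0), runs(points, 0) {}

        // Credit every point the given colour played anywhere in the playout,
        // as if it had been that colour's first move.
        void update(const std::vector<int>& pointsPlayedByColour, bool colourWon) {
            for (int p : pointsPlayedByColour) {
                ++runs[p];
                if (colourWon) ++wins[p];
            }
        }

        // AMAF value used to rank candidate moves.
        double value(int p) const {
            return runs[p] ? double(wins[p]) / runs[p] : 0.5;
        }
    };

A move's AMAF value then supplements (or, in pure AMAF, replaces) its
ordinary win rate when ranking candidate moves.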

Oliver


On Tue, Jul 7, 2009 at 2:15 AM, Fred Hapgood hapg...@pobox.com wrote:

 I have a really basic question about how MC works in the context of Go.

 Suppose the problem is to make the first move in a game, and suppose we
 have accepted as a constraint that we will abstain from just copying
 some joseki out of a book -- we are going to use MC to figure out the
 first move de novo. We turn on the software and it begins to play out
 games. My question is: how does the software pick its first move?  Does
 it move entirely at random? Sometimes it sounds as though the way MC works is by
 picking each move at random, from the first to the last, for a million
 games or so. The trouble is that the number of possible Go games is so
 large that a million games would not even begin to explore the
 possibilities.  It is hard to imagine anything useful emerging from
 examining such a small number. So I'm guessing that the moves are not
 chosen at random.  But even if you reduce the possibilities to two
 options per move, which would be pretty impressive, you'd still run out
 of your million games in only twenty moves, after which you would be
 back to picking at random again.

 What am I missing??




 http://www.BostonScienceAndEngineeringLectures.com
 http://www.pobox.com/~fhapgood

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Really basic question

2009-07-07 Thread Magnus Persson

Quoting Oliver Lewis ojfle...@gmail.com:


Others on this list have reported in the past that the randomness is
actually very important.  Playouts that are very heavy, no matter how
clever they are, actually reduce the performance because they narrow the
number of games too much.


I would like to disagree with this statement. I think it is difficult  
to make good heavy playouts, but it is not impossible.


Failing to make a playout stronger through heaviness does not prove  
anything. It just means one has failed.


If I could make a heavy playout of 1 dan strength and then run it in MC  
tree search, I am sure the result would be stronger than the playout itself.


The problem, I think, is to find a good tradeoff between heaviness and  
speed. In my test of Valkyria vs Fuego, Valkyria is superior when  
the number of playouts is the same. But Fuego can play 5 times more  
playouts per second on the same hardware, which results in Fuego being  
slightly stronger than Valkyria at the moment.


It just cannot be that having strong playouts leads to worse play in  
general. It is just a matter of not slowing down the code too much and  
adding just those heavy elements that do increase playing strength.


-Magnus

--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Really basic question

2009-07-07 Thread Heikki Levanto
On Tue, Jul 07, 2009 at 09:16:25AM +0200, Oliver Lewis wrote:
 
 You should also read up on the all moves as first (AMAF) technique.  This
 is even more surprising because it attributes the outcome of a random game
 to every move of that colour during the random game, as if that was the move
 that had been played first.  This generates information to help rank the
 moves even more quickly.

Actually, it is not so surprising. Since the playouts are random, they have
no idea of urgency - they leave large groups in atari for many moves, and it
is a toss of a coin who plays to the crucial point first. Most likely it does
not happen on the first move. But if that is the game-deciding point, it
makes sense to give credit to whoever got to play it.

  -H

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Really basic question

2009-07-07 Thread Christian Nentwich

Fred,

others have already indicated that you are missing the tree part of 
Monte Carlo Tree Search, but there is something else to add.


If you run uniform random playouts on an empty 9x9 board, let's say a 
million of them for each point, you will see a probability distribution 
emerging that roughly shows the centre 3x3 or 4x4 points as having a 
high probability of winning (around 50%, depending on komi), and you 
will see the edges and first line having a much lower probability.


This is not going to help you choose between the points in the centre. 
Every time you run the simulation, a different point will be selected as 
best - because the state space will be inadequately explored, as you 
say - but you will be able to choose a good move.


On 19x19, the space is so inadequately explored that running a uniform 
Monte Carlo simulation on an empty board is useless (i.e. you will get 
completely different results every time you run it) and further 
heuristics either during playouts or during the tree search phase are 
needed.
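
To make the experiment concrete, the aggregation step could look like the
C++ sketch below. The playout routine itself is the hard part and is only
stubbed out here; playUniformRandomGame is a placeholder name, and a real
engine would play the given first move followed by uniformly random legal
moves and score the result against komi.

    #include <array>
    #include <cstdio>

    constexpr int N = 9;
    constexpr int POINTS = N * N;

    // Placeholder: a real engine plays `firstMove`, then uniform random legal
    // moves to the end of the game, and returns true if the first player wins.
    bool playUniformRandomGame(int /*firstMove*/) { return false; }

    int main() {
        // On the order of a million playouts per point, as suggested above.
        constexpr int GAMES_PER_POINT = 1000000;
        std::array<double, POINTS> winRate{};
        for (int p = 0; p < POINTS; ++p) {
            int wins = 0;
            for (int g = 0; g < GAMES_PER_POINT; ++g)
                if (playUniformRandomGame(p)) ++wins;
            winRate[p] = double(wins) / GAMES_PER_POINT;
        }
        // On 9x9 the resulting map typically peaks over the central 3x3-4x4 area.
        for (int r = 0; r < N; ++r) {
            for (int c = 0; c < N; ++c)
                std::printf("%5.3f ", winRate[r * N + c]);
            std::printf("\n");
        }
    }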


Christian


Fred Hapgood wrote:

I have a really basic question about how MC works in the context of Go.

Suppose the problem is to make the first move in a game, and suppose we
have accepted as a constraint that we will abstain from just copying
some joseki out of a book -- we are going to use MC to figure out the
first move de novo. We turn on the software and it begins to play out
games. My question is: how does the software pick its first move?  Does
it move entirely at random? Sometimes it sounds as though the way MC works is by
picking each move at random, from the first to the last, for a million
games or so. The trouble is that the number of possible Go games is so
large that a million games would not even begin to explore the
possibilities.  It is hard to imagine anything useful emerging from
examining such a small number. So I'm guessing that the moves are not
chosen at random.  But even if you reduce the possibilities to two
options per move, which would be pretty impressive, you'd still run out
of your million games in only twenty moves, after which you would be
back to picking at random again.

What am I missing??  





http://www.BostonScienceAndEngineeringLectures.com
http://www.pobox.com/~fhapgood

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

  



--

Christian Nentwich

Director, Model Two Zero Ltd.
+44-(0)7747-061302
http://www.modeltwozero.com

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Experimentation (was: Really basic question)

2009-07-07 Thread Christian Nentwich


The problem, I think, is to find a good tradeoff between heaviness and 
speed. In my test of Valkyria vs Fuego, Valkyria is superior when the 
number of playouts is the same. But Fuego can play 5 times more 
playouts per second on the same hardware, which results in Fuego being slightly 
stronger than Valkyria at the moment.


Indeed - this makes me wonder why I keep seeing papers where different 
versions of algorithms are compared with the same number of playouts, 
rather than under the same time limit.


What is the motivation in this? I cannot conceive of any good reason for 
running an experiment this way, so I would be interested in opinions. It 
seems to me that making algorithms heavier and then demonstrating that 
they are stronger with the same number of playouts misses the point - 
why would one not run an experiment under the same time conditions instead?


Christian
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Darren Cook
 What is the motivation in this? I cannot conceive of any good reason for
 running an experiment this way, so I would be interested in opinions. It
 seems to me that making algorithms heavier and then demonstrating that
 they are stronger with the same number of playouts misses the point -
 why would one not run an experiment under the same time conditions instead?

* The heavier algorithm might be unoptimized;

* The heavier algorithm might be easier to parallelize (or put into
hardware);

* The scaling behaviour might be different. E.g. if fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)

* Analyzing why the win rate of the super-heavy algorithm is better
might give other people ideas for lighter but still effective playouts.

Darren

*: As an example, Monte Carlo itself was ignored for the first 10 years
of its life because traditional programs were stronger on the same hardware.

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/mlsn/ (Multilingual open source semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Christian Nentwich

Darren Cook wrote:

seems to me that making algorithms heavier and then demonstrating that
they are stronger with the same number of playouts misses the point -
why would one not run an experiment under the same time conditions instead?



* The heavier algorithm might be unoptimized;
  
That is probably a good pragmatic point, but that does not make the 
experiment any more valid. You are already designing the algorithm so 
that it performs better at the same number of playouts, by trading off 
speed. I don't think it is then valid to also use that as proof that it 
performs better - the experiment is slanted against the lighter, faster 
algorithm.


Unfortunately, "it is unoptimized", while usually a good argument, 
interferes with one of the main dimensions of the experiment, which is 
speed.



* The heavier algorithm might be easier to parallelize (or put into
hardware);
  
That seems unintuitive, but I confess I have no experience whatsoever in 
that respect.

* The scaling behaviour might be different. E.g. if fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)
  
I am not sure I follow this argument. How do you intend to prove that, 
unless you run the algorithms with 10 times more playouts? In that case, 
I would still argue that you should run them with X times longer time 
limits, not with 10 times more playouts, unless you can assume (with 
proof or good evidence, so you can set differential numbers of playouts 
between the two) that tomorrow's hardware will favour one algorithm more 
than the other.



* Analyzing why the win rate of the super-heavy algorithm is better
might give other people ideas for lighter but still effective playouts.

This I can accept. In general, I do think it is interesting to see the 
win rate at the same number of playouts as background analysis, but I 
don't see it as convincing evidence of advancement over other algorithms.


Christian
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Darren Cook
 * The scaling behaviour might be different. E.g. if fuego and Valkyria
 are both run with 10 times more playouts the win rate might change
 I am not sure I follow this argument. How do you intend to prove that,
 unless you run the algorithms with 10 times more playouts? 

I think showing it is similar or better with the same number of playouts is
a good first step - the second experiment takes 10 times as long to run :-)

Darren

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/mlsn/ (Multilingual open source semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Magnus Persson

Quoting Darren Cook dar...@dcook.org:


* The scaling behavior might be different. E.g. if Fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)


I just happened to have experimental data on exactly this topic. This  
is Valkyria vs Fuego where I scale the number of playouts (Sims) x4 in  
each row.


Sims   Winrate(%)  Err(%)   N    WR     EloDiff
2      99.2        0.4      500  0.992  837
8      98.2        0.6      500  0.982  696
32     94.2        1.0      500  0.942  484
128    88.8        1.4      500  0.888  360
512    82.0        1.7      500  0.820  263
2048   83.2        1.7      499  0.832  278
8192   81.3        1.7      497  0.813  255
32768  75.5        3.6      139  0.755  196

The data shows clearly that the 0.3.2 version of Fuego I use probably  
plays really bad moves with high frequency in the playouts. With more  
playouts, I guess, a lot of these blunders can be avoided, and the win  
rate goes down from 99% towards 80%. The question here is whether it goes  
asymptotically towards 80% or perhaps 50% with more simulations.  
Unfortunately I cannot continue this plot because I run out of memory  
and it takes ages to finish the games.


So the question is: is there a fixed gain from the heavy playouts  
with more than 512 simulations, or does the effect of the heavy  
playouts get less and less important with larger tree size?


Note also that this is not only a matter of having heavy playouts or  
not. There is also a difference in tree search, since Valkyria and  
Fuego probably search their trees differently, and it could be that  
Valkyria searches deep trees inefficiently.

Maybe I should run a similar test against a light version of Valkyria  
to control for the search algorithm.
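
For reference, the Err and EloDiff columns appear consistent with the
usual binomial standard error and the logistic Elo conversion; a small
C++ sketch of that arithmetic (inferred from the numbers, not necessarily
the exact script used here):

    #include <cmath>
    #include <cstdio>

    // Standard error of a measured win rate p over n games, in percent.
    double stdErrPercent(double p, int n) {
        return 100.0 * std::sqrt(p * (1.0 - p) / n);
    }

    // Elo difference implied by a win rate p under the logistic model.
    double eloDiff(double p) {
        return 400.0 * std::log10(p / (1.0 - p));
    }

    int main() {
        // First row above: WR = 0.992 over N = 500 games.
        std::printf("err = %.1f%%  elo = %.0f\n",
                    stdErrPercent(0.992, 500), eloDiff(0.992));
        // Prints approximately: err = 0.4%  elo = 837
    }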


-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Scoring - step function or sigmoid function?

2009-07-07 Thread Stefan Kaitschick

My conclusion was that the winning percentage is more than just an
estimate of how likely the player is to win. It is in fact a crude
estimator of the final score.

Going back to your original comment, when choosing between move A that
leads to a 0.5pt win, and move B that leads to a 100pt win, you should
be seeing that move B has a higher winning percentage.

Darren


Good point. Wins will occur in clusters, so win-rate and score go hand in 
hand.
MC algorithms seem to be treacherous ground when it comes to anticipating 
consequences.

Here's another shot:

MC bots play handicap games poorly.
When they take w they go into maniac mode because they can't find any wins.
When they take b they dither around until the game gets close.
What is entirely missing is any concept of catching up or of preserving a 
lead.
So my suggestion would be to pretend that there is a komi that almost 
compensates for the handicap.

A numerical example: a 4 stone handicap is worth about 45 points.
Start with a virtual komi of about 35 points, and decrease that value to 0 
during the course of the game.
This works in both directions. If the bot has w he will be pessimistic, but 
not suicidal.

If he has b he will be optimistic, but not so lazy.
The rate of giving rope should depend on the number of moves played and on 
the winning percentage.

If the winning percentage drops early, reduce the virtual komi more sharply.
One advantage of this approach would be that it wouldn't tinker with 
even-game strategies.
A more elaborate scheme would be to make several preliminary searches at 
different komi levels. The goal would not be to find the best move. The 
search would only try to find the win rate for that komi.
Depending on how far the game has progressed, giving a virtual komi will be 
worth some win-rate reduction. (Or taking a virtual komi will increase the 
win rate, making the bot play less maniacally than playing out a string of 
ko threats in an unfavorable position.)
After a decision on the komi is reached, a search is done for this komi to 
find the move to be played.

Towards the end of the game the virtual komi must always be 0.
This strategy might even be useful for even games, when the win rate 
strongly favors one side before the late endgame. That would revert to the 
handicap game situation (except that a disparity of strength is not 
presumed).
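
A minimal sketch of the fading virtual komi, assuming the simplest
possible schedule (a linear fade; the win-rate-dependent adjustment
described above is left out). The numbers follow the example of a
4-stone handicap worth roughly 45 points:

    #include <cstdio>

    // Virtual komi credited to White, shrinking linearly from `initialKomi`
    // at move 0 down to zero by `fadeOutMove`, after which only the real komi
    // is used. Playouts would be scored against realKomi + virtualKomi(...).
    double virtualKomi(double initialKomi, int moveNumber, int fadeOutMove) {
        if (moveNumber >= fadeOutMove) return 0.0;
        return initialKomi * (1.0 - double(moveNumber) / fadeOutMove);
    }

    int main() {
        // 4-stone handicap: start at about 35 points, gone by move 120.
        for (int mv = 0; mv <= 120; mv += 30)
            std::printf("move %3d  virtual komi %.1f\n",
                        mv, virtualKomi(35.0, mv, 120));
    }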


Stefan




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Really basic question

2009-07-07 Thread Don Dailey
On Tue, Jul 7, 2009 at 3:49 AM, Magnus Persson magnus.pers...@phmp.se wrote:

 Quoting Oliver Lewis ojfle...@gmail.com:

  Others on this list have reported in the past that the randomness is
 actually very important.  Playouts that are very heavy, no matter how
 clever they are, actually reduce the performance because they narrow the
 number of games too much.


 I would like to disagree with this statement. I think it is difficult to
 make good heavy playouts, but it is not impossible.

 Failing to make a playout stronger through heaviness does not prove
 anything. It just means one has failed.

 If I could make a heavy playout of 1 dan strength and then run it in MC tree
 search, I am sure the result would be stronger than the playout itself.


I agree with everything you said.

In the Mogo papers it is claimed that having stronger playouts does not
NECESSARILY lead to stronger play in general.  I don't believe everything I
read, but I think there may be something to this.  Also, having a random
element does not imply weakening the play; perhaps it's more like
varying the playing style.  If the Mogo team is correct, the formula seems
to be that there is something in the playouts that can complement the
search in some way not fully understood.

As you gradually go from random to deterministic you cease to have Monte
Carlo and you have something that is more like a best-first search.  It
may be that in a few years our programs will gradually transition away from
MC and more toward tree search using best-first methods.  Maybe that is
really what we have discovered, and the MC part is distracting us?




 The problem, I think, is to find a good tradeoff between heaviness and speed.
 In my test of Valkyria vs Fuego, Valkyria is superior when the number of
 playouts is the same. But Fuego can play 5 times more playouts per second
 on the same hardware, which results in Fuego being slightly stronger than
 Valkyria at the moment.


The search is good at some things and poor at other things.  These two
aspects need to work together in a complementary way, and in my opinion that
will mean more domain-specific knowledge.  Your observation concerning Fuego
and Valkyria indicates that there is a lot of overlap - you can make up for a
lack of search with knowledge and a lack of knowledge with search.

I believe the huge lack of progress over the years (until recently) has been
the blind failure to recognize that you cannot cover everything with
knowledge - but we must not move too far in the other direction either.
Domain-specific Go knowledge is too brittle to do it all, but should be
provided as hints to a search.  That is exactly how we humans do it.  We
try to back up our positional judgement and knowledge with concrete
analysis.  When the knowledge, understanding and intuition are strong, less
analysis is needed, and vice versa.

I have had many discussions with Larry Kaufman, who works with the Rybka
chess team - currently the strongest chess program in the world.  I think
some of the things we have talked about apply very much to Go.  It is
very interesting to me that even though he is heavily involved in the
knowledge-engineering aspect of the program, he seems to feel that Rybka
is confined by the knowledge part.  He tells me (and this applies to ALL
programs, not just Rybka) that there are certain positions, ideas or
themes where Rybka is a victim of its evaluation function.  Adding an
additional order of magnitude more time to the search is not going to change
the basic misconception if the evaluation just doesn't understand the
position.  So you can have two equally strong chess programs, both
playing stronger than any human player, choosing different moves in the
same position, and one can be correct and the other wrong - and the one that
is wrong may not find the correct move even thinking 10X longer.

That is a pretty depressing thought for Go programming, because if it's a
problem in chess, then it can only be worse for Go.  So I suspect, from
your description of the differences between Valkyria and Fuego, that there
will be huge differences in which positions Valkyria plays better than Fuego.

One thing I have found in chess is that each piece of knowledge has
side-effects.  Every new rule of thumb that you impose makes your
program a bit more dogmatic.  I believe the trick is figuring out how not
to make it too dogmatic, while still giving it a sensible set of
heuristics to guide it along.




 It just cannot be that having strong playouts leads to worse play in
 general. It is just a matter of not slowing down the code too much and
 adding just those heavy elements that do increase playing strength.


Yes,  I guess I just repeated you but in a different way.

- Don






 -Magnus

 --
 Magnus Persson
 Berlin, Germany

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Experimentation (was: Really basic question)

2009-07-07 Thread David Fotland
Experiments with equal playouts are much easier to control to get accurate
results.  

David

 
 What is the motivation in this? I cannot conceive of any good reason for
 running an experiment this way, so I would be interested in opinions. It
 seems to me that making algorithms heavier and then demonstrating that
 they are stronger with the same number of playouts misses the point -
 why would one not run an experiment under the same time conditions
instead?
 
 Christian
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Anchor on cgos 19x19?

2009-07-07 Thread David Fotland
Many Faces has won almost 100 games in a row to get a high rating, but
without an anchor, that rating is probably not accurate.  Is anyone willing
to put up an anchor or a stronger program that has lots of games and a
stable rating?

David


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation (was: Really basic question)

2009-07-07 Thread Don Dailey
On Tue, Jul 7, 2009 at 6:31 AM, Christian Nentwich 
christ...@modeltwozero.com wrote:


 The problem, I think, is to find a good tradeoff between heaviness and
 speed. In my test of Valkyria vs Fuego, Valkyria is superior when the
 number of playouts is the same. But Fuego can play 5 times more playouts
 per second on the same hardware, which results in Fuego being slightly
 stronger than Valkyria at the moment.

 Indeed - this makes me wonder why I keep seeing papers where different
 versions of algorithms are compared with the same number of playouts, rather
 than under the same time limit.


Because this is the right testing methodology to use.   The first thing you
want to know is if the core idea is good.   This is because you will never
know if you implemented it in the fastest possible way. But once you
know that the idea gives you better results with the same number of
playouts  you have identified something about it that is superior - then you
can go from there.

There are two aspects that you are concerned about with tests like this.
The first and most important thing is the theoretical soundness of the idea
or approach being used.  The second is the engineering issues, which are
really quite open-ended and tough to nail down accurately.  Not only that,
but you can kill two birds with one stone - if the theory is not sound, then
there is no need to pursue the engineering aspect.

There is probably no great crime in doing it your way if your only goal is
to produce a strong program,  but if your test fails you don't really know
if it failed because the idea is stupid or if your implementation is unduly
slow.

If you are writing a paper you certainly do not want to claim results based
solely on just your particular implementation of an idea that might be bad
anyway.   There is nothing wrong with presenting an engineering paper about
an interesting way to implement something, but even then it would be a
pretty silly paper if you did not present at least some analysis on why
someone would WANT to implement this thing i.e. it is a commonly used thing
(a new sorting algorithm)  or has some practical application that you can
identify.



What is the motivation in this? I cannot conceive of any good reason for
 running an experiment this way, so I would be interested in opinions. It
 seems to me that making algorithms heavier and then demonstrating that they
 are stronger with the same number of playouts misses the point - why would
 one not run an experiment under the same time conditions instead?


As I said,  when you are writing a paper you have to prove that something is
theoretically sound, at least empirically.   If you are writing an
engineering paper you might present an interesting implementation of some
idea, but it should be an idea that has first been shown to be interesting
in some way. For instance a new faster sorting algorithm is interesting.
But you certainly don't want to present evidence that YOUR
IMPLEMENTATION of YOUR IDEA is no good when you have not even attempted to
establish whether the idea itself is viable.


- Don







 Christian
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Experimentation

2009-07-07 Thread Christian Nentwich

Don,

I see where this argument comes from, but you are not taking it to its 
conclusion. Unless you can, in the end, show that your algorithm can 
outperform another one with the same time limit, you have not proved 
that it is an advance. That is why tournaments are played with time 
limits, not limits on the number of playouts. Time is an important 
dimension of the game of Go.


Here is why: if I follow your argument to its conclusion, I can come up 
with an algorithm that spends five minutes per move looking through 
databases of moves, performing simulations, and god knows what, and show 
a 20% improvement over an algorithm that thinks for 5 seconds. That is 
an interesting intermediate result. But would you then allow me to 
write "obviously it will be just as quick as soon as we've done a bit of 
optimisation" in the epilogue, and accept that? What if there is no way 
to optimise it? What if the old algorithm will always scale better with 
the same time limit, on faster hardware? I guess you might let the 
argument pass under tight conditions: the other program is a version 
of the same program, and architecturally similar; and no algorithms 
beyond linear complexity per move were added, or something of the sort.


You mix "proof" and "evidence" a few times in the same paragraph; the 
sorting algorithm is probably not a good comparison. With a sorting 
algorithm, you would first prove its big-O and big-omega complexities, 
maybe look at average cases too. I don't see a lot of people trying to 
prove that their algorithms are improvements in Go, because it is 
generally too complex; so evidence is all we get.


Christian


Don Dailey wrote:



On Tue, Jul 7, 2009 at 6:31 AM, Christian Nentwich 
christ...@modeltwozero.com wrote:



The problem, I think, is to find a good tradeoff between heaviness
and speed. In my test of Valkyria vs Fuego, Valkyria is superior
when the number of playouts is the same. But Fuego can play 5
times more playouts per second on the same hardware, which results in
Fuego being slightly stronger than Valkyria at the moment.

Indeed - this makes me wonder why I keep seeing papers where
different versions of algorithms are compared with the same number
of playouts, rather than under the same time limit.


Because this is the right testing methodology to use.   The first 
thing you want to know is if the core idea is good.   This is because 
you will never know if you implemented it in the fastest possible 
way. But once you know that the idea gives you better results with 
the same number of playouts  you have identified something about it 
that is superior - then you can go from there.


There are two aspects that you are concerned about with tests like 
this.  The first and most important thing is the theoretical soundness 
of the idea or approach being used.  The second is the engineering 
issues, which are really quite open-ended and tough to nail down 
accurately.  Not only that, but you can kill two birds with one stone - 
if the theory is not sound, then there is no need to pursue the 
engineering aspect. 

There is probably no great crime in doing it your way if your only 
goal is to produce a strong program,  but if your test fails you don't 
really know if it failed because the idea is stupid or if your 
implementation is unduly slow.   

If you are writing a paper you certainly do not want to claim results 
based solely on just your particular implementation of an idea that 
might be bad anyway.   There is nothing wrong with presenting an 
engineering paper about an interesting way to implement something, but 
even then it would be a pretty silly paper if you did not present at 
least some analysis on why someone would WANT to implement this thing 
i.e. it is a commonly used thing (a new sorting algorithm)  or has 
some practical application that you can identify. 




What is the motivation in this? I cannot conceive of any good
reason for running an experiment this way, so I would be
interested in opinions. It seems to me that making algorithms
heavier and then demonstrating that they are stronger with the
same number of playouts misses the point - why would one not run
an experiment under the same time conditions instead?


As I said,  when you are writing a paper you have to prove that 
something is theoretically sound, at least empirically.   If you are 
writing an engineering paper you might present an interesting 
implementation of some idea, but it should be an idea that has first 
been shown to be interesting in some way. For instance a new 
faster sorting algorithm is interesting. But you certainly don't 
want to present evidence that YOUR IMPLEMENTATION of YOUR IDEA is no 
good when you have not even attempted to establish whether the idea 
itself is viable. 

 
- Don




 




Christian
___
computer-go mailing list
 

Re: [computer-go] Experimentation

2009-07-07 Thread Christian Nentwich

Magnus,

along the lines of the argument I am trying to make: did you try your 
experiments with time limits from 30 seconds per game to five minutes 
per game (say), rather than playouts? Using gogui-twogtp this is 
relatively easy to achieve.


I am obviously not associated with Fuego, but I guess it is reasonable 
to assume that Fuego's architecture was not designed to operate at 
limits like 2, 8 or 32 simulations in the same way Valkyria was. It is 
an interesting study in its own right for scalability purposes; but to 
go on to infer strength from it would seem like comparing apples and 
oranges.


Christian


Magnus Persson wrote:

Quoting Darren Cook dar...@dcook.org:


* The scaling behavior might be different. E.g. if Fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)


I just happened to have experimental data on exactly this topic. This 
is Valkyria vs Fuego where I scale the number of playouts (Sims) x4 in 
each row.


Sims   Winrate(%)  Err(%)   N    WR     EloDiff
2      99.2        0.4      500  0.992  837
8      98.2        0.6      500  0.982  696
32     94.2        1.0      500  0.942  484
128    88.8        1.4      500  0.888  360
512    82.0        1.7      500  0.820  263
2048   83.2        1.7      499  0.832  278
8192   81.3        1.7      497  0.813  255
32768  75.5        3.6      139  0.755  196

The data shows clearly that the 0.3.2 version of Fuego I use probably 
plays really bad moves with high frequency in the playouts. With more 
playouts, I guess, a lot of these blunders can be avoided, and the win 
rate goes down from 99% towards 80%. The question here is whether it goes 
asymptotically towards 80% or perhaps 50% with more simulations. 
Unfortunately I cannot continue this plot because I run out of memory 
and it takes ages to finish the games.


So the question is: is there a fixed gain from the heavy playouts 
with more than 512 simulations, or does the effect of the heavy 
playouts get less and less important with larger tree size?


Note also that this is not only a matter of having heavy playouts or 
not. There is also a difference in tree search, since Valkyria and 
Fuego probably search their trees differently, and it could be that 
Valkyria searches deep trees inefficiently.

Maybe I should run a similar test against a light version of Valkyria 
to control for the search algorithm.


-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




--

Christian Nentwich

Director, Model Two Zero Ltd.
+44-(0)7747-061302
http://www.modeltwozero.com

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Hellwig Geisse
On Tue, 2009-07-07 at 17:09 +0100, Christian Nentwich wrote:
 [..] Unless you can, in the end, show that your algorithm can 
 outperform another one with the same time limit, you have not proved 
 that it is an advance. That is why tournaments are played with time 
 limits, not limits on the number of playouts. Time is an important 
 dimension of the game of Go.

This is somewhat funny: normally, it is Don who argues along
these lines.. ;-)

Hellwig

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Really basic question

2009-07-07 Thread terry mcintyre
megasnippage of Don's post:

 Adding an additional order of magnitude more time to the search [ in Rybka]  
 is not
going to change the basic misconception if the evaluation just doesn't
understand the position. [ Don suggests that this applies even more so to Go ]

Amen! We've covered variants of this theme a few times. 

I believe that when the playout component knows as much about life-and-death, 
corner plays, nakade, running fights, semeai, and seki as a dan-level player, 
the power of the tree search will leverage that knowledge far better than at 
present.

 Terry McIntyre terrymcint...@yahoo.com


“We hang the petty thieves and appoint the great ones to public office.” -- 
Aesop


  ___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Experimentation

2009-07-07 Thread Brian Sheppard
If your program is limited by memory size, then it makes sense to
limit experiments by trial count.

Running on your development computer, you might be limited by
clock time. Running on competition hardware, you might not be.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Gian-Carlo Pascutto
Brian Sheppard wrote:

 Running on your development computer, you might be limited by
 clock time. Running on competition hardware, you might not be.

Only if the algorithm doesn't scale.

Which makes it uninteresting to begin with.

-- 
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Don Dailey
On Tue, Jul 7, 2009 at 12:09 PM, Christian Nentwich 
christ...@modeltwozero.com wrote:

 Don,

 I see where this argument comes from, but you are not taking it to its
 conclusion. Unless you can, in the end, show that your algorithm can
 outperform another one with the same time limit, you have not proved that it
 is an advance. That is why tournaments are played with time limits, not
 limits on the number of playouts. Time is an important dimension of the game
 of Go.


You are the one that started this conversation with a question about why
authors of papers seem to be concerned more with the theoretical properties
of algorithms than the practical implementation details.   I'm just trying
to give you the answer.   If you don't like the answer, why did you ask the
question?

You may not like it, but I'm telling you that in general these kinds of
papers are about ideas, not implementations.  A good paper will have a
primary focus on one or the other.



 Here is why: if I follow your argument to its conclusion, I can come up
 with an algorithm that spends five minutes per move looking through
 databases of moves, performing simulations, and god knows what, and show a
 20% improvement over an algorithm that thinks for 5 seconds. That is an
 interesting intermediate result. But would you then allow me to write
 obviously it will be just as quick as soon we've done a bit of
 optimisation in the epilogue, and accept that?


I never said that.  You constructed a ridiculous example here that does
NOT produce an interesting result.  Your basic idea has to be interesting
in the first place if you are going to write a paper that someone might
actually want to read.

If you can produce a result that gives a 20% improvement and the idea is not
ridiculous, then you may have an interesting paper even if it's difficult
to get the performance you need to put in a practical program.  It would
certainly be a badly written paper if your primary conclusion was that "we
couldn't get it to run fast enough."  The first question any reader
would ask is "how much improvement is it with the same number of
playouts?"  What would your answer be?  "We didn't try that because the
only thing that matters to me is that it plays well on my computer with the
given time constraints."  (That may be true, and there is nothing wrong
with that attitude, but it's not one you would focus a paper on.)

There are a ton of algorithms out there that do not run well on a given
architecture, but may run well on some other.  Or it may be highly
feasible with dedicated hardware, or perhaps be particularly easy to
parallelize.  If the basic idea is sound, that is a much more important
result than how much time YOUR implementation took.  If you draw
conclusions based on that you just have a paper that has your opinion in
it.  You become a philosopher instead of a scientist.

There is nothing wrong with presenting the idea and performance figures but
if you want others to have some respect for your paper you need to make sure
you are not drawing conclusions based on YOUR implementation and hardware
platform.



 You mix "proof" and "evidence" a few times in the same paragraph;


So what?  I'm not confusing the terms.  You just mentioned them in the same
sentence, but that was not wrong either.  It's common, when you are dealing
with inexact sciences like games programming, to accept strong empirical
evidence as proof.  It's called empirical proof.


 the sorting algorithm is probably not a good comparison. With a sorting
 algorithm, you would first prove its big O and omega complexities, maybe
 look at average cases too. I don't see a lot of people trying to prove
 that their algorithms are improvements in Go, because it is generally too
 complex; so evidence is all we get.


Yes, we are not dealing with an exact science here.  That is why you need
to at least show that in a given test program the algorithm has merit, by
isolating the theory from the implementation details as much as possible.
It's certainly more interesting to me to know that the basic idea works in
other programs, and then I can speculate on whether it is likely to work in
mine.  I don't want to also have to speculate on whether you implemented it
efficiently or not, whether there is some advantage to some platforms over
others, etc.

Now I'm not saying it's bad to include performance numbers - that might be
interesting to look at, but it's second-order information - it's not the
key thing.  Whether the paper author could make it work from a performance
standpoint is not as important as whether the basic idea has some
feasibility.

- Don



 Christian


 Don Dailey wrote:



 On Tue, Jul 7, 2009 at 6:31 AM, Christian Nentwich 
  christ...@modeltwozero.com wrote:


The problem I think is to find a good tradeoff between heavyness
and speed. In my test with Valkyria vs Fuego, Valkyria is superior
when the number of playouts are the 

[computer-go] Experimentation

2009-07-07 Thread Brian Sheppard
 Running on your development computer, you might be limited by
 clock time. Running on competition hardware, you might not be.

Only if the algorithm doesn't scale.

Perhaps there is a misunderstanding? A scalable algorithm is
limited by the hardware it runs on. The limit may differ when you
change hardware.

My dev machine is slow and old. Most of my testing is limited by
clock time. If I switch to a modern computer then the limits would
change quite a bit. So I am satisfied if a change breaks even while
making the program slower.


Which makes it uninteresting to begin with.

Well, I thought it was interesting.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread Magnus Persson
I have not used time exactly to match up Valkyria and Fuego. But also  
with fixed numbers of simulations one can match them up closely in  
processor time. And when I do that, Valkyria wins something like 41-45%  
of the time.


I never intentionally designed Valkyria to work well with small numbers of  
simulations as in these tests, but in principle you have to do that no  
matter how many simulations you run per move, because you will always  
have few simulations in the leaves of the tree. And if the leaves are  
evaluated strongly then the nodes nearer to the root also will benefit.


Magnus

Quoting Christian Nentwich christ...@modeltwozero.com:


Magnus,

along the lines of the argument I am trying to make: did you try your
experiments with time limits from 30 seconds per game to five minutes
per game (say), rather than playouts? Using gogui-twogtp this is
relatively easy to achieve.

I am obviously not associated with Fuego, but I guess it is reasonable
to assume that Fuego's architecture was not designed to operate at
limits like 2, 8 or 32 simulations in the same way Valkyria was. It is
an interesting study in its own right for scalability purposes; but to
go on to infer strength from it would seem like comparing apples and
oranges.

Christian


Magnus Persson wrote:

Quoting Darren Cook dar...@dcook.org:


* The scaling behavior might be different. E.g. if Fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)


I just happened to have experimental data on exactly this topic.   
This is Valkyria vs Fuego where I scale the number of playouts   
(Sims) x4 in each row.


Sims   Winrate(%)  Err(%)   N    WR     EloDiff
2      99.2        0.4      500  0.992  837
8      98.2        0.6      500  0.982  696
32     94.2        1.0      500  0.942  484
128    88.8        1.4      500  0.888  360
512    82.0        1.7      500  0.820  263
2048   83.2        1.7      499  0.832  278
8192   81.3        1.7      497  0.813  255
32768  75.5        3.6      139  0.755  196

The data shows clearly that the 0.3.2 version of Fuego I use probably 
plays really bad moves with high frequency in the playouts. With more 
playouts, I guess, a lot of these blunders can be avoided, and the win 
rate goes down from 99% towards 80%. The question here is whether it goes 
asymptotically towards 80% or perhaps 50% with more simulations. 
Unfortunately I cannot continue this plot because I run out of memory 
and it takes ages to finish the games.


So the question is: is there a fixed gain from the heavy playouts 
with more than 512 simulations, or does the effect of the heavy 
playouts get less and less important with larger tree size?


Note also that this is not only a matter of having heavy playouts 
or not. There is also a difference in tree search, since Valkyria 
and Fuego probably search their trees differently, and it could be 
that Valkyria searches deep trees inefficiently.

Maybe I should run a similar test against a light version of   
Valkyria to control for the search algorithm.


-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




--

Christian Nentwich

Director, Model Two Zero Ltd.
+44-(0)7747-061302
http://www.modeltwozero.com

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Experimentation

2009-07-07 Thread terry mcintyre
Digesting Don's remarks, it seems that a reasonable method would start with 
equal numbers of simulations. It could then proceed to try various 
optimizations, and compare algorithms which use equal time. 

It makes perfect sense for the ultimate goal to be better performance using 
the same time or less, but it might be easier to progress stepwise - first 
tweak the top-level design, then tweak the performance - in order to separate 
out the different inputs to the experiment.

Terry McIntyre terrymcint...@yahoo.com



“We hang the petty thieves and appoint the great ones to public office.” -- 
Aesop




  ___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Experimentation

2009-07-07 Thread steve uurtamo
something i've played with a little bit:

only look at algorithms with the following
property:

* they every so often update an in-memory score for each board position.

you can then run a timing loop around this and just make the
highest-scoring valid move the play.  you can use a signal handler to
dump the list at any time.

s.

2009/7/7 terry mcintyre terrymcint...@yahoo.com:
 Digesting Don's remarks, it seems that a reasonable method would start with
 equal numbers of simulations. It could then proceed to try various
 optimizations, and compare algorithms which use equal time.

 It makes perfect sense for the ultimate goal to be better performance using
 the same time or less, but it might be easier to progress stepwise - first
 tweak the top-level design, then tweak the performance - in order to
 separate out the different inputs to the experiment.

 Terry McIntyre terrymcint...@yahoo.com

 “We hang the petty thieves and appoint the great ones to public office.” --
 Aesop



 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Scoring - step function or sigmoid function?

2009-07-07 Thread Darren Cook
 MC bots play handicap games poorly.
 ...
 So my suggestion would be to pretend that there is a komi that almost
 compensates the handicap.

The same idea has been suggested before [1]. The counter-comments were
mostly "hard to get right" or "tried that briefly and it didn't
work well."

Darren

[1]: I think starting here, and then the dozen or so followup messages.
http://computer-go.org/pipermail/computer-go/2008-August/015859.html


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/mlsn/ (Multilingual open source semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/