Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Sylvain Gelly

Hi Rémi,

Thank you for this paper. I found the work very interesting, well
written, and the paper is clear and pleasant to read.
As two things are modified at the same time (simulation policy and
tree search), I wonder what the contribution of each part is to the
overall improvement. For example, what is the exact improvement of the
new simulation policy on its own (without modifying UCT) over MoGo's?
I guess you already have those results, but don't have enough room to
put them in this paper. If I remember correctly, plain UCT with MoGo's
simulation policy at 70k simulations per move scored 83% against gnugo
3.6. What is the result with your simulation policy? 85%? 90%? 95%? More?
That would help to show where this approach is most useful: simulation,
tree, or both. It is possible that combining the two improvements makes
a stronger player than the sum of each improvement taken separately. If
so, that would mean that some win-win effect arises, and the type of
tree search has to be related to the type of simulation.

Again, very interesting work.
Sylvain

2007/5/17, Rémi Coulom [EMAIL PROTECTED]:

Hi,

I first thought I would keep my ideas secret until the Amsterdam
tournament, but now that I have submitted my paper, I cannot wait to
share it. So, here it is:

http://remi.coulom.free.fr/Amsterdam2007/

Comments and questions are very welcome.

Rémi


Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Magnus Persson

Thanks for sharing this early!

Everything in this paper makes perfect sense and is well written.
Reading it is like finding the last pieces of a difficult puzzle.

I do, however, have some problems with the math of section 2.5, "A
Minorization-Maximization Algorithm". I have a feeling that a lot of
definitions are left out here. What are A, B, M and N? I think I can
guess what N is. Somehow I have a feeling that you are using some
special summation notation here, but even so I have no clue what is
being summed.

Also, for Wi you use {} and | |, which might mean something special,
but since I am already quite confused it is hard to figure out.

Later on, the section "Derivation of the Minorization-Maximization
Formula" seemed pretty clear to me, although that was supposed to be the
more difficult part.

Best
Magnus



Re: [computer-go] producing a good probability distribution over legal moves

2007-05-17 Thread Brian Slesinsky

I think there is something to this; it seems like it should be
possible to use a database of randomly selected positions from games
along with the best known followup, and use that as a faster way of
testing a program's strength than playing full games.  Such a database
would be valuable for all sorts of approaches.

The question is how you know whether the program has discovered a
better move than the best known move, or has just chosen another move
that's just as good.  Probably it would have to be generalized into
keeping a score for each followup, and then the question becomes how
much to trust those scores and how to update them.  Perhaps the scores
could be obtained by running a large number of simulations, and we could
test how rapidly the program converges on them using fewer resources.
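Roughly, the test harness that suggests might look like the Python sketch
below (the engine interface and the database format are made up, just to
make the idea concrete):

def rate_engine(engine, db, default_score=0.0):
    # 'db' maps a position to a dict {move: score}; 'engine' is anything
    # with a genmove(position) method. Both are hypothetical interfaces.
    total = 0.0
    for position, followup_scores in db.items():
        move = engine.genmove(position)
        # A move missing from the table gets a default score; how much to
        # trust such moves is exactly the open question above.
        total += followup_scores.get(move, default_score)
    return total / len(db)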

A weakness of this approach is that sometimes the best move depends on
how you plan to follow it up; a program that plays the theoretically
best move but doesn't know how to follow it up is weaker than a
program that plays safer moves.

- Brian

On 5/16/07, George Dahl [EMAIL PROTECTED] wrote:

I find Monte-Carlo Go a fascinating avenue of research, but what pains
me is that a huge number of simulations are performed each game and at
the end of the game the results are thrown out.  So what I was
thinking is that perhaps the knowledge generated by the simulations
could be collapsed in some way.

Suppose that epsilon greedy versions of a reasonably strong MC Go
program were to play a very large number of games against themselves.
By epsilon-greedy versions I mean that with probability epsilon a
random move is played, and with probability 1 - epsilon the move the MC
player would normally play is played.  Each position in these games
would be stored along with the Monte Carlo/UCT evaluation for that
position's desirability.  This would produce an arbitrarily large
database of position/score pairs.  At this point a general function
approximator / learning algorithm (such as a neural network) could be
trained to map positions to scores.  If this was successful, it would
produce something that could very quickly (even a large neural net
evaluation or what have you would be much faster than doing a large
number of MC playouts) map positions to scores.  Obviously the scores
would not be perfect since the monte carlo program did not play
anywhere near perfect Go.  But this static evaluator could then be
plugged back into the monte carlo player and used to bias the random
playouts.  Wouldn't it be useful to be able to quickly estimate the MC
score without doing any playouts?
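
The data-generation loop might look something like this Python sketch
(every helper name - mc_score, mc_best_move, random_legal_move, play - is
a made-up stand-in for whatever the real MC/UCT engine provides):

import random

def generate_training_data(start_position, num_games, epsilon,
                           mc_score, mc_best_move, random_legal_move, play):
    # Collect (position, MC/UCT score) pairs from epsilon-greedy self-play.
    data = []
    for _ in range(num_games):
        pos = start_position
        while not pos.is_terminal():
            data.append((pos, mc_score(pos)))   # store the MC evaluation
            if random.random() < epsilon:
                move = random_legal_move(pos)   # exploration
            else:
                move = mc_best_move(pos)        # move the MC player would play
            pos = play(pos, move)
    return data   # train a neural net (or other approximator) on these pairs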

Clearly this idea could be extended recursively with a lot of offline
training.  What makes this formulation more valuable is that given
enough time and effort someone familiar with machine learning should
be able to produce a learning architecture that can actually learn the
MC scores.  It would be a straightforward, if potentially quite
difficult, supervised learning task with effectively unlimited data
since more could be generated at will.  Such a learning architecture
could be used in the manner I described above or thrown at the more
general reinforcement learning problem.


Does anyone have any thoughts on this idea?  Does anyone know of it
being tried before?

- George


Re: [computer-go] producing a good probability distribution over legal moves

2007-05-17 Thread David Doshay

On 17 May 2007, at 8:17 AM, Brian Slesinsky wrote:


A weakness of this approach is that sometimes the best move depends on
how you plan to follow it up; a program that plays the theoretically
best move but doesn't know how to follow it up is weaker than a
program that plays safer moves.


I have often said that when SlugGo makes what looks to me to be a
really good move, my joyful surprise quickly turns to worry about
what is about to happen. It is exactly this inability to make the proper
continuing followup moves that is the problem.



Cheers,
David







Re: [computer-go] producing a good probability distribution over legal moves

2007-05-17 Thread Zach Keatts

What you would have after your training/evaluator phase is heuristic
knowledge of possibly better Monte-Carlo trees to consider.  This will
definitely cut down on the search space, but could also exclude a strong
search path.  I have been thinking along these same lines for some time.  The
problem then lies in deciding which trees would be worth looking
at initially.  What about a database of professional games?  Take the
winning games as examples of strong searches that ended in a win.  The
problem is even more complex because where in the winning tree do you tell
Monte Carlo to start searching?  Will you assign a higher probability to each
move in those games (defining a known, probabilistically stronger predicted
result)?
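
As a very crude sketch of that last option (Python; the iterator over
winners' moves and the position keys are hypothetical, just to fix ideas):

from collections import defaultdict

def build_move_prior(winner_moves, min_count=1):
    # 'winner_moves' is assumed to yield (position_key, move) pairs taken
    # only from the moves of the eventual winner of each professional game.
    counts = defaultdict(lambda: defaultdict(int))
    for position_key, move in winner_moves:
        counts[position_key][move] += 1
    prior = {}
    for position_key, moves in counts.items():
        total = sum(moves.values())
        prior[position_key] = {m: c / total for m, c in moves.items()
                               if c >= min_count}
    return prior   # per-position move probabilities to bias the search with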

That is one approach.  The other is the purely simulated approach,
where you run simulations and gradually allow your probability function to
evolve based on the results.  Although this is a purer approach, I think the
aforementioned strategy may yield some useful experimentation.  Its strong
point is that it takes advantage of the human brain's innate pattern
recognition and calculation skills.  Since we have recorded games, we have
plenty of examples of this thought process.  For a winning game by a 9-dan,
those trees surely are worth investigating...



[computer-go] Professional Go player at the Computer Olympiad

2007-05-17 Thread Chaslot G (MICC)
Dear all,

We are proud to announce that Guo Juan, 5-dan professional from China, will
be present at the Computer Olympiad in Amsterdam on Sunday the 17th of
June. She will play against the winner of the Olympiad and comment on the
most interesting games. We will broadcast as many Go games and comments
as possible on KGS.

The website of the Olympiad can be found here:
http://amsterdam2007.icga.org/

Best regards, and hoping to meet some of you there,

Guillaume


Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Brian Slesinsky

This is very interesting.  I've skimmed so far and don't understand
everything yet, but I can make some comments on readability.

Table 1 takes some study to understand.  If I understand it correctly,
the "Feature" column might be more clearly labeled "Feature of Candidate
Move".  (For example, "Pass" means that the candidate move is a pass.)
The "Level" column might be more clearly labeled the "Alternative"
for that feature.

It was also unclear at first what "distance to the move before" means.  It
sounds like this is the distance to the move before the previous move, or
maybe the distance from one's own previous move.

- Brian



Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Chris Fant

It seems that e-mail at my university does not work any more. I have
received none of the replies to my message of yesterday, but I could
read them on the web archives of the list. So I have registered from
another address, and will answer the questions I have read on the web.



In section 2.3, you say:

Also, the generalization to teams assumes that the strength of a team
is the sum (in terms of Elo ratings) of the strengths of its members.

But it seems from the equation just above that section that you mean
the product and not the sum.  Or did I misunderstand something?
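
(Presumably the two statements can be reconciled through the usual
correspondence between Elo ratings and Bradley-Terry gammas,
Elo_i = 400*log10(gamma_i): a product of the members' gammas is then the
same thing as a sum of their Elo ratings, since
400*log10(gamma_1*gamma_2) = Elo_1 + Elo_2.)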


[computer-go] JCIS extended abstract

2007-05-17 Thread Chaslot G (MICC)
Dear all,


Following the example of Rémi, I would like to share with you a paper that I 
wrote which describes some important elements of my Go program Mango.

I submitted this paper recently to the JCIS 2007 workshop. Because it is an 
extended abstract, a lot of details are missing. I am now working on the full 
paper, which will contain much more information.

 

The extended abstract can be found here: 
www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf

 

Cheers,

 

Guillaume

 

PS: To answer Sylvain's question: modifying the probability distribution of 
moves in the tree was more effective in Mango than the improvement to the 
simulation strategy. However, Mango's simulation strategy is not as good as 
CrazyStone's or MoGo's. So I guess Mango won against MoGo in the last KGS 
tournament because it had a better way to use domain knowledge in the tree. 


Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Chris Fant

Thanks for the great paper.  And thanks for sharing it before it's published.

Now I know what directions to take my engine in next.

Time for Team MoGo to share some more secrets  :)


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Rémi Coulom

Álvaro Begué wrote:

There are many things in the paper that we had never thought of, like
considering the distance to the penultimate move.

That feature improved the effectiveness of progressive widening a lot.
When I had only the distance to the previous move, and the opponent made 
a dangerous move, Crazy Stone sometimes wanted to tenuki at the root, to 
move the focus of local search away from the danger. Considering the 
distance to the penultimate move fixes a lot of the problems of local 
search. Maybe I should try to consider distances to even older moves.


Also the approach you devised with John seems to be very similar to what 
I do, indeed. I also worked on the idea of adjusting the random policy 
online to adapt to the current position, with a method that looks like 
incremental Elo rating algorithms. I am deeply convinced that this can 
bring huge improvements. If the random policy only uses general 
patterns, whatever tree search we do at the root won't be able to make a 
strong Go player. We have to find a way to influence random simulations, 
not only inside the tree near the root, but also far from the root. Once 
we know how to do this effectively, we'll have very strong programs.


Rémi


Re: [computer-go] producing a good probability distribution over legal moves

2007-05-17 Thread Don Dailey
On Thu, 2007-05-17 at 12:17 -0400, George Dahl wrote:
 Imagine if you had a monte carlo program that took almost no time to
 run.  You would use it to do heavy playouts for another monte carlo
 program to make it even stronger. 

I tried something like this as a test with simple monte carlo.   I
called it recursive monte carlo.

This was not UCT; it was just simple statistical Monte Carlo, where you
choose the single move that had the most success in totally random
play-outs.

This started when I noticed that even running 1 or 2 simulations gave
far better play than random.   So my idea was that instead of choosing
moves randomly in the play-outs,  I would choose each  move in the
playout by doing another set of play-outs - applying the same simulation
idea recursively.
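
In rough Python-like form, the idea was something like the sketch below
(every primitive - legal_moves, side_to_move, play, random_move, won_by -
is a stand-in for the real engine, not actual code from my program):

def recursive_playout(pos, depth, width, random_move, play, won_by):
    # Play the game out; at depth > 0 every move of the play-out is chosen
    # by running 'width' nested, shallower play-outs per candidate move.
    while not pos.is_terminal():
        if depth == 0:
            move = random_move(pos)              # ordinary random play-out
        else:
            color = pos.side_to_move()
            def mean_result(m):
                child = play(pos, m)
                wins = 0.0
                for _ in range(width):
                    final = recursive_playout(child, depth - 1, width,
                                              random_move, play, won_by)
                    wins += won_by(final, color)  # 1.0 if 'color' won, else 0.0
                return wins / width
            move = max(pos.legal_moves(), key=mean_result)
        pos = play(pos, move)
    return pos   # the finished game; score it at the top level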

I didn't expect it to be fast enough to be practical, but I thought that
at least I could see if the idea was feasible.   If I'm doing 1000
play-outs, the version that does the sub-play-outs should be far
stronger than the one that doesn't, even if it is a lot slower.

Instead, it was much slower and played worse.   I didn't pursue this any
further, but I suspect that it should have worked.   I think the problem
is that I wasn't getting a fair representation of each move.   The
sub-playouts were causing some moves not to get sampled much, for
instance.

I am guessing that if I used such a technique for the play-out portion
of the Monte-Carlo search, it would be incredibly strong for a given
number of play-outs, counting only the play-outs at the outer level.

- Don




Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Magnus Persson

Yes, now I understand. I think what made it hard for me conceptually was that
P(Rj) can be rewritten in n different ways, one for each feature ci, 1 <= i <= n.
I had this problem with your example too. I first thought that the lines with
the factors were arbitrary, but once I realized that each line goes with one way
of rewriting P(Rj), it all became clear.

Quoting Rémi Coulom [EMAIL PROTECTED]:



to Magnus: If you consider the example of section 2.2: 1,2,3 wins
against 4,2 and 1,5,6,7. The probability is
P=c1c2c3/(c1c2c3+c4c2+c1c5c6c7). For this example:
N1j=c2c3,B1j=0,M1j=c2c3+c5c6c7,A1j=c4c2
N2j=c1c3,B2j=0
N3j=c1c2,B3j=0
N4j=0,B4j=c1c2c3
I will add this example to the paper if it makes things clearer.

Regarding the definition of Wi, |{Nij|Nij!=0}| means the number of
elements of the set of all the Nij that satisfy Nij!=0. It is
incorrect. It should be |{j|Nij!=0}|. I'll fix it in the final
version. As I wrote in the next sentence, it is equal to the number
of wins of i.
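
(In other words, if I read the example right, for each feature i and each
game j the probability is written in the form
P = (ci*Nij + Bij) / (ci*Mij + Aij): Nij and Mij collect the factors that
multiply ci in the numerator and in the denominator, while Bij and Aij
collect the terms that do not contain ci at all.)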




Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Chris Fant

I'd like to propose a potential direction of further research.  In
your paper, you acknowledge that the strong assumption that each
feature's Elo can be added to form the feature team Elo may not be
correct all the time.

The Stern/Herbrich/Graepel method did not need to make this assumption
because for them a feature team was its own first-class feature
(leading to exponential growth of the number of features).  You could
evaluate the degree to which each feature violates the additive-Elo
assumption by distributing that feature to all the other features and
retesting the prediction rate.

For example, instead of having features {Pass, Capture, Extension},
you would evaluate the additive-Elo assumption for the Pass feature by
testing with features {Capture, Extension, Pass-Capture, Pass-Extension}.

This obviously leads to more first-class features, but you can test
each one at a time to see if it is worth it, or at least to
validate that the additive-Elo assumption is okay in most cases.
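
A trivial Python sketch of that expansion, with features as plain strings
(in a real predictor each combined feature would of course get its own
gamma to estimate):

def expand_feature(features, target):
    # expand_feature({'Pass', 'Capture', 'Extension'}, 'Pass')
    #   -> {'Capture', 'Extension', 'Pass-Capture', 'Pass-Extension'}
    others = {f for f in features if f != target}
    return others | {target + '-' + f for f in others}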


Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread dhillismail
Very interesting paper: many ideas for me to study. And thanks for giving us an 
early look at it. 
 
I'll make one suggestion: in the final version, it deserves some better 
salesmanship in the results section. I've read through too many papers, only to 
reach the results section at the end and see something like "although our 
program was unable to defeat Wally on a 9x9 board, the authors still feel..." 
So now I look there first to see if I want to invest time in the paper. 
 
When I look at the end of this paper, what jumps out is 
"37.5% victories against Gnugo." 
IMHO, it's asking an awful lot of potential readers to expect them to recognize 
that this is an impressive result, which of course it is. I think it would be 
easier for people to understand if the "Performance against Gnugo" section 
started with 9x9 results against gnugo, maybe mentioned scaling issues and the 
13x13 KGS results, and then went on to 19x19.
 
Just my 2 cents. Now back to studying the paper.
 
Dave Hillis


Re: [computer-go] JCIS extended abstract

2007-05-17 Thread Jason House

How do you handle n_i = 0 in equation 1.2?  I'd assume that if a heuristic
says a move is really bad (H_B = 0), then you'd want to avoid simulations
for a while.

Also, has Mango experimented with other related strategies for a soft
transition?

In my own mathematical recreation, I came up with a slightly different method.
Just to keep the text short, here's the notation that I'll use...
 N = number of simulations through parent node
 n_i = number of simulations through this node
 w_i = winning simulations through this node
 v_i = w_i / n_i
 H_B = heuristic_bias
 n_h = heuristic confidence (used in my equation)
 UCTdelta = sqrt(ln(N)/n_i)
 p_hat = estimated probability of winning with soft transition

... then equation 1.2 is p_hat + UCTdelta

For you,
 p_hat = (w_i + H_B)/n_i
I derived and posted the following formula to the list:
 p_hat = (w_i + n_h*H_B)/(n_i+n_h)

The two equations are fairly similar... especially if n_h is set to 1.  Then
the only difference is that I replace n_i with (n_i+1)... which allows p_hat
to be calculated with n_i = 0.  I made no attempt to handle UCTdelta, but
replacing n_i in there with n_i+n_h is probably reasonable.
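
In Python, the two variants side by side (just a sketch of the formulas
above, not anyone's actual engine code):

import math

def uct_value(N, n_i, w_i, H_B, n_h=1.0, prior_count=True):
    if prior_count:
        # my variant: the heuristic acts like n_h extra visits at win rate
        # H_B, so the value stays defined at n_i = 0
        n_eff = n_i + n_h
        p_hat = (w_i + n_h * H_B) / n_eff
    else:
        # the (w_i + H_B)/n_i form of equation 1.2; undefined for n_i = 0
        n_eff = n_i
        p_hat = (w_i + H_B) / n_i
    return p_hat + math.sqrt(math.log(N) / n_eff)   # p_hat + UCTdelta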


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Erik van der Werf

Nice paper, thanks! It's funny to see the distance to the last move
now being used as an effective feature in Monte Carlo. Only a few years
ago I got a lot of religious criticism when I used it in my move
predictor...

Erik


Re: [computer-go] KO in Hashtable-UCT?

2007-05-17 Thread Chrilly
I have now also finished a first version of UCT-Suzie (in parallel, Peter 
Woitke is working on the alpha-beta version). UCT-Suzie uses a hashtable, mainly 
because I found the programming of the tree too complicated. The Monte-Carlo 
part uses some simple patterns according to the MoGo article. Progress is 
rather slow, because I am working (more than) full-time on FPGA projects in 
computed tomography.


Here are the problems with hash tables as a tree:

1. Time - it is more expensive - you must gather the children together
when making decisions about which node to expand (which generally
involves re-generating the keys by making all the legal moves.)   There
are ways around this that trade space for time but in either case it is
more expensive.

I do not understand this. In UCT-Suzie the moves are generated when a new 
leaf node is reached. The hashtable has a link to the move list. When the 
node is reached the next time, the moves do not have to be generated again. Just 
the calculation of the UCT urgency value (WinRate + sqrt() ) has to be 
done. I assume that this calculation also has to be done in a tree 
representation. I see no difference in this respect from Gunnar's GNU Go UCT 
code.
Memory is no problem, at least for 9x9. The number of Monte-Carlo runs/sec. 
is about 17K (9x9). This can be improved, because the UCT player uses the 
alpha-beta DoMove/UndoMove functions, which are overkill for UCT.



2. GHI - you must take special care to deal with Graph History
Interaction - primarily recognizing that ko situations are different.
You can get by with relatively simple solutions that don't fully address
this issue but it's still imperfect.

I have serious problems with ko. UCT-Suzie generally plays strongly, but makes 
terrible blunders in ko positions. So far I do not even understand the 
problem. Could you describe it in more detail?
I also had some serious superko problems. UCT-Suzie was very clever at 
finding superkos. We do not check for superko in alpha-beta; the search is not 
deep enough. But ignoring superko in UCT is deadly for a hashtable version: 
game-stack overflow.
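
One common way to handle positional superko, sketched in Python (play and
zobrist are hypothetical engine primitives; a real program would update
the hash incrementally instead of rehashing the board):

def violates_superko(pos, move, history_hashes, play, zobrist):
    # history_hashes: set of Zobrist hashes of every position already seen
    # on the current path (game record + tree descent + play-out so far).
    # A move is rejected if it recreates one of them.
    return zobrist(play(pos, move)) in history_hashes

Hashes are added to the set as the search descends a path and removed
again when it backs up.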


Chrilly



Re: [computer-go] Amsterdam 2007 paper

2007-05-17 Thread Rémi Coulom

Chris Fant wrote:

I'd like to propose a potential direction of further research.  In
your paper, you acknowledge that the strong assumption that each
feature's Elo can be added to form the feature team Elo may not be
correct all the time.

The Stern/Herbrich/Graepel method did not need to make this assumption
because for them a feature team was its own first-class feature
(leading to exponential growth of the number of features).  You could
evaluate the degree to which each feature violates the additive-Elo
assumption by distributing that feature to all the other features and
retesting the prediction rate.

For example, instead of having features {Pass, Capture, Extension},
you would evaluate the additive-Elo assumption for the Pass feature by
testing with features {Capture, Extension, Pass-Capture, Pass-Extension}.

This is a bad example, because Pass and Capture are mutually exclusive, 
and so are Pass and Extension, but I understand the idea, and I think it 
is very good. In fact, I had planned to add it to the conclusion of the 
paper. That is the reason why I wrote the "relevance of the model" 
section. But it was 1 AM when I finished the paper, and I had been 
writing it since early in the morning, so I decided I wouldn't start 
explaining something complicated, sorry.


Another (less important) problem I forgot to mention in the paper, 
regarding the validity of the model, is the hypothesis of independence. 
When training from consecutive moves of the same game, it may not be 
reasonable to consider that all the positions are independent. Sampling 
one or two random positions from every game might be better.


Thanks for your remarks,

Rémi


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Sylvain Gelly

Hi Rémi,

2007/5/17, Rémi Coulom [EMAIL PROTECTED]:

to Sylvain: Here are tests of Crazy Stone at 90s/game 1CPU against GNU
3.6 level 10, measured over about 200 games
[...]


Thank you for your answer. These numbers are interesting.
The improvement in the tree search is really huge. It is what I
expected, and it is consistent with my own experiments (different, of
course, but comparable in kind).
It is good to be consistent; at least we have a chance to understand
something :-p.

Unfortunately I don't have time in the next few weeks to read it really
carefully and make (I hope) more interesting comments. But there are
already so many interesting comments on the list ;-).

Again thank you for this interesting work.

Cheers,
Sylvain


Re: [computer-go] KO in Hashtable-UCT?

2007-05-17 Thread Chris Fant

So does your hash function consider all previous board states (for
superko)?  If so, how?  I can think of one way, but I don't use it,
since I have a tree that handles the allowable moves independently of
the hashtable.


Re: [computer-go] Re: Amsterdam 2007 paper

2007-05-17 Thread Jason House

Rémi Coulom wrote:
to Magnus: If you consider the example of section 2.2: 1,2,3 wins 
against 4,2 and 1,5,6,7. The probability is 
P=c1c2c3/(c1c2c3+c4c2+c1c5c6c7). For this example:

N1j=c2c3,B1j=0,M1j=c2c3+c5c6c7,A1j=c4c2
N2j=c1c3,B2j=0
N3j=c1c2,B3j=0
N4j=0,B4j=c1c2c3
I will add this example to the paper if it makes things clearer.


I'd recommend not using N_ij as an arbitrary constant when N is used for 
a completely different purpose in the paper. 

My other stumbling block was the jump to the f(x) equation.  Calling it 
f(y_i) and putting the definition of W_i near there would help some. 

Writing log(N_ij*y_i + B_ij) = delta(N_ij)*[log(N_ij) + log(y_i)] + 
delta(B_ij)*log(B_ij), where delta(x) = 1 when x != 0 and 0 otherwise, 
would make the jump very easy.



RE: [computer-go] JCIS extended abstract

2007-05-17 Thread Chaslot G (MICC)
 How do you handle n_i = 0 in equation 1.2? 

I don't compute it when n_i is 0. In that case, I give every node 
a high upper bound in order to visit every node once. Remember that I am using 
progressive unpruning at the same time, so only the best nodes according to 
the heuristic are visited.

 I'd assume that if a heuristic says a move is really bad (H_B = 0), then 
 you'd want to avoid simulations for a while.

Yes! This is exactly the purpose of progressive unpruning.

 p_hat = (w_i + n_h*H_B)/(n_i+n_h)

Interesting... But then how do you compute n_h in practice?

If I understood Rémi correctly, he computes the value by adding gamma virtual 
wins and gamma virtual losses, i.e., with your notation: p_hat = 
(w_i + gamma)/(n_i + 2*gamma).

This formula tends to center the probability distribution on 0.5. But if a 
pattern is good, I would rather shift the distribution towards 1! Adding a term 
in gamma/n_i, as I do in my paper, is a solution that shifts the probability 
distribution towards 1 and works well in practice.
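
For instance, with gamma = 4 and a node that so far has w_i = 1 win out of 
n_i = 10 simulations, the virtual win/loss form gives (1+4)/(10+8) = 0.28, 
pulling the raw 0.10 towards 0.5, whereas the additive form gives 
1/10 + 4/10 = 0.5: a pure upward shift, which never drags a move that is 
already doing well back down towards 0.5.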

For the full version of my paper I will compare different ways to modify the 
probability distribution according to knowledge. 
I believe there is no optimal way to do that :(

This problem is fundamental, because it gives the opportunity to include 
previously learned knowledge into Monte-Carlo Tree Search. 



Re: [computer-go] JCIS extended abstract

2007-05-17 Thread Jason House

Chaslot G (MICC) wrote:

p_hat = (w_i + n_h*H_B)/(n_i+n_h)



Interesting... But then how do you compute n_h in practice?
  


The mathematical derivation is based on estimating an a-priori 
probability distribution.  In theory, one simply needs to run MC 
simulations for a wide variety of heuristically identical situations, 
and then fit the best beta distribution to the measured data.  A beta 
distribution has two parameters - alpha and beta.  n_h = alpha+beta  
and  n_h*H_B = alpha.
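
A small Python sketch of that procedure (a method-of-moments fit; the
sample of win rates for heuristically identical moves is assumed to come
from whatever MC engine is available):

def fit_beta_prior(win_rates):
    # Fit Beta(alpha, beta) by moments and return (n_h, H_B),
    # with n_h = alpha + beta and H_B = alpha / (alpha + beta).
    n = len(win_rates)
    mean = sum(win_rates) / n
    var = sum((x - mean) ** 2 for x in win_rates) / n  # needs 0 < var < mean*(1-mean)
    n_h = mean * (1.0 - mean) / var - 1.0              # alpha + beta
    return n_h, mean                                   # H_B is the prior mean

def p_hat(w_i, n_i, n_h, H_B):
    # Posterior mean of the beta prior after n_i simulations with w_i wins;
    # this is exactly the (w_i + n_h*H_B)/(n_i + n_h) formula above.
    return (w_i + n_h * H_B) / (n_i + n_h)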


In practice... I don't have an MC bot yet. I'm slowly redoing my bot 
in D (an up-and-coming programming language; see http://www.tiobe.com/tpci.htm).


For the full version of my paper I will compare different ways to modify the probability distribution according to knowledge. 
I believe there is no optimal way to do that :(
  


Well, if beta distributions are a good fit, then the above would be the 
optimal probability distribution...  Of course, my analysis doesn't 
take tree search into account... Maybe I'll get lucky and it'll work well, like 
a multi-armed bandit.  Actually, even if it doesn't, being optimal 
before it's time to build a subtree may be enough.  I think I've seen 
approaches that wait until 100 simulations have been done.  If n_h is relatively 
small, the effect is probably sufficiently washed out by then and 
optimality probably doesn't matter.


I guess if empirical evidence shows beta distributions are a good fit 
and a high n_h is appropriate, then I'll have to revisit the shortcomings...
