Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Chrilly


I have serious problems with KO. UCT-Suzie plays generally strong, but makes terrible blunders in KO positions. So far I do not even understand the problem. Could you describe it in more detail?
I also had some serious SuperKO problems. UCT-Suzie was very clever at finding SuperKOs. We do not check for SuperKO in Alpha-Beta; the search is not deep enough. Ignoring SuperKO in UCT is deadly for a hashtable version: GameStack overflow.

Chrilly



So does your hash function consider all previous board states (for
superKo)?  If so, how?  I can think of one way, but I don't use it
since I have a tree that handles the allowable moves independent of
the hashtable.

When going down a variation, the Hash and other board-state information, like e.g. the Ko-Point, are stored on a stack. Starting from the current top of the stack, the detection goes down and searches for the same hash-key and Ko-Point. It's the repeated-position detection method of chess. The Gamestack-Pointer is decremented by 2, and one can stop when a non-capturing move is done (in chess it's the other way round). One can start 4 plies from the top of the stack. Due to the stopping criterion one has to check only a few entries (most of the time none).
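
A minimal sketch of this kind of stack scan in C++ (illustrative names and fields only, not the actual Suzie code):

#include <cstdint>
#include <vector>

// One entry per position reached in the current variation.
struct StackEntry {
    uint64_t hash;        // Zobrist key of the position
    int      koPoint;     // simple-ko point, or -1 if none
    bool     wasCapture;  // did the move that produced this position capture?
};

// True if the position on top of the stack repeats an earlier position
// (same hash and same Ko-Point). Walks backwards, compares only positions
// with the same side to move that are at least 4 plies back, and stops as
// soon as a non-capturing move is crossed (the stopping criterion above).
bool isSuperKo(const std::vector<StackEntry>& stack) {
    const int top = (int)stack.size() - 1;
    for (int i = top; i >= 1; --i) {
        if (!stack[i].wasCapture)
            break;                                  // non-capturing move: stop
        const int pliesBack = top - (i - 1);
        if (pliesBack >= 4 && pliesBack % 2 == 0 &&
            stack[i - 1].hash    == stack[top].hash &&
            stack[i - 1].koPoint == stack[top].koPoint)
            return true;                            // repeated position found
    }
    return false;
}

Comparing the Ko-Point together with the hash distinguishes positions that differ only in which ko capture is currently forbidden.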
If a SuperKO occurs, the position is evaluated by the Material-Balance: BlackCaptures - WhiteCaptures + Komi. Probably a better way is to ignore the result. But I assumed that SuperKO is a rare event and the result has no significant impact on the search-tree.
Maybe there is something wrong with this approach and the Ko problems I have are related to this simple SuperKO handling. I have noticed several times that a direct transfer of chess methods has some subtle flaws.


Chrilly



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Erik van der Werf

On 5/18/07, Chrilly [EMAIL PROTECTED] wrote:

When going down a variation, the Hash and other board-state information, like e.g. the Ko-Point, are stored on a stack. Starting from the current top of the stack, the detection goes down and searches for the same hash-key and Ko-Point. It's the repeated-position detection method of chess. The Gamestack-Pointer is decremented by 2, and one can stop when a non-capturing move is done (in chess it's the other way round).


Non-capturing moves can create repetition (but there will of course be
captures elsewhere in the cycle). Fortunately, there are other simple
criteria. E.g., you can stop whenever a move is played on an
intersection for the first time.


If a SuperKO occurs, the position is evaluated by the Material-Balance.
BlackCaptures - WhiteCaptures + Komi.


I guess you mean the *change* in material balance after one cycle. I
don't see why komi should be used here.


Probably a better way is to ignore the result.


Only for balanced cycles.

Erik


Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread chrilly


Non-capturing moves can create repetition (but there will of course be
captures elsewhere in the cycle).

So far the SuperKOs I have found were a round trip of KOs.

Fortunately, there are other simple

criteria. E.g., you can stop whenever a move is played on an
intersection for the first time.


Okay.


If a SuperKO occurs, the position is evaluated by the Material-Balance.
BlackCaptures - WhiteCaptures + Komi.


I guess you mean the *change* in material balance after one cycle. I
don't see why komi should be used here.

No. I assume the game is over. As there is usually no reliable method for counting territory, I use the difference of the captured stones as a first approximation for the evaluation. If e.g. both sides have captured the same number of stones over the whole game, the capture difference is 0, but White wins due to Komi.
At least for significant differences in captures this is true. MC runs are also stopped when one side leads in captures by a large margin.

Depending on the margin it produces only a negligible bias.



Only for balanced cycles.

The cycles I have found so far are balanced: each side captures one stone in turn.


Chrilly



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Erik van der Werf

On 5/18/07, chrilly [EMAIL PROTECTED] wrote:


 Non-capturing moves can create repetition (but there will of course be
 captures elsewhere in the cycle).
So far the SuperKOs I have found were a round trip of KOs.


Maybe the others are just hiding from you ;-)



 Fortunately, there are other simple
 criteria. E.g., you can stop whenever a move is played on an
 intersection for the first time.

Okay.

 If a SuperKO occurs, the position is evaluated by the Material-Balance.
 BlackCaptures - WhiteCaptures + Komi.

 I guess you mean the *change* in material balance after one cycle. I
 don't see why komi should be used here.

No. I assume the game is over. As there is usually no reliable method for counting territory, I use the difference of the captured stones as a first approximation for the evaluation. If e.g. both sides have captured the same number of stones over the whole game, the capture difference is 0, but White wins due to Komi.
At least for significant differences in captures this is true. MC runs are also stopped when one side leads in captures by a large margin.
Depending on the margin it produces only a negligible bias.


Hmm... To me it seems more natural that the evaluation primarily
depends on the nature of the cycle. Your program may choose to play a
cycle because of some arbitrary material balance value, which will be
meaningless if you can't break out of that cycle. IMO you should
simply treat balanced cycles as draws.

Erik


Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Peter Drake
It took me a long time to get around my mental block and accept the  
advice of everyone here, but your intuition is correct: superko is so  
rare, and so expensive to detect, that you should NOT check for it on  
every move.


Orego's current approach is:

During search, just maintain a single ko point and ignore superko.  
(There is a maximum number of moves per game to prevent infinitely  
long games.)


After search, when actually making a move:
1) Make a copy of the board
2) Compute the Zobrist hash of the current position from scratch
3) Check for superko violations (against a stack of previous Zobrist hashes for positions in the real game)
4) If there is a violation, go back to the copy and try the next best move
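
A rough sketch of steps 3 and 4 (illustrative only, not Orego's actual code; it assumes the hash of the position after each candidate move has already been recomputed from scratch on a board copy, as in steps 1 and 2, and it uses a hash set instead of a stack for the lookup):

#include <cstdint>
#include <unordered_set>
#include <vector>

// candidateHashes[i]: Zobrist hash of the position reached by the i-th best
// move from the search, recomputed from scratch on a copy of the board.
// history: hashes of every position that has occurred in the real game.
// Returns the index of the best move that does not repeat a real-game
// position, or -1 if every candidate violates superko.
int firstNonRepeating(const std::vector<uint64_t>& candidateHashes,
                      const std::unordered_set<uint64_t>& history) {
    const int n = (int)candidateHashes.size();
    for (int i = 0; i < n; ++i)
        if (history.count(candidateHashes[i]) == 0)
            return i;      // no violation: accept this move
    return -1;             // every candidate repeats; fall back to a pass
}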


Thanks,

Peter Drake
http://www.lclark.edu/~drake/



On May 17, 2007, at 11:00 PM, Chrilly wrote:



I have serious problems with KO. UCT-Suzie plays generally strong, but makes terrible blunders in KO positions. So far I do not even understand the problem. Could you describe it in more detail?
I also had some serious SuperKO problems. UCT-Suzie was very clever at finding SuperKOs. We do not check for SuperKO in Alpha-Beta; the search is not deep enough. Ignoring SuperKO in UCT is deadly for a hashtable version: GameStack overflow.

Chrilly



So does your hash function consider all previous board states (for
superKo)?  If so, how?  I can think of one way, but I don't use it
since I have a tree that handles the allowable moves independent of
the hashtable.

When going down a variation, the Hash and other board-state information, like e.g. the Ko-Point, are stored on a stack. Starting from the current top of the stack, the detection goes down and searches for the same hash-key and Ko-Point. It's the repeated-position detection method of chess. The Gamestack-Pointer is decremented by 2, and one can stop when a non-capturing move is done (in chess it's the other way round). One can start 4 plies from the top of the stack. Due to the stopping criterion one has to check only a few entries (most of the time none).
If a SuperKO occurs, the position is evaluated by the Material-Balance: BlackCaptures - WhiteCaptures + Komi. Probably a better way is to ignore the result. But I assumed that SuperKO is a rare event and the result has no significant impact on the search-tree.
Maybe there is something wrong with this approach and the Ko problems I have are related to this simple SuperKO handling. I have noticed several times that a direct transfer of chess methods has some subtle flaws.


Chrilly




Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread John Tromp

On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote:

It took me a long time to get around my mental block and accept the advice
of everyone here, but your intuition is correct: superko is so rare, and so
expensive to detect, that you should NOT check for it on every move.


In dimwit, we check for superko while descending the UCT tree,
and only check basic ko in the MC playouts. Our check involves
a single lookup (of the current, incrementally computed Zobrist key)
in a hash_map of previous Zobrist keys, which is not all that expensive.
I agree that doing superko checks in MC playouts would cause an
unnecessary slowdown.
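
For concreteness, a minimal sketch of that kind of check (assumed names and structure, not dimwit's actual code): the Zobrist key is updated incrementally by XOR as stones are placed and removed, and a single lookup in a hash set of previously seen keys decides whether the position repeats.

#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Zobrist table: one random 64-bit number per (point, color) pair.
struct Zobrist {
    std::vector<uint64_t> table;                // size = points * 2
    explicit Zobrist(int points) : table(points * 2) {
        std::mt19937_64 rng(20070518);          // fixed seed, illustrative
        for (uint64_t& v : table) v = rng();
    }
    // XOR a stone of 'color' (0 = black, 1 = white) at 'point' in or out.
    uint64_t toggle(uint64_t key, int point, int color) const {
        return key ^ table[point * 2 + color];
    }
};

// While descending the UCT tree: after a move, 'key' has been updated with
// toggle() for the stone placed and for any stones captured, so one hash-set
// lookup detects a superko violation.
bool violatesSuperko(uint64_t key, const std::unordered_set<uint64_t>& seen) {
    return seen.count(key) != 0;
}

The set would hold the keys of positions already seen in the real game and, presumably, along the current path in the tree.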

regards,
-John


Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Chris Fant

After search, when actually making a move:
1) Make a copy of the board
2) Compute the Zobrist hash of the current position from scratch
3) Check for superko violations (against a stack of previous Zobrist hashes
for positions in the real game)
4) If there is a violation, go back to the copy and try the next best move


But then you have an engine which does not converge to perfect play
given infinite resources.


Keep the common case fast (was Re: [computer-go] KO in Hashtable-UCT?)

2007-05-18 Thread Peter Drake

Yes, that sounds reasonable too.

In thinking about optimization, the rule, as always, is: keep the common case fast. The cases in UCT Go, in decreasing order of commonness, are:


1) Play a move in an MC playout
2) Play a UCT tree move
3) Start (or finish) a playout
4) Make (or accept) a move in the actual game
5) Initialize data structures at the beginning of the game

1 has to be blazingly fast. 2-3 have to be relatively fast. 4-5 don't  
matter much unless you're spending an outrageous amount of time on them.


Also note that almost all UCT moves are played from nodes with very  
few children (often 0 or 1).


Peter Drake
http://www.lclark.edu/~drake/



On May 18, 2007, at 9:15 AM, John Tromp wrote:


On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote:
It took me a long time to get around my mental block and accept  
the advice
of everyone here, but your intuition is correct: superko is so  
rare, and so

expensive to detect, that you should NOT check for it on every move.


In dimwit, we check for superko while descending the UCT tree,
and only check basic ko in the MC playouts. Our check involves
a single lookup (of the current, incrementally computed Zobrist key)
in a hash_map of previous Zobrist keys, which is not all that  
expensive.

I agree that doing superko checks in MC playouts would cause an
unnecessary slowdown.

regards,
-John



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Peter Drake

True...

My experience has been that (largely) ignoring the extremely rare  
case of superko is a better use of the finite resources we have.


Have others found the same thing?

Peter Drake
http://www.lclark.edu/~drake/



On May 18, 2007, at 9:19 AM, Chris Fant wrote:


After search, when actually making a move:
1) Make a copy of the board
2) Compute the Zobrist hash of the current position from scratch
3) Check for superko violations (against a stack of previous  
Zobrist hashes

for positions in the real game)
4) If there is a violation, go back to the copy and try the next  
best move


But then you have an engine which does not converge to perfect play
given infinite resources.



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread forrestc
Peter Drake said:
 It took me a long time to get around my mental block and accept the
 advice of everyone here, but your intuition is correct: superko is so
 rare, and so expensive to detect, that you should NOT check for it on
 every move.

Depending on how a program works, you may well need to check for repetitions other than straight ko, i.e.:

black  white
a2     pass
b2     pass
b1     a1
a2     ...
Forrest Curo




[computer-go] Re: computer-go Digest, Vol 34, Issue 15

2007-05-18 Thread David Silver

Very interesting paper!

I have one question. The assumption in your paper is that increasing  
the performance of the simulation player will increase the  
performance of Monte-Carlo methods that use that simulation player.  
However, we found in MoGo that this is not necessarily the case! Do  
you think there is some property of your learning algorithm that  
makes it particularly suitable for Monte-Carlo methods?


Thanks!
Dave


Date: Thu, 17 May 2007 01:17:55 +0200
From: Rémi Coulom [EMAIL PROTECTED]
Subject: [computer-go] Amsterdam 2007 paper
To: computer-go computer-go@computer-go.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,

I first thought I would keep my ideas secret until the Amsterdam
tournament, but now that I have submitted my paper, I cannot wait to
share it. So, here it is:

http://remi.coulom.free.fr/Amsterdam2007/

Comments and questions are very welcome.

Rémi






[computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread David Silver
I also use an online learning algorithm in RLGO to adjust feature  
weights during the game. I use around a million features (all  
possible patterns from 1x1 up to 3x3 at all locations on the board)  
and update the weights online from simulated games using temporal  
difference learning. I also use the sum of feature weights to  
estimate the value of a move, rather than a multiplicative estimate.  
The learning signal is only win/lose at the end of each simulation,  
rather than supervised learning as in Rémi's approach. The results are
encouraging (currently ~1820 Elo on CGOS, based on 5000 simulations  
per move) for a program that does not use UCT or Monte-Carlo Tree  
Search in any way.
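
As a toy illustration of this kind of update (not RLGO's actual code; the feature extraction, the squashing function and the step size are placeholder choices): the value of a position is the sum of the weights of its active pattern features, squashed to a winning probability, and after each simulated step the weights are nudged toward the value of the next position, with the terminal target being 1 for a win and 0 for a loss.

#include <cmath>
#include <cstddef>
#include <vector>

// Linear value function over ~1M binary pattern features:
// V(s) = sigmoid( sum of weights of the features active in s ).
struct LinearValue {
    std::vector<double> w;
    explicit LinearValue(std::size_t nFeatures) : w(nFeatures, 0.0) {}

    double value(const std::vector<int>& activeFeatures) const {
        double s = 0.0;
        for (int f : activeFeatures) s += w[f];
        return 1.0 / (1.0 + std::exp(-s));      // estimated P(win)
    }

    // TD(0) step: move V(s) toward 'target', which is V(s') for ordinary
    // steps and 1.0 / 0.0 for the win/lose signal at the end of a simulation.
    void tdUpdate(const std::vector<int>& activeFeatures,
                  double target, double alpha) {
        const double v     = value(activeFeatures);
        const double delta = target - v;
        const double grad  = v * (1.0 - v);     // derivative of the sigmoid
        for (int f : activeFeatures) w[f] += alpha * delta * grad;
    }
};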


Like Alvaro and Remi, I am convinced that online learning during  
simulation has great potential. In fact, I would go even further, and  
suggest that Monte-Carlo Tree Search and UCT are at one end of a  
large spectrum of promising sample-based search algorithms. We don't  
need to restrict ourselves to a tabular representation: state  
abstraction (for example, using pattern features) provides a way to  
generalise between positions and learn more efficiently. Nor do we  
need to restrict our learning algorithm to Monte-Carlo: any online  
reinforcement learning algorithm can be applied to learn from  
simulated experience.


In classical game programming terms, we can think of this approach as  
dynamically learning an evaluation function. Instead of learning a  
single evaluation function that applies to all positions, we tailor  
the evaluation function to the current position, by learning from  
simulated experience. Rather than thinking of learning as an offline  
procedure that takes months to train weights, let's think of it as an  
online procedure that can run in a few seconds. Once we learn the  
evaluation function, it can be used in any way we please - for  
example alpha-beta search (RLGO 2.1 uses a full width 5-ply search)  
or as a heuristic function for UCT (current research in MoGo).


I would be very interested to hear more about Dimwit's approach, and  
also Remi's experiments with online learning in CrazyStone.


-Dave

PS Apologies if you receive this twice, I had some difficulties posting.


Rémi:

John and I are planning a similar usage of patterns and features in
the MC simulations, although we were thinking of a very different
method to update strengths. Although we derived our formulas from an
extension of the logistic regression instead of Bradley-Terry models,
we arrived at very similar expressions. The main difference is that
what we have been calling "score" seems to be the log of your
"strength", and instead of multiplying strengths when considering
teams, we just add the scores of different features. We use an
online learning algorithm to adjust the scores during the game, which
severely limits the kind of algorithms we can use, but also allows us
to learn things that apply specifically to the situations that are
appearing on the board in this game.

Now we only have to make dimwit 400 points stronger to show the world
that our approach works. :)

There are many things in the paper that we had never thought of, like
considering the distance to the penultimate move. Most of those things
won't be directly applicable to dimwit, but it gives us many
suggestions of things to try.

Thanks for this important contribution to computer go.


Álvaro.



[computer-go] Re: Amsterdam 2007 paper

2007-05-18 Thread David Silver



Thanks for the great paper.  And thanks for sharing it before it's  
published.


Now I know what directions to take my engine in next.

Time for Team MoGo to share some more secrets :)


We are publishing MoGo's secrets at ICML 2007, in just over a month.
So not long to wait now!

-Dave



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Don Dailey
I don't think you can ignore superko in the UCT search - you will lose
games.   You will also lose games even if you only check the move you
will actually play in the game.   

But in the play-outs, it has almost no value whatsoever, even if it were free. (It probably has SOME value, but it is minuscule, and checking would clearly weaken the program because you would do fewer play-outs in a given amount of time.)

If you keep superko tests in the UCT tree, you have a scalable
program.

- Don



On Fri, 2007-05-18 at 09:33 -0700, Peter Drake wrote:
 True...
 
 
 My experience has been that (largely) ignoring the extremely rare case
 of superko is a better use of the finite resources we have.
 
 
 Have others found the same thing?
 
 Peter Drake
 http://www.lclark.edu/~drake/
 
 
 
 
 
 On May 18, 2007, at 9:19 AM, Chris Fant wrote:
 
   After search, when actually making a move:
   1) Make a copy of the board
   2) Compute the Zobrist hash of the current position from scratch
   3) Check for superko violations (against a stack of previous
   Zobrist hashes
    for positions in the real game)
   4) If there is a violation, go back to the copy and try the next
   best move
  
  
  But then you have an engine which does not converge to perfect play
  given infinite resources.



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Don Dailey
John's post is almost identical to mine!   Sorry about that.  I could
have just said ditto for Lazarus.

- Don

On Fri, 2007-05-18 at 12:15 -0400, John Tromp wrote:
 On 5/18/07, Peter Drake [EMAIL PROTECTED] wrote:
  It took me a long time to get around my mental block and accept the advice
  of everyone here, but your intuition is correct: superko is so rare, and so
  expensive to detect, that you should NOT check for it on every move.
 
 In dimwit, we check for superko while descending the UCT tree,
 and only check basic ko in the MC playouts. Our check involves
 a single lookup (of the current, incrementally computed Zobrist key)
 in a hash_map of previous Zobrist keys, which is not all that expensive.
 I agree that doing superko checks in MC playouts would cause an
 unnecessary slowdown.
 
 regards,
 -John



Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Don Dailey
Lazarus does full superko in the UCT search - but the play-outs only look at simple KO.

In the play-outs, I'm pretty sure infinite play-outs due to not using superko are possible - even with the randomness. But I have a limit on the length of the play-out games, because when you use heavy play-outs the games can occasionally last for hundreds of moves. The more randomness you take out of the play-outs, the more likely you are to get ridiculously long loops. Superko would solve this, but it would also slow down the play-outs significantly. I see no reason to have the refinement of superko in a random play-out.
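
A minimal sketch of that safeguard (illustrative only, not Lazarus code; the playout policy and the scoring function are passed in): the policy checks only simple ko, and a hard cap on the number of moves stops any repetition cycle the missing superko check lets through.

#include <functional>

// playPolicyMove(): plays one (heavy or light) playout move, respecting the
// simple-ko point, and returns false once both sides have passed.
// score(): evaluates the final position of the playout.
double playout(const std::function<bool()>& playPolicyMove,
               const std::function<double()>& score,
               int maxMoves = 3 * 19 * 19) {     // generous cap, tune to taste
    for (int m = 0; m < maxMoves; ++m)
        if (!playPolicyMove())                   // game ended by two passes
            break;
    return score();                              // also reached when the cap hits
}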

- Don

On Fri, 2007-05-18 at 09:01 -0700, Peter Drake wrote:
 It took me a long time to get around my mental block and accept the
 advice of everyone here, but your intuition is correct: superko is so
 rare, and so expensive to detect, that you should NOT check for it on
 every move.
 
 
 Orego's current approach is:
 
 
 During search, just maintain a single ko point and ignore superko.
 (There is a maximum number of moves per game to prevent infinitely
 long games.)
 
 
 After search, when actually making a move:
 1) Make a copy of the board
 2) Compute the Zobrist hash of the current position from scratch
 3) Check for superko violations (against a stack of previous Zobrist
 hashes for positions in the real game)
 4) If there is a violation, go back to the copy and try the next best
 move
 
 
 Thanks,
 
 Peter Drake
 http://www.lclark.edu/~drake/
 
 
 
 
 



Re: [computer-go] Re: computer-go Digest, Vol 34, Issue 15

2007-05-18 Thread Rémi Coulom

David Silver wrote:

Very interesting paper!

I have one question. The assumption in your paper is that increasing 
the performance of the simulation player will increase the performance 
of Monte-Carlo methods that use that simulation player. However, we 
found in MoGo that this is not necessarily the case! Do you think 
there is some property of your learning algorithm that makes it 
particularly suitable for Monte-Carlo methods?


Thanks!
Dave 

Maximizing the likelihood does not optimize the performance of the simulation player. For instance, by making it more greedy, I am sure it would become a stronger player. I have the feeling that maximizing the likelihood produces a good balance between playing good moves and being random. It would be worth testing the strength of the MC player with more or less greedy versions of the random player.


Rémi


Re: [computer-go] Re: computer-go Digest, Vol 34, Issue 15

2007-05-18 Thread Álvaro Begué

On 5/18/07, Rémi Coulom [EMAIL PROTECTED] wrote:

David Silver wrote:
 Very interesting paper!

 I have one question. The assumption in your paper is that increasing
 the performance of the simulation player will increase the performance
 of Monte-Carlo methods that use that simulation player. However, we
 found in MoGo that this is not necessarily the case! Do you think
 there is some property of your learning algorithm that makes it
 particularly suitable for Monte-Carlo methods?

 Thanks!
 Dave
Maximizing the likelihood does not optimize the performance of the
simulation player. For instance, by making it more greedy, I am sure it
would become a stronger player. I have the feeling that maximizing the
likelihood produces a good balance between playing good moves and being
random. It would be worth testing the strength of the MC player with
more or less greedy versions of the random player to test this.

Rémi


My own take on this is that we don't really know what the right
probability distribution is to get MC simulations that predict well
the outcome of the game. What we can do is provide indications of what
properties these moves have and combine them with weights that will be
tuned by playing many games. Rémi's way of computing a probability
distribution is probably an acceptable way to pick moves, but it's perfectly
possible that using some power of his strengths would actually give a
stronger program.

In our case, we are currently trying to estimate a score for each
move, which roughly corresponds to how many points we think the move
is worth. Then we'll make the probability of each move proportional to
exp(A*score), where A is a number to be tuned by playing many games.
One could also use other features by making the probability
proportional to exp(A*score+B*is_capture+C*is_atari+...) but one would
have to tune A, B, C, ... by playing even more games. I guess we could
use the maximum-likelihood settings (what Rémi did) as an initial
guess, and then try our hand at this difficult optimization problem by
trying perturbations of those settings. There is a reason why we just bought a powerful computer, and we are going to buy more. :)
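
A small sketch of that kind of selection (illustrative only, with placeholder values for A, B and C): every candidate move gets weight exp(A*score + B*is_capture + C*is_atari), and one move is drawn with probability proportional to its weight.

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct MoveFeatures {
    double score;       // estimated value of the move, roughly in points
    bool   isCapture;
    bool   isAtari;
};

// Draws the index of one move with probability proportional to
// exp(A*score + B*is_capture + C*is_atari).
int sampleMove(const std::vector<MoveFeatures>& moves, std::mt19937& rng,
               double A = 0.5, double B = 1.0, double C = 0.5) {
    std::vector<double> weight(moves.size());
    for (std::size_t i = 0; i < moves.size(); ++i)
        weight[i] = std::exp(A * moves[i].score
                             + B * (moves[i].isCapture ? 1.0 : 0.0)
                             + C * (moves[i].isAtari   ? 1.0 : 0.0));
    std::discrete_distribution<int> pick(weight.begin(), weight.end());
    return pick(rng);
}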

Álvaro.


Re: [computer-go] KO in Hashtable-UCT?

2007-05-18 Thread Chris Fant

 But then you have an engine which does not converge to perfect play
 given infinite resources.

I assume that you're joking, given: a) a current lack of infinite resources,
and b) a current lack of convergence of any kind.


No, but I can rephrase for those spooked by the concept of infinity:

But then you have an engine that stops scaling at an unknown point.