Re: [computer-go] MC approach

2007-02-13 Thread Chris Fant

The following code did not hurt the strength against self-play in over
2000 games at boardsize 8x8 (faster games) with 10k playouts per move:

  moveEval[m] = (float)wins[m]/nGames[m] +
points[m]/(nGames[m]*Board-Spaces*100)

where points[m] is accumulated only for wins.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-13 Thread Weston Markham

On 2/9/07, Weston Markham [EMAIL PROTECTED] wrote:

I don't seem to have any numbers on this anymore, but I should be able
to try some experiments this weekend.  I do have some code that does
what I describe below.  It is also using an all moves as first
heuristic.  According to my notes, I made this change in an attempt to
avoid severely conservative (in my non-expert opinion) moves near the
beginning of the game, which seem to be preferred when using
all-moves-as-first.  It specifically aims for a 30-point lead at the
beginning of the game, and reduces this by one point for each turn
into the game.

I should point out that I am not averaging scores, but simply changing
which games I count as wins for the evaluation of a move.  This is
perhaps not quite what Steve Uurtamo had in mind when he was
originally musing about being greedy at the beginning of the game.
Nevertheless, it is a very similar sort of idea to what he described,
so I thought that I would mention it.

Weston

On 2/8/07, Chris Fant [EMAIL PROTECTED] wrote:
 On 2/8/07, Weston Markham [EMAIL PROTECTED] wrote:
  I believe that I have had some success with an approach like this,
  actually.  I believe that I initially only tally games that are won by
  a certain margin, and reduce that margin to zero as the game
  progresses.  I am pretty sure that this improves upon straight Monte
  Carlo.  I think that I can get some numbers on it, if anyone is
  interested.
 
  Weston

 Yes, please.



I did run some tests over the weekend, but they did not complete as
quickly as I had expected.  Anyway, I have enough data at this point
to say that although there is nothing spectacular to show one way or
the other, I was probably wrong that the code was an improvement.

Playing against gnugo 3.7.10, komi 7.5, playing black in 50% of the games:

86/732 games won baseline (11.7%)
106/1000 games won after enabling logic described above. (10.6%, a
slight degradation)

I also tried a version that dynamically adjusts an expected score,
and tallies the number of games where the score exceeds this.  It
appears to give an improvement:  12.8% of the games were won, out of
1000.

Weston
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-12 Thread Weston Markham

I think that you are essentially correct.  However, this is only going
to affect a small number of games where two different moves are
exactly tied for the best winning percentage, after many playouts.
Even if the underlying probabilities are exactly the same, you can't
really expect this to happen much.

Weston

On 2/11/07, Heikki Levanto [EMAIL PROTECTED] wrote:

But there is a way. If we do N play-outs, the effect of any single of
them is 1/N. If we make sure to scale the score to be less than half of
this, it can not disturb anything in cases where the number of wins is
different. Only in cases with exactly the same number of wins in the
play-outs, would the score break the tie.

In other words my large constant of 1000 was far too small. It would
have to be something like 2NM, where M is the maximum score (say 361).
Round it up to 1000N, and we should be safe.

I still believe it would make endgames look more reasonable, and
possibly even better, in case the winning program has overlooked a
detail somewhere, having a large margin of points on the board should
act as an insurance against small blunders.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-12 Thread Heikki Levanto
On Mon, Feb 12, 2007 at 11:20:43AM -0500, Weston Markham wrote:
 I think that you are essentially correct.  However, this is only going
 to affect a small number of games where two different moves are
 exactly tied for the best winning percentage, after many playouts.
 Even if the underlying probabilities are exactly the same, you can't
 really expect this to happen much.

I am not sure I agree. In the end of each game there comes a time when
the winner is already certain, or very neaqrly so. Even a MC player will
see this at some point. When that point comes, there is no direction in
pure MC play, except to avoid the worst blunders.  The winning program
is happy to let a tail be cut off here, a small group die there, and a
territory be invaded here, if none of this shakes its unfaltering faith
in its victory. Conversely, the loosing program doesn't even try all
these tricks, as it sees it looses anyway. Both of them play pure
random. It doesn't affect the result, usually. It can be that both
players have misjudged something, and having wasted the winning margin,
that something may turn out to be valuable anyway.

Alas, I don't have my own MC player coded (haven't even started), so I
can not make the experiment myself. If someone here would like to try,
I'd like to hear of it.

-Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-11 Thread Heikki Levanto
On Wed, Feb 07, 2007 at 02:41:22PM -0600, Nick Apperson wrote:
 If it only did one playout you would be right, but imagine the following
 cases:
 
 case 1: White wins by .5 x 100, Black wins by .5 x 100
 case 2: White wins by 100.5 x 91, Black wins by .5 x 109
 
 the method that takes into account score would prefer the second case even
 though it has a lower winning percentage that may be represented by the fact
 that white is making an overplay for instance.  Obviously this is just one
 example, but there are many cases like this and overplays tend to be
 priveledged in a sense I would suspect with this kind of algorithm.

I have been thinking about this, and have to agree with you, averaging
the results gives pretty small numbers, that can easily be disturbed by
adding the winning scores to the mixture. 

But there is a way. If we do N play-outs, the effect of any single of
them is 1/N. If we make sure to scale the score to be less than half of
this, it can not disturb anything in cases where the number of wins is
different. Only in cases with exactly the same number of wins in the
play-outs, would the score break the tie.

In other words my large constant of 1000 was far too small. It would
have to be something like 2NM, where M is the maximum score (say 361).
Round it up to 1000N, and we should be safe.

I still believe it would make endgames look more reasonable, and
possibly even better, in case the winning program has overlooked a
detail somewhere, having a large margin of points on the board should
act as an insurance against small blunders. 

Or am still missing something obvious?

  - Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-09 Thread Weston Markham

I don't seem to have any numbers on this anymore, but I should be able
to try some experiments this weekend.  I do have some code that does
what I describe below.  It is also using an all moves as first
heuristic.  According to my notes, I made this change in an attempt to
avoid severely conservative (in my non-expert opinion) moves near the
beginning of the game, which seem to be preferred when using
all-moves-as-first.  It specifically aims for a 30-point lead at the
beginning of the game, and reduces this by one point for each turn
into the game.

I should point out that I am not averaging scores, but simply changing
which games I count as wins for the evaluation of a move.  This is
perhaps not quite what Steve Uurtamo had in mind when he was
originally musing about being greedy at the beginning of the game.
Nevertheless, it is a very similar sort of idea to what he described,
so I thought that I would mention it.

Weston

On 2/8/07, Chris Fant [EMAIL PROTECTED] wrote:

On 2/8/07, Weston Markham [EMAIL PROTECTED] wrote:
 I believe that I have had some success with an approach like this,
 actually.  I believe that I initially only tally games that are won by
 a certain margin, and reduce that margin to zero as the game
 progresses.  I am pretty sure that this improves upon straight Monte
 Carlo.  I think that I can get some numbers on it, if anyone is
 interested.

 Weston

Yes, please.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Magnus Persson

Quoting Heikki Levanto [EMAIL PROTECTED]:


On Wed, Feb 07, 2007 at 04:42:01PM -0500, Don Dailey wrote:

In truth the only thing that matters is to increase your winning
percentage - not your score.   There seems to be no point in tampering
with this.


I guess I must accept the wisdom of those who have tried these things.
Still, it hurts my intuition that it could be better for a program to
choose a line where it seems to win by 2 points, when another line seems
to end in a 100 point win. What if the opponent can improve his play
from what I expected, and gain an extra 3 points somewhere?

Maybe all this shows that we have not (yet?) understood all the
complications of the MC evaluation, and that more research is needed?


No, it is actually quite simple although of course more reseacrh is needed for
MC but not on this topic. Here I mght repeat what others already said but I
will try to make it clear out of how I understand it.

The line that ends in a 100 point win is often only a win if the 
opponent makes
a mistake, otherwise you lose some points or even the game. The fallacy 
here (I

have been there too) is that you think of specific situations where all lines
ends with victory. But in real games many lines often exists that wins big but
are also risky. You are of course safe when all moves are valued 1000 
each. But before that there might be some evaluated for example 998 
because the MC

eval knows that something may go wrong (a 7ply deep tactical defect exists) if
you then add the average score of 2 to 1000 and an average score of 10 to 998
then the program makes a mistake. It is all about eliminating uncertainty. The
average score can contain a very large proportion of losees if it is
compensated by bigger wins.

Since mc eval is based on random games it will always have scores less 
than the

maximum 1000, in won positions that can still be lost by greedy moves. And
there is no way of securely determine the point when greedy play. Not even
when all moves are evaluated as sure wins, there can be bugs in the program
that inflates the win score of moves that leads to a tactical disaster.

The solution is to improve the simulation part so that real sure wins are
evaluated as sure wins early and among these one can bias the selected moves
toward profitable moves. Valkyria does something like this, although I have
forgotten the exact details now.

This problem I think become less important when the program become stronger in
other areas. The reason is that the simulation part plays a very bad endgame.
Thus a strong MC/UCT program realizes that more territory is also an insurance
against a bad endgame. I have no solid backing for this but my impression is
that Valkyria used to win with 0.5 points almost always but now often 
wins with

much more.

-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread steve uurtamo
 The average score can contain a very large proportion of losees if it is
 compensated by bigger wins.

yes, it is easy to see how this might cripple the play of an MC player.

that 90% territory win that requires 3 opponent blunders is tempting enough
to ignore the fact that all other non-blundering lines lead to 0.5 point losses.

i wonder if this kind of greediness might, however, be useful for selecting,
say, the first move or two in a 9x9 game.  the thinking here is that since the
endgame is essentially noise at this point, you might as well be greedy
before tactics become an issue.  that's probably faulty intuition, though.

on another note, has anyone just let their MC code rip for a day or two (or
maybe a week or more) on the first move alone?  i would think that if you
ordered the distribution of the resulting list, it would give very good 
information
about how well MC acts as a board-eval function.  (i.e. turn off all book lines,
turn off all rules about not playing on the first line or two early in the 
game, etc.
etc.  turn off all heuristics related to the opening and then print the 
distribution
over the board).  what are the top, say, 10 moves on a 9x9 board and how are
they distributed, and the top, say, 40 moves on a 19x19 board along with their
distribution?  if you fold board symmetries into your search, i suppose that you
can get a factor of 8 here.

my thinking is that if it's anything other than a very smooth distribution among
the top moves, that's a good indicator.

s.





 

It's here! Your new message!  
Get new email alerts with the free Yahoo! Toolbar.
http://tools.search.yahoo.com/toolbar/features/mail/
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Don Dailey
I think there are 15 first moves in 9x9 go if you factor out the
symetries.
UCT isn't good at evauating all the moves, it will pick one of them and
spend most of it's time on it.But you could search each 1 at a time.

The UCT programs are memory bound, so you could search each of these 15
moves 1 at a time and study the scores.

- Don


On Thu, 2007-02-08 at 04:41 -0800, steve uurtamo wrote:
  The average score can contain a very large proportion of losees if it is
  compensated by bigger wins.
 
 yes, it is easy to see how this might cripple the play of an MC player.
 
 that 90% territory win that requires 3 opponent blunders is tempting enough
 to ignore the fact that all other non-blundering lines lead to 0.5 point 
 losses.
 
 i wonder if this kind of greediness might, however, be useful for selecting,
 say, the first move or two in a 9x9 game.  the thinking here is that since the
 endgame is essentially noise at this point, you might as well be greedy
 before tactics become an issue.  that's probably faulty intuition, though.
 
 on another note, has anyone just let their MC code rip for a day or two (or
 maybe a week or more) on the first move alone?  i would think that if you
 ordered the distribution of the resulting list, it would give very good 
 information
 about how well MC acts as a board-eval function.  (i.e. turn off all book 
 lines,
 turn off all rules about not playing on the first line or two early in the 
 game, etc.
 etc.  turn off all heuristics related to the opening and then print the 
 distribution
 over the board).  what are the top, say, 10 moves on a 9x9 board and how are
 they distributed, and the top, say, 40 moves on a 19x19 board along with their
 distribution?  if you fold board symmetries into your search, i suppose that 
 you
 can get a factor of 8 here.
 
 my thinking is that if it's anything other than a very smooth distribution 
 among
 the top moves, that's a good indicator.
 
 s.
 
 
 
 
 
  
 
 It's here! Your new message!  
 Get new email alerts with the free Yahoo! Toolbar.
 http://tools.search.yahoo.com/toolbar/features/mail/
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Chris Fant

I thought that the memory boundedness was completely fixed by not
expanding a UCT node until it has been visited X number of times.
Just increase X until you are no longer memory bound.  I don't recall
anyone reporting a loss in playing strength by doing this.


On 2/8/07, Don Dailey [EMAIL PROTECTED] wrote:

I think there are 15 first moves in 9x9 go if you factor out the
symetries.
UCT isn't good at evauating all the moves, it will pick one of them and
spend most of it's time on it.But you could search each 1 at a time.

The UCT programs are memory bound, so you could search each of these 15
moves 1 at a time and study the scores.

- Don


On Thu, 2007-02-08 at 04:41 -0800, steve uurtamo wrote:
  The average score can contain a very large proportion of losees if it is
  compensated by bigger wins.

 yes, it is easy to see how this might cripple the play of an MC player.

 that 90% territory win that requires 3 opponent blunders is tempting enough
 to ignore the fact that all other non-blundering lines lead to 0.5 point 
losses.

 i wonder if this kind of greediness might, however, be useful for selecting,
 say, the first move or two in a 9x9 game.  the thinking here is that since the
 endgame is essentially noise at this point, you might as well be greedy
 before tactics become an issue.  that's probably faulty intuition, though.

 on another note, has anyone just let their MC code rip for a day or two (or
 maybe a week or more) on the first move alone?  i would think that if you
 ordered the distribution of the resulting list, it would give very good 
information
 about how well MC acts as a board-eval function.  (i.e. turn off all book 
lines,
 turn off all rules about not playing on the first line or two early in the 
game, etc.
 etc.  turn off all heuristics related to the opening and then print the 
distribution
 over the board).  what are the top, say, 10 moves on a 9x9 board and how are
 they distributed, and the top, say, 40 moves on a 19x19 board along with their
 distribution?  if you fold board symmetries into your search, i suppose that 
you
 can get a factor of 8 here.

 my thinking is that if it's anything other than a very smooth distribution 
among
 the top moves, that's a good indicator.

 s.






 

 It's here! Your new message!
 Get new email alerts with the free Yahoo! Toolbar.
 http://tools.search.yahoo.com/toolbar/features/mail/
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Magnus Persson

Quoting Don Dailey [EMAIL PROTECTED]:


I think there are 15 first moves in 9x9 go if you factor out the
symetries.
UCT isn't good at evauating all the moves, it will pick one of them and
spend most of it's time on it.But you could search each 1 at a time.

The UCT programs are memory bound, so you could search each of these 15
moves 1 at a time and study the scores.


It is perhaps better to do this experiment on 7x7 where it is actually known
what the best moves in the opening are so it is possible to judge the quality.
I am now not thinking about the first move mut more of finding a high quality
principal variation. I did some experiments long time ago with t least several
hours of computing time for some opening positions, but found that there were
some serious problems for many positions. Improvements or changes to the
program seemed to change the evaluation of opening positions almost randomly.

-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Don Dailey
On Thu, 2007-02-08 at 08:59 -0500, Chris Fant wrote:
 I thought that the memory boundedness was completely fixed by not
 expanding a UCT node until it has been visited X number of times.
 Just increase X until you are no longer memory bound.  I don't recall
 anyone reporting a loss in playing strength by doing this.

There is a loss of playing strength as X grows,  but the loss 
grows very slowly.
  
Obviously, if you set X really high, it would not build much
of a tree and it would play very weak.

I have been planning to make this dynamic in Lazarus, but I
have not done so yet.   The idea is that X starts at 1 but
when the table fills up there is a consolidation pass where
the table is rebuilt, child nodes may fold back into their 
parents and X gets reset to a higher value.  As the search 
proceeds this may happen a few time, perhaps doubling the 
value of X each time.

In this way the program continues to scale smoothly.  There
is never a sudden out of memory wall and you can set the table
initiallly to any size that suits you.

I haven't bothered because I cannot detect a loss of strength
in values up to 100 but it's on my todo list - I know there is
a small loss of strength even if I cannot measure it.  

Right now, Lazarus simply stops searching and returns a move
when this happens, but I have X set high enough that it can
think for many minutes.

- Don



 
 On 2/8/07, Don Dailey [EMAIL PROTECTED] wrote:
  I think there are 15 first moves in 9x9 go if you factor out the
  symetries.
  UCT isn't good at evauating all the moves, it will pick one of them and
  spend most of it's time on it.But you could search each 1 at a time.
 
  The UCT programs are memory bound, so you could search each of these 15
  moves 1 at a time and study the scores.
 
  - Don
 
 
  On Thu, 2007-02-08 at 04:41 -0800, steve uurtamo wrote:
The average score can contain a very large proportion of losees if it is
compensated by bigger wins.
  
   yes, it is easy to see how this might cripple the play of an MC player.
  
   that 90% territory win that requires 3 opponent blunders is tempting 
   enough
   to ignore the fact that all other non-blundering lines lead to 0.5 point 
   losses.
  
   i wonder if this kind of greediness might, however, be useful for 
   selecting,
   say, the first move or two in a 9x9 game.  the thinking here is that 
   since the
   endgame is essentially noise at this point, you might as well be greedy
   before tactics become an issue.  that's probably faulty intuition, though.
  
   on another note, has anyone just let their MC code rip for a day or two 
   (or
   maybe a week or more) on the first move alone?  i would think that if you
   ordered the distribution of the resulting list, it would give very good 
   information
   about how well MC acts as a board-eval function.  (i.e. turn off all book 
   lines,
   turn off all rules about not playing on the first line or two early in 
   the game, etc.
   etc.  turn off all heuristics related to the opening and then print the 
   distribution
   over the board).  what are the top, say, 10 moves on a 9x9 board and how 
   are
   they distributed, and the top, say, 40 moves on a 19x19 board along with 
   their
   distribution?  if you fold board symmetries into your search, i suppose 
   that you
   can get a factor of 8 here.
  
   my thinking is that if it's anything other than a very smooth 
   distribution among
   the top moves, that's a good indicator.
  
   s.
  
  
  
  
  
  
   
   It's here! Your new message!
   Get new email alerts with the free Yahoo! Toolbar.
   http://tools.search.yahoo.com/toolbar/features/mail/
   ___
   computer-go mailing list
   computer-go@computer-go.org
   http://www.computer-go.org/mailman/listinfo/computer-go/
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-08 Thread Weston Markham

On 2/8/07, steve uurtamo [EMAIL PROTECTED] wrote:

i wonder if this kind of greediness might, however, be useful for selecting,
say, the first move or two in a 9x9 game.  the thinking here is that since the
endgame is essentially noise at this point, you might as well be greedy
before tactics become an issue.  that's probably faulty intuition, though.


I believe that I have had some success with an approach like this,
actually.  I believe that I initially only tally games that are won by
a certain margin, and reduce that margin to zero as the game
progresses.  I am pretty sure that this improves upon straight Monte
Carlo.  I think that I can get some numbers on it, if anyone is
interested.

Weston
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread Heikki Levanto
On Wed, Feb 07, 2007 at 12:06:40PM +0200, Tapani Raiko wrote:
 Let my try again using the handicap example. Let's say MC player is given 
 a huge handicap. In the simulations, it is winning all of its games, so 
 there is no information helping to select the next move. 

This situation happens in normal games too, once one player is so much
ahead that it wins almost no matter what. It leads into really
stupid-looking endgames, where live groups are allowed to die, and dead
ones are allowed to be rescued.

All this could be avoided by a simple rule: Instead of using +1 and -1
as the results, use +1000 and -1000, and add the final score to this.

The purpose of the large constant (1000) is to make sure that it prefers
any win to any loss (so that large_win + small_loss  small_win +
small_win). One could even add another term in the result, favouring
games that end early (for the winner) or postpone them (for the looser),
in hope of allowing the opponent more chances to make mistakes.

As far as I can see, this ought to fit straight in to any MC or UCT
program. It may not improve the winning chances, but it sure should make
the programs play look more reasonable.


Just my humble idea. Feel free to shoot down (with serious arguments),
and/or use where ever you like. I would like to hear if this makes any
practical difference, if anyone tries.

   - Heikki



-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread dhillismail
  
 
-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 7 Feb 2007 5:34 AM
Subject: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo 
(QMC))


On Wed, Feb 07, 2007 at 12:06:40PM +0200, Tapani Raiko wrote:
 Let my try again using the handicap example. Let's say MC player is given 
 a huge handicap. In the simulations, it is winning all of its games, so 
 there is no information helping to select the next move. 

This situation happens in normal games too, once one player is so much
ahead that it wins almost no matter what. It leads into really
stupid-looking endgames, where live groups are allowed to die, and dead
ones are allowed to be rescued.

All this could be avoided by a simple rule: Instead of using +1 and -1
as the results, use +1000 and -1000, and add the final score to this.

The purpose of the large constant (1000) is to make sure that it prefers
any win to any loss (so that large_win + small_loss  small_win +
small_win). One could even add another term in the result, favouring
games that end early (for the winner) or postpone them (for the looser),
in hope of allowing the opponent more chances to make mistakes.

As far as I can see, this ought to fit straight in to any MC or UCT
program. It may not improve the winning chances, but it sure should make
the programs play look more reasonable.


Just my humble idea. Feel free to shoot down (with serious arguments),
and/or use where ever you like. I would like to hear if this makes any
practical difference, if anyone tries.



 Intuitively, it seems like this should work. You only give the winning 
margin a small weight, or only use it to break ties, or only apply it after the 
game is already decided. I've tried many variations, as have others, including 
your exact algorithm above. It can make some moves look a little prettier, but 
it always causes problems and I have to take it out.
 I have my theories as to why that is, but for brevity's sake, in my 
experience, giving any consideration to winning margin is detrimental. After 
the game is decided, there are more elegant ways to bring it to a close. I came 
to this conclusion reluctantly.
 
- Dave Hillis

Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading spam 
and email virus protection.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread dhillismail
 I should have mentioned that I have only tested on 9x9. For larger boards, 
I don't know.
- Dave Hillis
 
`

 Intuitively, it seems like this should work. You only give the winning 
margin a small weight, or only use it to break ties, or only apply it after the 
game is already decided. I've tried many variations, as have others, including 
your exact algorithm above. It can make some moves look a little prettier, but 
it always causes problems and I have to take it out.
 I have my theories as to why that is, but for brevity's sake, in my 
experience, giving any consideration to winning margin is detrimental. After 
the game is decided, there are more elegant ways to bring it to a close. I came 
to this conclusion reluctantly.
 
 

 

Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading spam 
and email virus protection.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread terry mcintyre
If I recall correctly, someone spoke of constraining the opening moves to the 
3rd,4th,and 5th lines in the absence of nearby stones, or something to that 
effect. What was the impact of this experiment? I notice the recent discussion 
of the need for a lot of thinking time to find good opening moves; would this 
thinking time be reduced with a better-than-random selection of opening moves 
during the playouts? If fuseki and joseki databases were used to bias the 
playouts, would the speed and quality of opening play improve?

One concern about using expert knowledge of this sort - what happens when one's 
opponent plays out of book? Human players often find this frustrating - we 
sense that moves which deviate from joseki are wrong, but finding the correct 
refutation under time pressure is not always easy.

To address this concern: at least one MC player uses an opening book; would it 
be profitable to automatically analyze past losses, and devote a few 100k 
playouts to look for better replies to plays which were favored by opponents in 
past games? This would be the analogue to advice often given to human players: 
study your own lost games; look for improvements.

Unfortunately, an opening book does not generalize well; learning a better move 
for position Xi won't help with position Xj which differs by even one stone 
from Xi. What sort of move generators would cope with the vast number of 
similar positions? A single stone ( a ladder-breaker or a key point which makes 
or denies a second eye, for instance ) can make a large difference in the 
score, but one hopes that MC playouts would discover the negative consequences 
of moves which are almost but not quite right.

 


 

Don't pick lemons.
See all the new 2007 cars at Yahoo! Autos.
http://autos.yahoo.com/new_cars.html ___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread terry mcintyre
What sort of sampling was used for the playouts? For this variable ( 
incorporating some information about the score vs only the win-loss variable ), 
does it make a difference whether playouts are totally random or incorporate 
varying degrees of similitude to good play?

From: [EMAIL PROTECTED] [EMAIL PROTECTED]


 I should have mentioned that I have only tested on 9x9. For larger boards, 
I don't know.



- Dave Hillis



 



`







 Intuitively, it seems like this should work. You only give the winning 
margin a small weight, or only use it to break ties, or only apply it after the 
game is already decided. I've tried many variations, as have others, including 
your exact algorithm above. It can make some moves look a little prettier, but 
it always causes problems and I have to take it out.







 I have my theories as to why that is, but for brevity's sake, in my 
experience, giving any consideration to winning margin is detrimental. After 
the game is decided, there are more elegant ways to bring it to a close. I came 
to this conclusion reluctantly.



 



 







 



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/






 

Get your own web address.  
Have a HUGE year through Yahoo! Small Business.
http://smallbusiness.yahoo.com/domains/?p=BESTDEAL___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread Richard J. Lorentz

terry mcintyre wrote:
If I recall correctly, someone spoke of constraining the opening moves 
to the 3rd,4th,and 5th lines in the absence of nearby stones, or 
something to that effect. What was the impact of this experiment?


For what it's worth, I tried a number of experiments along these lines 
and none of them produced any improvement in play whatsoever. I'm still 
baffled by this. Sylvain says, both in his paper and in a message posted 
here, that it seems that one must try to constrain moves based on 
sequences of moves rather than individual moves in isolation. All rather 
mysterious to me.


-Richard
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte Carlo (QMC))

2007-02-07 Thread dhillismail
 The tests that involved factoring in the margin of victory while the game 
was still in play, used pure random playouts (or relatively close to it.) 
Later, I did some tests with esthetics as a goal in itself, and for these, I 
used what I call a heavy playout. I doubt that the playout type makes a 
difference but I don't know that for sure. If there was significant curiosity 
on this list about a specific configuration, I could probably run a quick test.
 
- Dave Hillis
 
 
-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 7 Feb 2007 12:42 PM
Subject: Re: [computer-go] MC approach (was: Monte Carlo (MC) vs Quasi-Monte 
Carlo (QMC))


What sort of sampling was used for the playouts? For this variable ( 
incorporating some information about the score vs only the win-loss variable ), 
does it make a difference whether playouts are totally random or incorporate 
varying degrees of similitude to good play?


From: [EMAIL PROTECTED] [EMAIL PROTECTED]


 I should have mentioned that I have only tested on 9x9. For larger boards, 
I don't know.
- Dave Hillis
 
`

 Intuitively, it seems like this should work. You only give the winning 
margin a small weight, or only use it to break ties, or only apply it after the 
game is already decided. I've tried many variations, as have others, including 
your exact algorithm above. It can make some moves look a little prettier, but 
it always causes problems and I have to take it out.
 I have my theories as to why that is, but for brevity's sake, in my 
experience, giving any consideration to winning margin is detrimental. After 
the game is decided, there are more elegant ways to bring it to a close. I came 
to this conclusion reluctantly.
 
 

 
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/





Any questions? Get answers on any topic at Yahoo! Answers. Try it now. 
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading spam 
and email virus protection.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach

2007-02-07 Thread Matt Gokey

Don Dailey wrote:


On Wed, 2007-02-07 at 11:34 +0100, Heikki Levanto wrote:


All this could be avoided by a simple rule: Instead of using +1 and -1
as the results, use +1000 and -1000, and add the final score to this.



Heikki,

I've tried ideas such as this in the past and it's quite
frustrating - everything that tries to take territory
scoring into account weakens the program.  


If you just need to see prettier moves,  I think it is
good enough to priortize the moves using some other
algorithm at the root of the tree.   If you just cover
the case where a program is easily winning or losing
it will play nicer but not stronger.


Don, do you have any theories or information about why this is the case?

I would think either way the algorithm should always prefer higher
average win probabilities, but faced with alternatives where the win
probabilities are same or nearly the same but the average winning
margins are higher for one alternative, wouldn't it be better to take
the path with the better margin? I mean it may in fact be wrong about
the win/loss classifications so choosing the better scores would seem to
make sense within reason as long as it's not greedy.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread Nick Apperson

If it only did one playout you would be right, but imagine the following
cases:

case 1: White wins by .5 x 100, Black wins by .5 x 100
case 2: White wins by 100.5 x 91, Black wins by .5 x 109

the method that takes into account score would prefer the second case even
though it has a lower winning percentage that may be represented by the fact
that white is making an overplay for instance.  Obviously this is just one
example, but there are many cases like this and overplays tend to be
priveledged in a sense I would suspect with this kind of algorithm.

- Nick

On 2/7/07, Matt Gokey [EMAIL PROTECTED] wrote:


Don Dailey wrote:

 On Wed, 2007-02-07 at 11:34 +0100, Heikki Levanto wrote:

All this could be avoided by a simple rule: Instead of using +1 and -1
as the results, use +1000 and -1000, and add the final score to this.


 Heikki,

 I've tried ideas such as this in the past and it's quite
 frustrating - everything that tries to take territory
 scoring into account weakens the program.

 If you just need to see prettier moves,  I think it is
 good enough to priortize the moves using some other
 algorithm at the root of the tree.   If you just cover
 the case where a program is easily winning or losing
 it will play nicer but not stronger.

Don, do you have any theories or information about why this is the case?

I would think either way the algorithm should always prefer higher
average win probabilities, but faced with alternatives where the win
probabilities are same or nearly the same but the average winning
margins are higher for one alternative, wouldn't it be better to take
the path with the better margin? I mean it may in fact be wrong about
the win/loss classifications so choosing the better scores would seem to
make sense within reason as long as it's not greedy.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach

2007-02-07 Thread terry mcintyre
That drives me nuts! Minimax search would eliminate bad lines of play whenever 
a refutation is found. A good opponent would not play badly, and the quantity 
of possible bad moves should not affect the evaluation of good moves - but that 
seems to be what MC does, averaging out all moves regardless of whether they 
are known to be good,  have been refuted, or are of indeterminate status.

What am I missing?
 
Terry McIntyre


- Original Message 
From: Nick Apperson [EMAIL PROTECTED]

If it only did one playout you would be right, but imagine the following cases:

case 1: White wins by .5 x 100, Black wins by .5 x 100
case 2: White wins by 100.5 x 91, Black wins by .5 x 109

the method that takes into account score would prefer the second case even 
though it has a lower winning percentage that may be represented by the fact 
that white is making an overplay for instance.  Obviously this is just one 
example, but there are many cases like this and overplays tend to be 
priveledged in a sense I would suspect with this kind of algorithm.


- Nick






 

Get your own web address.  
Have a HUGE year through Yahoo! Small Business.
http://smallbusiness.yahoo.com/domains/?p=BESTDEAL___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach

2007-02-07 Thread Don Dailey
On Wed, 2007-02-07 at 14:08 -0600, Matt Gokey wrote:
 Don, do you have any theories or information about why this is the
 case?

Not really.  In truth the only thing that matters is to increase your
winning percentage - not your score.   There seems to be no point in
tampering with this. 

- Don



 I would think either way the algorithm should always prefer higher
 average win probabilities, but faced with alternatives where the win
 probabilities are same or nearly the same but the average winning
 margins are higher for one alternative, wouldn't it be better to take
 the path with the better margin? I mean it may in fact be wrong about
 the win/loss classifications so choosing the better scores would seem
 to
 make sense within reason as long as it's not greedy.
 
 

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread dhillismail
 It's a *weighted* average of all moves. UCT tree search doesn't explore 
bad moves as often as good ones, so they don't contribute as much to the 
estimation of the worth of a node.
 
- Dave Hillis
 
 
-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 7 Feb 2007 4:31 PM
Subject: Re: [computer-go] MC approach


That drives me nuts! Minimax search would eliminate bad lines of play whenever 
a refutation is found. A good opponent would not play badly, and the quantity 
of possible bad moves should not affect the evaluation of good moves - but that 
seems to be what MC does, averaging out all moves regardless of whether they 
are known to be good,  have been refuted, or are of indeterminate status.

What am I missing?

 
Terry McIntyre




- Original Message 
From: Nick Apperson [EMAIL PROTECTED]

If it only did one playout you would be right, but imagine the following cases:

case 1: White wins by .5 x 100, Black wins by .5 x 100
case 2: White wins by 100.5 x 91, Black wins by .5 x 109

the method that takes into account score would prefer the second case even 
though it has a lower winning percentage that may be represented by the fact 
that white is making an overplay for instance.  Obviously this is just one 
example, but there are many cases like this and overplays tend to be 
priveledged in a sense I would suspect with this kind of algorithm. 

- Nick






It's here! Your new message!
Get new email alerts with the free Yahoo! Toolbar. 
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading spam 
and email virus protection.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] MC approach

2007-02-07 Thread Chris Fant

The only think I can suggest which perhaps hasn't been tried is to
only consider the score in the evaluation if NONE of the playouts
resulted in a loss.


On 2/7/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


 It's a *weighted* average of all moves. UCT tree search doesn't explore
bad moves as often as good ones, so they don't contribute as much to the
estimation of the worth of a node.

- Dave Hillis


 -Original Message-
 From: [EMAIL PROTECTED]
 To: computer-go@computer-go.org
 Sent: Wed, 7 Feb 2007 4:31 PM
 Subject: Re: [computer-go] MC approach



That drives me nuts! Minimax search would eliminate bad lines of play
whenever a refutation is found. A good opponent would not play badly, and
the quantity of possible bad moves should not affect the evaluation of good
moves - but that seems to be what MC does, averaging out all moves
regardless of whether they are known to be good,  have been refuted, or are
of indeterminate status.

 What am I missing?

  Terry McIntyre




- Original Message 
 From: Nick Apperson [EMAIL PROTECTED]

 If it only did one playout you would be right, but imagine the following
cases:

 case 1: White wins by .5 x 100, Black wins by .5 x 100
 case 2: White wins by 100.5 x 91, Black wins by .5 x 109

 the method that takes into account score would prefer the second case even
though it has a lower winning percentage that may be represented by the fact
that white is making an overplay for instance.  Obviously this is just one
example, but there are many cases like this and overplays tend to be
priveledged in a sense I would suspect with this kind of algorithm.

 - Nick


 
 It's here! Your new message!
 Get new email alerts with the free Yahoo! Toolbar.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


 
 Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading
spam and email virus protection.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread Heikki Levanto
On Wed, Feb 07, 2007 at 04:42:01PM -0500, Don Dailey wrote:
 In truth the only thing that matters is to increase your winning
 percentage - not your score.   There seems to be no point in tampering
 with this. 

I guess I must accept the wisdom of those who have tried these things.
Still, it hurts my intuition that it could be better for a program to
choose a line where it seems to win by 2 points, when another line seems
to end in a 100 point win. What if the opponent can improve his play
from what I expected, and gain an extra 3 points somewhere?

Maybe all this shows that we have not (yet?) understood all the
complications of the MC evaluation, and that more research is needed?

-H



-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread Don Dailey
On Thu, 2007-02-08 at 00:46 +0100, Heikki Levanto wrote:
 On Wed, Feb 07, 2007 at 04:42:01PM -0500, Don Dailey wrote:
  In truth the only thing that matters is to increase your winning
  percentage - not your score.   There seems to be no point in tampering
  with this. 
 
 I guess I must accept the wisdom of those who have tried these things.
 Still, it hurts my intuition that it could be better for a program to
 choose a line where it seems to win by 2 points, when another line seems
 to end in a 100 point win. What if the opponent can improve his play
 from what I expected, and gain an extra 3 points somewhere?
 
 Maybe all this shows that we have not (yet?) understood all the
 complications of the MC evaluation, and that more research is needed?
 
 -H


I agree, more research is needed.

My intuition is also the same as yours - one would think 100 point
win is better.   It's not that it's better or it's a better move
objectively,  but it should be better against a fallible opponent
who could easily get distracted by the little battles and forget
the real point of the game.   At least it's a way to fight when
you are losing.

- Don



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread Weston Markham

But of course, it's not the size of the win that counts, it is rather
the confidence that it really is a win.  In random playouts that
continue from a position from a close game, the ones that result in a
large victory are generally only ones where the opponent made a severe
blunder.  (Put another way, the score of the game is affected more by
how bad the bad moves are, rather than how good the good ones are, or
even how good most of the moves are.  Others have commented on this
effect in this list, in other contexts.)  Since you can't count on
that happening in the real game, these simulations have a lower value
in the context of ensuring a win.

Even when the program is losing (say by a little bit) it is more
important to play moves that it thinks are the most likely to convert
the game to a win by confusing the opponent, rather than to play moves
that will make the losing outcome more apparent.  These tend to be
different moves, and Monte Carlo methods are good at uncovering these
differences.  I agree that this is a bit surprising, but I find it
much less so when I think about it in these terms.

Given that people have reported such a strong effect, I am actually
wondering if these simulations (those that result in a large score
difference) should be _penalized_, for not being properly
representative of the likely outcome of the game.  In other words:

value = 1000 * win - score

Weston

On 2/7/07, Don Dailey [EMAIL PROTECTED] wrote:

My intuition is also the same as yours - one would think 100 point
win is better.   It's not that it's better or it's a better move
objectively,  but it should be better against a fallible opponent
who could easily get distracted by the little battles and forget
the real point of the game.   At least it's a way to fight when
you are losing.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MC approach

2007-02-07 Thread Hideki Kato
Matt Gokey: [EMAIL PROTECTED]:
Weston Markham wrote:

 But of course, it's not the size of the win that counts, it is rather
 the confidence that it really is a win.
Yes, and my reasoning was that a larger average win implied a higher 
confidence since there is more room for error.  That intuition may not 
hold though.
  In random playouts that
 continue from a position from a close game, the ones that result in a
 large victory are generally only ones where the opponent made a severe
 blunder.  (Put another way, the score of the game is affected more by
 how bad the bad moves are, rather than how good the good ones are, or
 even how good most of the moves are.  Others have commented on this
 effect in this list, in other contexts.)  Since you can't count on
 that happening in the real game, these simulations have a lower value
 in the context of ensuring a win.
That is the first reasonable argument I've heard that makes some sense 
as to why this effect may be true.  The opposite of course may be true 
as well and close games may really not be close due to the same blunder 
effect.  Perhaps it is just another symptom of the fact that most 
playouts are nonsense games.

We can test this effect by using, for example,
v = 0.5 * (1.0 + tanh (k * score)); // v is in [0...1].
with a little penalty of simulation speed. As k being lager, this 
function closes to commonly used threshold function, and vice versa.

I guess the best value of k depends on the sensefulness of the games, 
ie., current heuristics for pruning moves are not so effective that 
larger k is the best.

- gg
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/