Re: [computer-go] Simplified MC evaluator ?explained?

2007-04-10 Thread Jacques Basaldúa

Weston Markham wrote:

 1.  Uniform playouts, as used in practice, are not really uniform over
 all legal go moves.  Generally, pass moves are excluded until
 necessary, and moves that fill eyelike points are excluded.  So, I
 assume that when you use the word legal, you mean admissible within
 this sort of playout.

That shouldn't make a difference. If you count the pass as a legal move
you should also count it as a legal moves in the subpaths.
You would have W (subpaths without pass) and W' (subpaths with pass).
Since it is only one of the 255 legal moves (counting the expected value
of # legal moves as 19^2 - 212/2 where 212 = length of a pro game) it
should not be very important even if miscounted.

 2. That variance depends on the length of the playout.  It is
 difficult for me to make sense of this statement, simply because not
 all playouts from a given position have the same length.

Of course, but it is reasonable to expect 9x9 playout around 120, 19x19
around 400, etc. That has, at least, two consequences:

 1. UCT is stronger for global search studying move 150 than move 5.

 2. UCT can also be used for local search on 50-80 empty cells, something
astronomically above what can be done with alfa-beta (assuming we want final
evaluation, not evaluating nodes at some intermediate state.) Of course,
UCT does not talk territory so adding the local subgames is far from 
trivial.


 The variance of the stochastic process is not to be mixed up
 with the distribution of the error of a repeated Bernoulli experiment!
 Perhaps I have mixed them up.  Can you explain more clearly or precisely
 what the variance of the stochastic process is?

Repeated Bernoulli experiments are studied as a stochastic process and,
in our case, _the experiment is a stochastic process itself_.

We have a nested process: A process whose steps are stochastic processes.

Therefore we have the variance of the Bernoulli process which is in the
binomial Bi(n, p) but that p is the result of a stochastic process whose
variance biases p towards 1/2.

I don't don't have a particular mathematical model for that process and 
model
it as a noise. No matter how you distribute it, as long as the expected 
value is zero

it and has the same consequences. This may be clearer:

Count each move in the playout as adding a random territory difference
in {-3, -2, -1, 0, 1, 2, 3} to the actual territorial value with any
distribution whose expected value is 0. The probability of the result
being  0 (a win) starting with a good move (initial value = +5)
after 10 moves is significantly bigger than starting with a bad move
(initial value = -3). But after a million moves the probability of
a win is the same for both = 1/2. Because the variance of the
experiment is somewhere in k·n the standard deviation is proportional
to the sqrt of n = 1000. With a std deviation around 1000 the initial
values 5 or -3 are way too small to make any difference.

 Does this game violate the condition that the number of legal moves
 for each side is balanced?

I mean my argument in the big numbers. Of course, if you take it down to
the detail you can find counterexamples. But these counterexamples
should be balanced for black and white. In fact, that's what I pretended
to say if everything is the same for black and white take the same
in an informal sense, there is no reason one of the sides should be
favored and p estimates W. What is more important is that it estimates
W biased towards 1/2 because the playout is a stochastic process.

 Even if we can compute W exactly, do we have any reason to think
 that its value is a good estimate of the minimax value of the game?

It is *not* a good estimate. I am not trying to advocate in favor of MC
as a panacea. In the beginning I was among the critics and have done
an effort to understand it better. Slowly I am becoming convinced
of the possibilities specially in combination with other methods
and now I have written a UCT engine, mainly for analysis.
W fails to represent the minimax value of the game at least in two
common situations: self-atari moves that would be a good idea if
the opponent was kind enough as to forgive the capture and ladders.
But _that is_ what uniformly random MC evaluates, not my fault ;-)

Jacques.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Simplified MC evaluator ¿explained?

2007-04-09 Thread Weston Markham

The second explanation was no clearer to me.  I'll try to criticize in
more detail:

1.  Uniform playouts, as used in practice, are not really uniform over
all legal go moves.  Generally, pass moves are excluded until
necessary, and moves that fill eyelike points are excluded.  So, I
assume that when you use the word legal, you mean admissible within
this sort of playout.

2.  That variance depends on the length of the playout.  It is
difficult for me to make sense of this statement, simply because not
all playouts from a given position have the same length.  My best
guess is that you are claiming that the longer a playout is, the more
likely it is that its result differs from the result under correct
play.  However, I strongly doubt that this is true for all starting
positions.  (Imagine that the first player needs to prevent the second
player from forming two eyes in a large group.  After doing this, that
group will eventually be captured, allowing playouts to continue
longer by filling the intersections that it once occupied.  Failing to
kill this group may allow the playouts to complete much more quickly,
but gives inaccurate results.)

2.5.  The variance of the stochastic process is not to be mixed up
with the distribution of
the error of a repeated Bernoulli experiment!  Perhaps I have mixed
them up.  Can you explain more clearly or precisely what the variance
of the stochastic process is?  Do you perhaps mean some measurement
of variation across different starting points, rather than across
different Bernoulli trials from the same starting position?  Or, do
you mean to distinguish the probability that a playout's outcome
differs from the outcome under correct play, from the probability that
a playout results in a win?  (Although those are just two different
Bernoulli experiments, right?)  Or is there some subtlety that I have
missed?

3.  'p is a biased towards 1/2 estimator of W'.  Consider the game:

o
  / \
 o   o
/ \  |
1   0 0

(1 is a win by the first player, and 0 is a loss.)  There is a move
that could allow the first player to win, if the second player does
not respond to it correctly.  This sounds like a realistic scenario
for go.

W = 1/3
p = 1/4

p is further from 1/2 than W.  Does this game violate the condition
that the number of legal moves for each side is balanced?  (It is
still not clear to me what this condition is that you are attempting
to impose.)  Or, was I supposed to calculate a statistic across
multiple game trees where W=1/3, in order to interpret p as an
estimator of W?

4.  Even if we can compute W exactly, do we have any reason to think
that its value is a good estimate of the minimax value of the game?
Is it even a better estimate than p, which we can already estimate
accurately?  (Note that in the game tree above, it is not.)  My
offhand guess is that it would not be as good.

Weston
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Simplified MC evaluator ¿explained?

2007-04-08 Thread Martin Møller Pedersen

On 08/04/07, Jacques Basaldúa [EMAIL PROTECTED] wrote:

I will try to explain it better:

What had puzzled me until recently is how to combine two facts:

 a. As Jason says,  A single MC playout corresponds to a Bernoulli
trial with probability p and therefore, p-hat is distributed as a binomial
which converges to the normal.


See also:
http://en.wikipedia.org/wiki/Bernoulli_process

Regards
Martin
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Simplified MC evaluator ¿explained?

2007-04-07 Thread Weston Markham

On 4/7/07, Jacques Basaldúa [EMAIL PROTECTED] wrote:


Assuming two simplifying hypotheses:

1. The playouts are uniformly random.
2. Both players have the same number of legal moves (or any
unbalanced numbers compensate in the long term).



I did not understand your post either.  Is #2 the same as neither player
passes until the end of the game?  Or do you intend to assume that at any
given level of the game tree, all nodes have the same number of children?
The latter assumption seems to be much stronger than what you intend, since
it would imply that p=W exactly.  Plus, it is obviously (and dramatically)
false under any normal go rules where two passes end the game.

Weston
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/