Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)

2007-02-07 Thread Olivier Teytaud
As I have spent a lot of time trying to guess what could be done with
Quasi-Monte-Carlo or other standard Monte-Carlo improvements in computer
Go, I write below my (humble and pessimistic :-) ) opinion about that.

Let's formalize Monte-Carlo.
Consider a probability distribution P, and suppose that you want to
estimate Pf, the expectation of f under distribution P. You can get
values of f at some points; write hP ($\hat P$ is the usual notation)
for the empirical distribution, i.e.

 Pf  = the unknown expectation of f under distribution P
 hPf = (1/n) \sum_{i=1}^n f(x_i)
In standard naive Monte-Carlo, the x_i are independent and identically
distributed according to P. In the discrete case one can also sample
without replacement (the x_i are then no longer independent, but
almost :-) ).
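
A minimal sketch of this naive estimator in Python (the toy f and P
below are mine, purely for illustration):

    import random

    def naive_mc(f, sample_p, n):
        # hPf = (1/n) * sum_i f(x_i), with x_i i.i.d. draws from P.
        return sum(f(sample_p()) for _ in range(n)) / n

    # Toy check: P = uniform on [0,1], f(x) = x^2, so Pf = 1/3.
    print(naive_mc(lambda x: x * x, random.random, 100000))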

Biasing Monte-Carlo (importance sampling) consists in replacing P by
some P' which samples more often in the more important areas. A standard
choice is a P' with density proportional to the product of
  (i) the density of P, and
  (ii) the local standard deviation of f (which is unknown and must be
approximated).

This does not introduce a bias (in expectation, hPf remains equal to Pf),
because we change the weighting of the points accordingly; hPf is now

 hPf = (1/n) \sum_{i=1}^n f(x_i) P(x_i)/P'(x_i)

and the x_i are i.i.d. according to P' in the simplest case (but other,
better schemes are possible, e.g. sampling in strata).
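
A toy illustration of this reweighted estimator (a continuous example,
nothing Go-related; the particular f, P and P' are made up so that the
true value is known):

    import random

    def importance_mc(f, p_density, sample_q, q_density, n):
        # hPf = (1/n) * sum_i f(x_i) * P(x_i)/P'(x_i), x_i i.i.d. from P'.
        total = 0.0
        for _ in range(n):
            x = sample_q()
            total += f(x) * p_density(x) / q_density(x)
        return total / n

    # P = uniform on [0,1], f(x) = x^10 (almost all mass near x = 1).
    # P' has density 2x, so it visits the important region more often;
    # sampling from P' = taking the square root of a uniform variate.
    est = importance_mc(lambda x: x ** 10,
                        lambda x: 1.0,                   # density of P
                        lambda: random.random() ** 0.5,  # sampler for P'
                        lambda x: 2.0 * x,               # density of P'
                        100000)
    print(est)  # close to the true value 1/11 = 0.0909...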

Let's consider how this can be applied to Monte-Carlo-Go.

Designing a good random player is not importance sampling as above: there
is no reweighting. We change what is estimated; we do not want to know
the probability of winning for a uniform random player, we just want a
better player, so that the probabilities estimated by this player lead to
a better MC-based player. Improving the random player is very important,
but it is not importance sampling.


Why not add importance sampling? Because it requires a good estimate of
the important areas, in the sense of, e.g., areas with high standard
deviation. This is, in my humble opinion, nearly as hard as designing a
good Go evaluation function. Of course, I might be missing something; but
I think that if you can quickly find areas with significantly higher or
lower probabilities of winning, then you have improved your random
player, and you don't want to reweight anything, and you are not
interested in the probability of winning of the basic random player - you
want to move to the new random player.
This is only improving the random player: you change the distribution
because you guess that it gives a better random player (and you hope that
your MC-based player will be better).


Quasi-Monte-Carlo methods consist in replacing hP by a sum over a
deterministic or moderately randomized sample, so that the convergence of
hPf to Pf is faster (than the O(1/\sqrt{n}) usual in MC methods). This is
somewhat orthogonal to importance sampling (though, by the way, the two
techniques are difficult to merge into one).
In standard QMC the distribution is not modified, but the points are not
chosen by independent identically distributed sampling (by the way, this
is also not the case for most importance sampling - but in QMC the focus
is on reducing the random part).


(It is now often considered that QMC should be only partially
derandomized, as randomized QMC is better than strictly deterministic
QMC, especially in high dimension; but never mind - let's consider a QMC
method independently of how it is designed; we can just pick a QMC
sequence from the state of the art without (at first) trying to
understand how it works.)
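
For concreteness, here is one such standard building block, the van der
Corput sequence in base 2 (the 1-D ingredient of Halton sequences); this
is textbook QMC, nothing Go-specific:

    def van_der_corput(i, base=2):
        # Reverse the base-b digits of i around the radix point:
        # 1, 2, 3, 4, ... -> 0.5, 0.25, 0.75, 0.125, ...
        x, denom = 0.0, 1.0
        while i > 0:
            denom *= base
            x += (i % base) / denom
            i //= base
        return x

    # These points fill [0,1] far more evenly than i.i.d. uniform draws,
    # which is where the faster convergence (for smooth f) comes from.
    print([van_der_corput(i) for i in range(1, 9)])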

But while QMC is very efficient in continuous domains, and sometimes also
in specific discrete domains, I don't see how to plug QMC into tree
search, or even just into sampling a go-ban to introduce QMC at a
particular node. It looks appealing, but you would need something like a
distance between moves that is related to the probability of winning, and
even with such a distance, designing the QMC sequence is far from
straightforward.

However, a simple form of derandomization, so-called jittered sampling,
can perhaps be applied:

  instead of randomly sampling the go-ban,
  split (partition) the go-ban into k zones, and pick the i-th point
  in zone 1+(i%k)  (% means modulo).
  (This assumes a partition into zones of equal area, but it can be
  adapted to unequal areas.)

Perhaps it is a good idea, but I do not have enough Go knowledge to
design such zones.
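
A sketch of the jittered scheme in Python; the zones here are a plain
rectangular split of the board, purely for illustration (as said above,
good zones would require real Go knowledge):

    import random

    def jittered_points(size, k_rows, k_cols, n):
        # Partition a size x size board into k_rows * k_cols rectangular
        # zones, then pick point i uniformly inside zone 1 + (i % k).
        zones = []
        for r in range(k_rows):
            for c in range(k_cols):
                zones.append((r * size // k_rows, (r + 1) * size // k_rows,
                              c * size // k_cols, (c + 1) * size // k_cols))
        points = []
        for i in range(n):
            r0, r1, c0, c1 = zones[i % len(zones)]
            points.append((random.randrange(r0, r1), random.randrange(c0, c1)))
        return points

    print(jittered_points(19, 3, 3, 9))  # one point in each of the 9 zones
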
Another possible tool for improving Monte-Carlo comparisons is to
introduce correlations between the explorations of various nodes.
Correlating the choices made in nodes B and C should improve the
stability of the comparison between B and C, and therefore improve the
choice performed at node A if A has sons B and C. However, this
introduces a strong variance increase at node A if A is the father of B
and C (much
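
A toy sketch of the comparison half of this idea, in the spirit of the
classical common-random-numbers technique (the 52%/50% win rates below
are made up): sharing the random draws between the playouts of B and C
makes the estimate of their difference much less noisy.

    import random

    def estimate_diff(p_b, p_c, n, shared):
        # Estimate P(win|B) - P(win|C) from n playouts of each child.
        # With shared=True, playout i of B and of C uses the same uniform
        # draw (common random numbers), correlating the two explorations.
        diff = 0
        for _ in range(n):
            u_b = random.random()
            u_c = u_b if shared else random.random()
            diff += (u_b < p_b) - (u_c < p_c)
        return diff / n

    # Spread of the estimator with and without shared randomness:
    for shared in (False, True):
        print(shared, [round(estimate_diff(0.52, 0.50, 1000, shared), 3)
                       for _ in range(5)])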

Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)

2007-02-07 Thread Tapani Raiko
  I could see a case where it is possible to reduce the variance of a
  single variable even in the 0-1 case. Let us say that black has about a
  5% chance of winning. If we could (exactly) double black's chances of
  winning by changing the nonuniform sampling somehow (say, by enforcing
  bad moves by white), we could sample from that and divide the estimated
  winning chance of black by 2 at the end. This would of course be very
  difficult in practice. (A binary random variable gives more information
  when the chances are closer to 50-50.) This could be useful in practice
  in handicap games, for instance by enforcing a black pass with 1% chance
  every move. Sampling would be distorted towards a white win, which is
  realistic since white is assumed to be the stronger player anyway.

 I don't understand this line of reasoning.

Let me try again using the handicap example. Let's say the MC player is
given a huge handicap. In the simulations it is winning all of its games,
so there is no information to help select the next move. In
information-theoretic terms, each playout gives one bit of information if
the chances are 50:50, but if the chances are unbalanced the information
content is lower (see http://en.wikipedia.org/wiki/Binary_entropy_function ).
In the extreme case, there is no information at all.

Now, let us use distorted MC where we force black to pass with a few
percent chance every move. White begins to win some of the simulations,
so MC is useful again.

How is this related to reducing the variance?

Let us say that a black move leads to a white win with probability p very
close to zero. Let us also assume that distorting the simulations doubles
white's chances to 2p. Using normal MC, the variance of our estimate of p
from N samples is

 p(1-p)/N

and using distorted MC, the variance of the estimate of 2p is

 2p(1-2p)/N.

Estimating p as half the estimate of 2p divides that variance by 4:

 2p(1-2p)/(4N) = p(1-2p)/(2N), which is less than p(1-p)/N.

In practice we cannot know that the distortion exactly doubles the
chances, but if we use the same distortion when estimating all moves, we
can still compare them.
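
A quick numerical check of this argument, under the idealized assumption
that the distortion exactly doubles the win probability:

    import random

    def empirical_var(p_sim, scale, n, trials=2000):
        # Empirical variance of (mean of n Bernoulli(p_sim) draws) / scale.
        ests = []
        for _ in range(trials):
            wins = sum(random.random() < p_sim for _ in range(n))
            ests.append(wins / n / scale)
        m = sum(ests) / trials
        return sum((e - m) ** 2 for e in ests) / trials

    p, n = 0.01, 1000
    print(empirical_var(p, 1, n))      # plain MC:     about p(1-p)/N
    print(empirical_var(2 * p, 2, n))  # distorted MC: about p(1-2p)/(2N)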

Tapani


Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)

2007-02-06 Thread Tapani Raiko
It seems that there are at least three cases:
1: Choosing a random move from a uniform distribution
2: Choosing a random move from a nonuniform distribution (patterns etc.)
3: Choosing a move taking into account what has been chosen before

The consensus seems to be that numbers 1 and 2 are MC and 3 is QMC.
Mogo uses QMC within the tree in memory and MC for the leaves, so which
should it be called?

And about reducing variance: In games you only care about estimating the 
goodness of the best moves (in order to select the best one). You don't 
care how bad a move is, if you are fairly certain that it is not the best 
one. You should thus reduce the variance of the best moves, that is, study 
them more often. This is exactly what UCT is about, reducing the variance 
of variables of interest.
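
(UCT applies the UCB1 bandit rule at every node of the tree; a sketch,
with the exploration constant c and the toy win/visit counts as
placeholders:)

    import math

    def ucb1_select(wins, visits, c=1.0):
        # Pick the child maximizing  mean + c * sqrt(ln(N) / n_i):
        # children with a high mean or few visits are sampled more often,
        # concentrating playouts on the moves that might still be best.
        total = sum(visits)
        def score(i):
            if visits[i] == 0:
                return float("inf")  # try every child at least once
            return (wins[i] / visits[i]
                    + c * math.sqrt(math.log(total) / visits[i]))
        return max(range(len(visits)), key=score)

    # Three candidate moves with 30/60, 10/25 and 3/5 playouts won:
    print(ucb1_select([30, 10, 3], [60, 25, 5]))  # explores the 3/5 move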

I could see a case where it is possible to reduce the variance of a
single variable even in the 0-1 case. Let us say that black has about a
5% chance of winning. If we could (exactly) double black's chances of
winning by changing the nonuniform sampling somehow (say, by enforcing
bad moves by white), we could sample from that and divide the estimated
winning chance of black by 2 at the end. This would of course be very
difficult in practice. (A binary random variable gives more information
when the chances are closer to 50-50.) This could be useful in practice
in handicap games, for instance by enforcing a black pass with 1% chance
every move. Sampling would be distorted towards a white win, which is
realistic since white is assumed to be the stronger player anyway.

To summarise, I agree that there are links to other MC research, and they 
should be explored.

--
 Tapani Raiko, [EMAIL PROTECTED], +358 50 5225750
 http://www.cis.hut.fi/praiko/



Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)

2007-02-06 Thread Matt Gokey

ivan dubois wrote:

I don't understand how you can reduce the variance of Monte-Carlo
sampling, given that a simulation can return either 0 (loss) or 1 (win).
Maybe it means trying to have mean values that are closer to 0 or 1?

Well, strictly speaking I agree the standard models don't fit that well -
the application of Monte Carlo to Go is very different from traditional
applications.  However, imagine the whole path of a simulation to the
leaf as a meaningful set of points.  We are only measuring the end, but
the path is very important too.

Also, as you mentioned, one could target the larger-scale variance:
whether the mean value of the set of simulations correctly classifies the
position as a win or loss.






Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)

2007-02-06 Thread Luke Gustafson



It seems that there are at least three cases:
1: Choosing a random move from a uniform distribution
2: Choosing a random move from a nonuniform distribution (patterns etc.)
3: Choosing a move taking into account what has been chosen before

The consensus seems to be that numbers 1 and 2 are MC and 3 is QMC.


I don't think 3 is an accurate description of QMC.  Generally, MC is a
process where a number of paths (i.e. sequences of random numbers) are
used to sample some function.  In Go, a path would be a single playout,
and the function is the score.  QMC is when the paths are constructed
using variance reduction techniques, meaning that they are more
representative of the sample space.  AFAIK no one has used any QMC
techniques in Go; I really doubt they would be much help, because the
function (the score) is not smooth in the inputs (that is, small changes
in the path are not small changes in the score).


I think what is confusing the matter is the sample space, i.e. which
games we are evaluating.  The standard MC engine's sample space is all
games that don't fill an eye.  Better might be all games that don't fill
an eye and don't play self-atari.  Mogo has an even more restrictive
sample space, designed to be a much better evaluation function.
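
Schematically, such a restriction of the sample space is just a filter on
the playout policy's candidate moves.  In the sketch below, legal_moves,
fills_own_eye and is_self_atari are hypothetical helpers, not Mogo's
actual code:

    import random

    def playout_move(board, color, legal_moves, fills_own_eye, is_self_atari):
        # One move of a restricted playout policy: sample uniformly from
        # the legal moves, excluding eye-fills and self-atari.  Each extra
        # restriction redefines which games the MC evaluation averages over.
        candidates = [m for m in legal_moves(board, color)
                      if not fills_own_eye(board, m, color)
                      and not is_self_atari(board, m, color)]
        return random.choice(candidates) if candidates else None  # None = pass

    # Smoke test with stub helpers (not a real Go board):
    stub_moves = lambda board, color: [(r, c) for r in range(3) for c in range(3)]
    never = lambda board, m, color: False
    print(playout_move(None, "black", stub_moves, never, never))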


Finally, UCT is not MC.  MC is an evaluation function; UCT is a tree
search technique.  You could just as easily use UCT with any other
(stochastic) evaluation function, or MC with any other tree search.  It
turns out that UCT has proven to be very effective using MC evaluation.

So, one could say Mogo is UCT, with an MC evaluation function, with
heuristics to improve the MC games.

