Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)
As I have spent a lot of time trying to guess what could be done with Quasi-Monte-Carlo or other standard Monte-Carlo improvements in computer-go, I write below my (humble and pessimistic :-) ) opinion about it.

Let's formalize Monte-Carlo. Consider P a probability distribution, and suppose you want to estimate Pf, the expectation of f under distribution P. You can get values of f at some points; note hP ($\hat P$ is the usual notation) the empirical distribution, i.e.

  Pf  = unknown expectation of f under distribution P
  hPf = (\sum_{i=1}^n f(x_i)) / n

In standard naive Monte-Carlo, the x_i are independent and identically distributed according to P. One can also sample without replacement in the discrete case (no longer independent then, but almost :-) ).

Biasing Monte-Carlo (importance sampling) consists in replacing P by some P' which samples more often in the important areas. A standard choice is P' with density proportional to the product of (i) the density of P and (ii) the local standard deviation of f (which is unknown and must be approximated). This does not introduce a bias (on average, hPf remains close to Pf), because we change the weighting of the points; hPf becomes

  hPf = (\sum_{i=1}^n P(x_i) f(x_i) / P'(x_i)) / n

and the x_i are i.i.d. according to P' in the simplest case (other, better schemes are possible, e.g. sampling in strata).

Let's consider how this can be applied to Monte-Carlo-Go. Designing a good random player is not importance sampling as above: there is no reweighting. We change what is estimated; we do not want to know the probability of winning for a uniform random player, we just want a better player, so that the probabilities estimated by this player lead to a better MC-based player. Improving the random player is very important, but it is not importance sampling.

Why not add importance sampling? Because it requires a good estimate of the important areas, in the sense e.g. of areas with high standard deviation.
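The reweighted estimator above can be sketched as follows. Everything here is a toy assumption, not anything Go-specific: a rare-event indicator f on [0,1), uniform P, and a piecewise-uniform P' that puts half its mass on the "important" region.

```python
import random

def f(x):
    # Toy integrand on [0, 1): rare-event indicator, so Pf = 0.1 under uniform P.
    return 1.0 if x < 0.1 else 0.0

def naive_mc(n, rng):
    # hPf = (1/n) * sum f(x_i), x_i i.i.d. uniform (P has density 1 on [0, 1)).
    return sum(f(rng.random()) for _ in range(n)) / n

def importance_mc(n, rng):
    # Sample from P', which puts half its mass on the "important" region [0, 0.2),
    # then reweight each point by P(x)/P'(x) so hPf stays an unbiased estimate of Pf.
    total = 0.0
    for _ in range(n):
        if rng.random() < 0.5:
            x = rng.random() * 0.2        # P' density 2.5 on [0, 0.2)
        else:
            x = 0.2 + rng.random() * 0.8  # P' density 0.625 on [0.2, 1)
        p_prime = 2.5 if x < 0.2 else 0.625
        total += f(x) / p_prime           # weight = P(x)/P'(x) = 1/P'(x), P uniform
    return total / n
```

Both estimators converge to Pf = 0.1; the reweighting is exactly what keeps the biased sampling unbiased on average.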
This is, in my humble opinion, nearly as hard as designing a good Go evaluation function. Of course, I might be missing something; but I think that if you can quickly find areas with significantly higher or lower probabilities of winning, then you have improved your random player, you don't want to reweight anything, and you are not interested in the probability of winning for the basic random player: you want to move to the new random player. This is only improving the random player; you change the distribution because you guess that it will be a better random player (and you hope that your MC-based player will be better).

Quasi-Monte-Carlo methods consist in replacing hP by a sum over a deterministic or moderately randomized sample, so that the convergence of hPf to Pf is faster than the O(1/\sqrt{n}) usual in MC methods. This is somewhat orthogonal to importance sampling (though the two techniques are difficult to merge into one, by the way). In standard QMC the distribution is not modified, but the points are not chosen by independent identically distributed sampling (by the way, this is also not the case with most importance sampling, but in QMC the focus is on reducing the random part). (It is now often considered that QMC should be only partially derandomized, as randomized QMC is better than strictly deterministic QMC, especially in large dimension; but never mind, let's consider a QMC method independently of how it is designed; we can just pick a QMC sequence from the state of the art without (at first) trying to understand how it works.)

QMC is very efficient in continuous domains, and sometimes in specific discrete domains too, but I don't see how to plug QMC into tree search, or even just into sampling a go-ban for introducing QMC at a particular node.
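For the continuous case, a minimal sketch of why QMC helps: the van der Corput sequence (the classic 1-D low-discrepancy sequence) typically integrates a smooth function more accurately than i.i.d. sampling at the same n. The integrand x^2 is just an illustrative assumption; nothing here transfers to Go.

```python
import random

def van_der_corput(i, base=2):
    # Radical inverse of i in the given base: digits of i mirrored across the
    # radix point, producing a deterministic low-discrepancy sequence in [0, 1).
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def estimate(points):
    # (Quasi-)MC estimate of the integral of x^2 over [0, 1] (exact value: 1/3).
    return sum(x * x for x in points) / len(points)

n = 1024
qmc_err = abs(estimate([van_der_corput(i) for i in range(1, n + 1)]) - 1 / 3)
rng = random.Random(0)
mc_err = abs(estimate([rng.random() for _ in range(n)]) - 1 / 3)
# qmc_err is typically an order of magnitude smaller than mc_err at this n
```

The catch, as noted above, is that this relies on the domain having a metric structure in which nearby points have nearby f-values, which a set of Go moves does not obviously have.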
It looks appealing, but you need something like a distance between moves that is related to the probability of winning, and even with such a distance, designing the QMC sequence is far from straightforward.

However, a simple form of derandomization, the so-called jittered sampling, can perhaps be applied: instead of randomly sampling the go-ban, split (partition) the go-ban into k zones, and pick the i-th point in zone 1+(i%k) (% means modulo). (This is for a partition into zones of equal area, but it can be adapted to unequal areas.) Perhaps it is a good idea, but I do not have enough Go knowledge to design such zones.

Another possible tool for improving Monte-Carlo comparisons is introducing correlations between the explorations of various nodes. This possibility, which consists in correlating the choices made in nodes B and C, should improve the stability of the comparison between B and C, and therefore improve the choice performed in node A if A has sons B and C. However, this introduces a strong variance increase at node A if A is the father of B and C (much
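The jittered-sampling idea can be sketched as follows; the striped equal-size partition is a crude stand-in for Go-meaningful zones (corners, sides, centre), which the post says are the hard part to design.

```python
import random

def jittered_picks(board_points, k, n, rng):
    # Partition the board points into k equal-size zones, then route the
    # i-th sample into zone i % k and pick uniformly inside that zone,
    # so every zone receives its fair share of the n samples.
    zones = [board_points[z::k] for z in range(k)]
    return [rng.choice(zones[i % k]) for i in range(n)]

rng = random.Random(0)
board = [(x, y) for x in range(9) for y in range(9)]  # 9x9 go-ban, 81 points
picks = jittered_picks(board, 3, 30, rng)             # 10 picks land in each zone
```

Compared with plain uniform sampling, this removes the chance that some zone is left almost unexplored in a small sample, which is exactly the variance-reduction effect of jittered/stratified sampling.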
Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)
> I could see a case where it is possible to reduce the variance of a single variable even in the 0-1 case. Let us say that black has about a 5% chance of winning. If we could (exactly) double black's chances of winning by changing the nonuniform sampling somehow (say, enforce bad moves by white), we could sample from that and divide the estimated winning chance by 2 in the end. This would of course be very difficult in practice. (A binary random variable gives more information when the chances are closer to 50-50.) This could be useful in practice in handicap games, for instance by enforcing a black pass with 1% chance every move. Sampling would be distorted towards a white win, which is realistic since white is assumed to be the stronger player anyway.

I don't understand this line of reasoning.

Let me try again using the handicap example. Let's say the MC player is given a huge handicap. In the simulations it is winning all of its games, so there is no information helping to select the next move. In information-theoretic terms, each play-out gives one bit of information if the chances are 50:50, but if the chances are unbalanced, the information content is lower (see http://en.wikipedia.org/wiki/Binary_entropy_function ). In the extreme case, there is no information at all. Now, let us use distorted MC where we enforce black to pass with a few percent chance every move. White begins to win some of the simulations, so MC is useful again.

How is this related to reducing the variance? Let us say that a black move leads to a white win with probability p very close to zero. Let us also assume that distorting the simulations doubles white's chances to 2p. Using normal MC, the variance of our estimate of p from N samples is

  p*(1-p)/N

and using distorted MC, the variance of the estimate of 2p is

  2p*(1-2p)/N.

Estimating p by halving the estimate of 2p divides this variance by 4:

  p*(1-2p)/(2N),

which is less than p*(1-p)/N.
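The variance arithmetic above can be checked empirically. The numbers below are toy assumptions: a true white-win probability p, and (as in the idealisation of the post) a distortion that is known to exactly double it, so the distorted estimate can be halved at the end.

```python
import random

def win_rate(p_win, n, rng):
    # Fraction of n Bernoulli(p_win) playouts won by white.
    return sum(rng.random() < p_win for _ in range(n)) / n

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Toy setup: p is small, the distortion exactly doubles it to 2p.
p, n, runs = 0.02, 1000, 2000
rng = random.Random(0)
direct    = [win_rate(p, n, rng) for _ in range(runs)]          # Var ~ p(1-p)/n
distorted = [win_rate(2 * p, n, rng) / 2 for _ in range(runs)]  # Var ~ p(1-2p)/(2n)
```

Both families of estimates center on p, but the halved distorted estimates have roughly half the variance, matching p(1-2p)/(2N) < p(1-p)/N for small p.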
In practice, we cannot know that the distortion increases the chances by exactly doubling them, but if we use the same distortion to estimate all moves, we can still compare them.

Tapani

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)
It seems that there are at least three cases:

1: Choosing a random move from a uniform distribution
2: Choosing a random move from a nonuniform distribution (patterns etc.)
3: Choosing a move taking into account what has been chosen before

The consensus seems to be that numbers 1 and 2 are MC and 3 is QMC. Mogo uses QMC within the tree in memory and MC for the leaves, so which should it be called?

And about reducing variance: in games you only care about estimating the goodness of the best moves (in order to select the best one). You don't care how bad a move is if you are fairly certain that it is not the best one. You should thus reduce the variance of the best moves, that is, study them more often. This is exactly what UCT is about: reducing the variance of the variables of interest.

I could see a case where it is possible to reduce the variance of a single variable even in the 0-1 case. Let us say that black has about a 5% chance of winning. If we could (exactly) double black's chances of winning by changing the nonuniform sampling somehow (say, enforce bad moves by white), we could sample from that and divide the estimated winning chance by 2 in the end. This would of course be very difficult in practice. (A binary random variable gives more information when the chances are closer to 50-50.) This could be useful in practice in handicap games, for instance by enforcing a black pass with 1% chance every move. Sampling would be distorted towards a white win, which is realistic since white is assumed to be the stronger player anyway.

To summarise, I agree that there are links to other MC research, and they should be explored.

-- 
Tapani Raiko, [EMAIL PROTECTED], +358 50 5225750
http://www.cis.hut.fi/praiko/
Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)
ivan dubois wrote:
> I don't understand how you can reduce the variance of monte-carlo sampling, given that a simulation can return either 0 (loss) or 1 (win). Maybe it means trying to have mean values that are closer to 0 or 1?

Well, strictly speaking I agree that the standard models don't fit that well: the application of Monte-Carlo to go is very different from traditional applications. However, imagine the whole path of a simulation to the leaf as a meaningful set of points. We are only measuring the end, but the path is very important too. Also, as you mentioned, one could target the larger-scope variance of whether the mean value of the set of simulations correctly classifies the position as a win or loss.
Re: [computer-go] Monte Carlo (MC) vs Quasi-Monte Carlo (QMC)
> It seems that there are at least three cases:
> 1: Choosing a random move from a uniform distribution
> 2: Choosing a random move from a nonuniform distribution (patterns etc.)
> 3: Choosing a move taking into account what has been chosen before
> The consensus seems to be that numbers 1 and 2 are MC and 3 is QMC.

I don't think 3 is an accurate description of MC. Generally, MC is a process where a number of paths (aka sequences of random numbers) are used to sample some function. In go, a path would be a single playout, and the function is the score. QMC is when the paths are constructed using variance-reduction techniques, meaning that they are more representative of the sample space. AFAIK no one has used any QMC techniques in go; I really doubt they would be much help, because the function (the score) is not smooth in the inputs (that is, small changes in the path are not small changes in the score).

I think what is confusing the matter is the sample space, i.e. what games we are evaluating. The standard MC engine's sample space is all games that don't fill an eye. Better might be all games that don't fill an eye and don't play self-atari. Mogo has an even more restrictive sample space, designed to be a much better evaluation function.

Finally, UCT is not MC. MC is an evaluation function; UCT is a tree-search technique. You could just as easily use UCT with any other (stochastic) evaluation function, or MC with any other tree search. It turns out that UCT has proven very effective using MC evaluation. So, one could say Mogo is UCT, with an MC evaluation function, with heuristics to improve the MC games.