Re: [computer-go] Simplified MC evaluator ?explained?
Weston Markham wrote: 1. Uniform playouts, as used in practice, are not really uniform over all legal go moves. Generally, pass moves are excluded until necessary, and moves that fill eyelike points are excluded. So, I assume that when you use the word legal, you mean admissible within this sort of playout. That shouldn't make a difference. If you count the pass as a legal move you should also count it as a legal moves in the subpaths. You would have W (subpaths without pass) and W' (subpaths with pass). Since it is only one of the 255 legal moves (counting the expected value of # legal moves as 19^2 - 212/2 where 212 = length of a pro game) it should not be very important even if miscounted. 2. That variance depends on the length of the playout. It is difficult for me to make sense of this statement, simply because not all playouts from a given position have the same length. Of course, but it is reasonable to expect 9x9 playout around 120, 19x19 around 400, etc. That has, at least, two consequences: 1. UCT is stronger for global search studying move 150 than move 5. 2. UCT can also be used for local search on 50-80 empty cells, something astronomically above what can be done with alfa-beta (assuming we want final evaluation, not evaluating nodes at some intermediate state.) Of course, UCT does not talk territory so adding the local subgames is far from trivial. The variance of the stochastic process is not to be mixed up with the distribution of the error of a repeated Bernoulli experiment! Perhaps I have mixed them up. Can you explain more clearly or precisely what the variance of the stochastic process is? Repeated Bernoulli experiments are studied as a stochastic process and, in our case, _the experiment is a stochastic process itself_. We have a nested process: A process whose steps are stochastic processes. Therefore we have the variance of the Bernoulli process which is in the binomial Bi(n, p) but that p is the result of a stochastic process whose variance biases p towards 1/2. I don't don't have a particular mathematical model for that process and model it as a noise. No matter how you distribute it, as long as the expected value is zero it and has the same consequences. This may be clearer: Count each move in the playout as adding a random territory difference in {-3, -2, -1, 0, 1, 2, 3} to the actual territorial value with any distribution whose expected value is 0. The probability of the result being 0 (a win) starting with a good move (initial value = +5) after 10 moves is significantly bigger than starting with a bad move (initial value = -3). But after a million moves the probability of a win is the same for both = 1/2. Because the variance of the experiment is somewhere in k·n the standard deviation is proportional to the sqrt of n = 1000. With a std deviation around 1000 the initial values 5 or -3 are way too small to make any difference. Does this game violate the condition that the number of legal moves for each side is balanced? I mean my argument in the big numbers. Of course, if you take it down to the detail you can find counterexamples. But these counterexamples should be balanced for black and white. In fact, that's what I pretended to say if everything is the same for black and white take the same in an informal sense, there is no reason one of the sides should be favored and p estimates W. What is more important is that it estimates W biased towards 1/2 because the playout is a stochastic process. Even if we can compute W exactly, do we have any reason to think that its value is a good estimate of the minimax value of the game? It is *not* a good estimate. I am not trying to advocate in favor of MC as a panacea. In the beginning I was among the critics and have done an effort to understand it better. Slowly I am becoming convinced of the possibilities specially in combination with other methods and now I have written a UCT engine, mainly for analysis. W fails to represent the minimax value of the game at least in two common situations: self-atari moves that would be a good idea if the opponent was kind enough as to forgive the capture and ladders. But _that is_ what uniformly random MC evaluates, not my fault ;-) Jacques. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Simplified MC evaluator ¿explained?
The second explanation was no clearer to me. I'll try to criticize in more detail: 1. Uniform playouts, as used in practice, are not really uniform over all legal go moves. Generally, pass moves are excluded until necessary, and moves that fill eyelike points are excluded. So, I assume that when you use the word legal, you mean admissible within this sort of playout. 2. That variance depends on the length of the playout. It is difficult for me to make sense of this statement, simply because not all playouts from a given position have the same length. My best guess is that you are claiming that the longer a playout is, the more likely it is that its result differs from the result under correct play. However, I strongly doubt that this is true for all starting positions. (Imagine that the first player needs to prevent the second player from forming two eyes in a large group. After doing this, that group will eventually be captured, allowing playouts to continue longer by filling the intersections that it once occupied. Failing to kill this group may allow the playouts to complete much more quickly, but gives inaccurate results.) 2.5. The variance of the stochastic process is not to be mixed up with the distribution of the error of a repeated Bernoulli experiment! Perhaps I have mixed them up. Can you explain more clearly or precisely what the variance of the stochastic process is? Do you perhaps mean some measurement of variation across different starting points, rather than across different Bernoulli trials from the same starting position? Or, do you mean to distinguish the probability that a playout's outcome differs from the outcome under correct play, from the probability that a playout results in a win? (Although those are just two different Bernoulli experiments, right?) Or is there some subtlety that I have missed? 3. 'p is a biased towards 1/2 estimator of W'. Consider the game: o / \ o o / \ | 1 0 0 (1 is a win by the first player, and 0 is a loss.) There is a move that could allow the first player to win, if the second player does not respond to it correctly. This sounds like a realistic scenario for go. W = 1/3 p = 1/4 p is further from 1/2 than W. Does this game violate the condition that the number of legal moves for each side is balanced? (It is still not clear to me what this condition is that you are attempting to impose.) Or, was I supposed to calculate a statistic across multiple game trees where W=1/3, in order to interpret p as an estimator of W? 4. Even if we can compute W exactly, do we have any reason to think that its value is a good estimate of the minimax value of the game? Is it even a better estimate than p, which we can already estimate accurately? (Note that in the game tree above, it is not.) My offhand guess is that it would not be as good. Weston ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Simplified MC evaluator ¿explained?
On 08/04/07, Jacques Basaldúa [EMAIL PROTECTED] wrote: I will try to explain it better: What had puzzled me until recently is how to combine two facts: a. As Jason says, A single MC playout corresponds to a Bernoulli trial with probability p and therefore, p-hat is distributed as a binomial which converges to the normal. See also: http://en.wikipedia.org/wiki/Bernoulli_process Regards Martin ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Simplified MC evaluator ¿explained?
On 4/7/07, Jacques Basaldúa [EMAIL PROTECTED] wrote: Assuming two simplifying hypotheses: 1. The playouts are uniformly random. 2. Both players have the same number of legal moves (or any unbalanced numbers compensate in the long term). I did not understand your post either. Is #2 the same as neither player passes until the end of the game? Or do you intend to assume that at any given level of the game tree, all nodes have the same number of children? The latter assumption seems to be much stronger than what you intend, since it would imply that p=W exactly. Plus, it is obviously (and dramatically) false under any normal go rules where two passes end the game. Weston ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/