Re: [computer-go] Yet another question on uct and rave
On Fri, Mar 28, 2008 at 3:10 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote: > So to sum up we have the following pseudo code : > > at a given node : > - find the child (among the visited child only) that maximizes de UCT-RAVE > value > - if this maximum UCT-RAVE value is less than FPU value and if there still > exisits unvisited nodes : > > choose one unvisited node- continue > > Is this correct ? Maybe, but it depends on how exactly you compute your Upper Confidence Bounds. If you don't add UCB's to the rave values you may either have to compare based on the UCT part alone, or do as GCP suggests (which sort of turns FPU in an UCT prior). However, I would suggest that you first start out with the standard UCT (always explore unvisited nodes) and see how you can improve from that as a baseline. If your branching factor becomes too big maybe try some kind of metabandit. Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
> So to sum up we have the following pseudo code : > at a given node : > > - find the child (among the visited child only) that maximizes de UCT-RAVE > value > > - if this maximum UCT-RAVE value is less than FPU value and if there still > exisits unvisited nodes : > > choose one unvisited node > > - continue > > > Is this correct ? I don't think so. You simply substitute the FPU in the UCT-RAVE formula for the UCT score if you have not explored this move before. You can not encounter the case where there is no RAVE score because of priors, so there is never a problem filling that part of the formula in. So, you simply put FPU instead of the UCT score if you don't have an UCT score. -- GCP ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
So to sum up we have the following pseudo code : at a given node : - find the child (among the visited child only) that maximizes de UCT-RAVE value - if this maximum UCT-RAVE value is less than FPU value and if there still exisits unvisited nodes : choose one unvisited node - continue Is this correct ? On Fri, Mar 28, 2008 at 2:57 PM, Erik van der Werf <[EMAIL PROTECTED]> wrote: > On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> > wrote: > > So if I understand, at each node we need to play every possible action > once > > at first, even many of these actions are surely non optimal. And this > may be > > slow if the number of the possible action at this node is huge. > > Well, as discussed in their ICML paper you could also initialize nodes > with prior knowledge. > > > When you talk about FPU, does it mean that you give a kind of default > value > > for unvisited node and compare this value with (1-beta)*Q_uct + > beta*Q_rave > > if we can compute it ? > > Yes, you do the normal UCT-RAVE selection for the moves that have been > already been explored at least once, then if the highest upper > confidence bound (from the move you would normally select if there are > no unexplored nodes) does not exceed the FPU value you select an > unexplored node (FPU=infinity gives standard UCT). > > Erik > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] State of the art of pattern matching
On 28-mrt-08, at 09:43, Jacques Basaldúa wrote: The first source code was just an example to see what kind of code is generated. The second is useful, if you understand asm you should understand it. Well, the only serious assembler I ever wrote was on a 6502 :-) And that was a very long time ago, so I'm sorry to say the asm is a bit too hard for me to see what it does. I suppose there's no reason why it has to be assembler, you could just as well generate C-code. Maybe C-compilers aren't good enough still compared to hand-crafted code in cases like these? So although I start to understand some things, there's still a lot unclear to me. I think I see how you generate functions to efficiently update the hash-codes but I don't see yet how you go from there to finding patterns. I assume you allow some of the points to be undefined (also called 'don't care points') but I don't see how. And if you allow undefined points, why would you need masks of smaller manhatten distances? Mark ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote: > So if I understand, at each node we need to play every possible action once > at first, even many of these actions are surely non optimal. And this may be > slow if the number of the possible action at this node is huge. Well, as discussed in their ICML paper you could also initialize nodes with prior knowledge. > When you talk about FPU, does it mean that you give a kind of default value > for unvisited node and compare this value with (1-beta)*Q_uct + beta*Q_rave > if we can compute it ? Yes, you do the normal UCT-RAVE selection for the moves that have been already been explored at least once, then if the highest upper confidence bound (from the move you would normally select if there are no unexplored nodes) does not exceed the FPU value you select an unexplored node (FPU=infinity gives standard UCT). Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] State of the art of pattern matching
terry mcintyre wrote: It is possible to get some remarkably high correlation between the moves played by pros and a predictor - yet still not have a good program. Why? We have a random variable, the place at which a player plays, and some variables that we can compute. The distribution of the random variable, depends on this variables. Our variables are shape urgency and life. Of course, I agree that it would be great to include life in our statistical model. It would be a much better model. The problem is we can compute shape in few clockcycles but we cannot compute life/death. (The revised Benson methods are of no use in real games because they detect life/death when it way too late.) All the pattern logic is not intended to determine what move has to be played, but only to select candidates for later methods. Also, it is not about what a player did once, but what is statistically sound in over 10 M positions. We ignored life/death when learning and we ignore it when we just suggest the best moves in terms of shape. That is fair, but, of course, the interaction between shape an life would change things for better. UCT methods are (among other reasons) strong in 9x9 because a random move (not in the first 2 lines in the beginning) is a reasonable move, and then the stochastic parts of the search finds good candidates (RAVE and similar). But in real go, there are 361 legal moves for move 1! And CrazyStone (as far as I know still the strongest 19x19 program) is "still" 2 kyu. That is the state of computer go in 2008, not the remark made by a pro about how hard it was to beat mogo by 9 stones. Most people in this list seem to be against human learned shape, but the truth is that we don't know if it is useful or not. That depends on the price we pay for urgency. In terms of RAM it is a couple of tenths Mb, that's nothing today, it was a lot years ago. Some people assume that you can predict what airplane society would be without building an airplane. Just move people in a bus, the airplane is the same only faster. That's not how things work. Unfortunately, you have to build an airplane to see what it can or cannot do. Some call it overengineering, but if it is not fast, then I agree it will be useless almost for sure. Jacques. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
I use FPU for both values for precisely the reasons you describe. Sent from my iPhone On Mar 28, 2008, at 9:36 AM, "Jaonary Rabarisoa" <[EMAIL PROTECTED]> wrote: So if I understand, at each node we need to play every possible action once at first, even many of these actions are surely non optimal. And this may be slow if the number of the possible action at this node is huge. When you talk about FPU, does it mean that you give a kind of default value for unvisited node and compare this value with (1-beta) *Q_uct + beta*Q_rave if we can compute it ? Finally does it make sense to throw away all non visited node and only consider the node that have a rave value first. Precisely use the FPU but only for unvisited node that have Q_rave value. On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED] > wrote: On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote: > - its rave and uct value are defined ( in this case we can > compute the above score) > - only the rave value is defined (in this situation the n (s,a) > = 0 and the uct value is not defined) > - neiher rave nor uct value is defined > > So my question is how they handle these case when they traverse the > tree ? Because their score are not always defined for every childs of > a node. I handled this by using FPU values. FPU = first play urgency, the value to initially assign to unvisited nodes. Other Gelley papers recommended a value of 1.1 > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
So if I understand, at each node we need to play every possible action once at first, even many of these actions are surely non optimal. And this may be slow if the number of the possible action at this node is huge. When you talk about FPU, does it mean that you give a kind of default value for unvisited node and compare this value with (1-beta)*Q_uct + beta*Q_rave if we can compute it ? Finally does it make sense to throw away all non visited node and only consider the node that have a rave value first. Precisely use the FPU but only for unvisited node that have Q_rave value. On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED]> wrote: > > On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote: > > > - its rave and uct value are defined ( in this case we can > > compute the above score) > > - only the rave value is defined (in this situation the n(s,a) > > = 0 and the uct value is not defined) > > - neiher rave nor uct value is defined > > > > So my question is how they handle these case when they traverse the > > tree ? Because their score are not always defined for every childs of > > a node. > > > I handled this by using FPU values. FPU = first play urgency, the value > to initially assign to unvisited nodes. Other Gelley papers recommended > a value of 1.1 > > > > > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] State of the art of pattern matching
Mark Boon wrote: Sorry, without a bit more explanation, the assembler code is very hard to understand. What exactly does it do? The first source code was just an example to see what kind of code is generated. The second is useful, if you understand asm you should understand it. The board has 4 modes, as far as patterns are concerned. So the following applies for each mode and for each possible mapping scheme. When 40 neighbors are used, (i.e. Manhattan distance 4) there are 81 possible situations (all the cells of a 9x9 board are different as far as board limits are concerned). On bigger boards, each cell maps one of the 81 possible 9x9 cells. The board system cannot play on less than 9x9. And for each of these 4x(maximum)81 (some board modes have smaller distance) the generator writes 2 functions: one for creating the mask/hash from scratch and an other to update all masks/hashes in the neighborhood of a new stone. Does it relate to a pattern? Yes the complete pattern is: +---+ | 4 | +---+---+---+ | 4 | 3 | 4 | +---+---+---+---+---+ | 4 | 3 | 2 | 3 | 4 | +---+---+---+---+---+---+---+ | 4 | 3 | 2 | 1 | 2 | 3 | 4 | +---+---+---+---+---+---+---+---+---+ | 4 | 3 | 2 | 1 | · | 1 | 2 | 3 | 4 | +---+---+---+---+---+---+---+---+---+ | 4 | 3 | 2 | 1 | 2 | 3 | 4 | +---+---+---+---+---+---+---+ | 4 | 3 | 2 | 3 | 4 | +---+---+---+---+---+ | 4 | 3 | 4 | +---+---+---+ | 4 | +---+ Justification of this shape: 1. It includes all standard nobi, iken tobi, niken tobi, kosumi, keima & ogeima relations (+ double iken tobi + double kosumi) 2. It detects if a pattern is at the 4-th line or the 4x4 corner (and below obviously). Less than that is too small. The 4-th line is too different from the center. 3. It is a nested structure, {dist12, dist12 + dist3} are both usable. 4. It has reasonable size for the information it contains. 5. The bit arrangement is optimized for 90 deg rotation. Additionally, I keep urgency information for: normal, the pattern shows up for the first time and urgency in a ko. Did you write a paper or post explaining this in more detail? Not yet. Do I understand correctly that you generate this code automatically? Yes. It would be a nightmare to write about 70K lines by hand and debugging would be hard as well. Although what the functions do is simple enough to create a test that verifies 100% of the cases. You are talking about updating 40 hashes. Does it mean your patterns have fixed size? Yes. 3 sizes: Manhattan distance 2, 3 and 4 In the 500 nanoseconds, how many patterns do you compare? That was updating (max) 40 hashes in the neighborhood of a newly placed stone. The precise number of hashes depends on the board coordinate and the surrounding stones although that is achieved without a single conditional jump. (It is a very conservative estimation for approx 140 instructions in a jump free sequence. The typical case is probably more like half of that.) But, as mentioned, it does not include neither the board logic nor the search that translates hash -> urgency. How about rotations and color inversions? In the slowest mode the pattern is rotated using macros like this one (that rotates a 40 neighbor pattern) asm mov edx, eax// @jm mov eax, [edx + 8] // jm.mask4 rol eax, 8 mov [edx + 8], eax // jm.mask4 mov eax, [edx + 4] // jm.mask3 shl eax, 6 mov ecx, eax and eax, 0ffh rol ecx, 8 or al, cl mov [edx + 4], eax // jm.mask3 mov eax, [edx] // jm.msk12 mov ecx, eax shl eax, 4 rol cl, 2 mov al, cl mov ecx, eax shr ecx, 16 and eax, 0ffF0FFh and ecx, F00h or eax, ecx mov [edx], eax // jm.msk12 end and converted to canonical. Color inversion is automatic because the pattern is own/opponent instead of black/white. In the fastest mode, the hash table has x8 size and includes the hashes of the rotated patterns. Jacques. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote: > - its rave and uct value are defined ( in this case we can > compute the above score) > - only the rave value is defined (in this situation the n(s,a) > = 0 and the uct value is not defined) > - neiher rave nor uct value is defined > > So my question is how they handle these case when they traverse the > tree ? Because their score are not always defined for every childs of > a node. I handled this by using FPU values. FPU = first play urgency, the value to initially assign to unvisited nodes. Other Gelley papers recommended a value of 1.1 > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Yet another question on uct and rave
On Fri, Mar 28, 2008 at 11:20 AM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote: > Hi all, > > After a long search on the computer go mailing list archive and reading and > reading again the paper of Gelly and Silver (ICML 2007) I didn't find > answers to my question. > In this paper they introduce a way to select the next move, at a given > state, using the rave and uct value of its childs. They do this by comparing > > (1-beta)*Q_uct + beta*Q_rave > > > But, by the definition of the rave and uct value, for each child of a given > node we may have the following situation : > > - its rave and uct value are defined ( in this case we can compute the > above score) > - only the rave value is defined (in this situation the n(s,a) = 0 and the > uct value is not defined) Plain UCT always selects unvisited nodes first ( n=0 -> Q=infinite ). > - neiher rave nor uct value is defined This cannot happen if you always select unvisited nodes first (because if you select a move it leads to an update for both Q_rave and Q_uct). Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Yet another question on uct and rave
Hi all, After a long search on the computer go mailing list archive and reading and reading again the paper of Gelly and Silver (ICML 2007) I didn't find answers to my question. In this paper they introduce a way to select the next move, at a given state, using the rave and uct value of its childs. They do this by comparing (1-beta)*Q_uct + beta*Q_rave But, by the definition of the rave and uct value, for each child of a given node we may have the following situation : - its rave and uct value are defined ( in this case we can compute the above score) - only the rave value is defined (in this situation the n(s,a) = 0 and the uct value is not defined) - neiher rave nor uct value is defined So my question is how they handle these case when they traverse the tree ? Because their score are not always defined for every childs of a node. Cheers, Jaonary ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/