Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Erik van der Werf
On Fri, Mar 28, 2008 at 3:10 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote:
> So to sum up we have the following pseudo code :
>
> at a given node :
> - find the child (among the visited child only) that maximizes de UCT-RAVE
> value
> - if this maximum UCT-RAVE value is less than FPU value and if there still
> exisits unvisited nodes :
>
> choose one unvisited node- continue
>
> Is this correct ?

Maybe, but it depends on how exactly you compute your Upper Confidence
Bounds. If you don't add UCB's to the rave values you may either have
to compare based on the UCT part alone, or do as GCP suggests (which
sort of turns FPU in an UCT prior).

However, I would suggest that you first start out with the standard
UCT (always explore unvisited nodes) and see how you can improve from
that as a baseline. If your branching factor becomes too big maybe try
some kind of metabandit.

Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Gian-Carlo Pascutto
> So to sum up we have the following pseudo code :
> at a given node :
>
> - find the child (among the visited child only) that maximizes de UCT-RAVE
> value
>
> - if this maximum UCT-RAVE value is less than FPU value and if there still
> exisits unvisited nodes :
>
> choose one unvisited node
>
> - continue
>
>
> Is this correct ?

I don't think so. You simply substitute the FPU in the UCT-RAVE
formula for the UCT score if you have not explored this move before.
You can not encounter the case where there is no RAVE score because of
priors, so there is never a problem filling that part of the formula in.

So, you simply put FPU instead of the UCT score if you don't
have an UCT score.

-- 
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Jaonary Rabarisoa
So to sum up we have the following pseudo code :
at a given node :

- find the child (among the visited child only) that maximizes de UCT-RAVE
value

- if this maximum UCT-RAVE value is less than FPU value and if there still
exisits unvisited nodes :

choose one unvisited node

- continue


Is this correct ?


On Fri, Mar 28, 2008 at 2:57 PM, Erik van der Werf <[EMAIL PROTECTED]>
wrote:

> On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]>
> wrote:
> > So if I understand, at each node we need to play every possible action
> once
> > at first, even many of these actions are surely non optimal. And this
> may be
> > slow if the number of the possible action at this node is huge.
>
> Well, as discussed in their ICML paper you could also initialize nodes
> with prior knowledge.
>
> > When you talk about FPU, does it mean that you give  a kind of default
> value
> > for unvisited node and compare this value with (1-beta)*Q_uct +
> beta*Q_rave
> > if we can compute it ?
>
> Yes, you do the normal UCT-RAVE selection for the moves that have been
> already been explored at least once, then if the highest upper
> confidence bound (from the move you would normally select if there are
> no unexplored nodes) does not exceed the FPU value you select an
> unexplored node (FPU=infinity gives standard UCT).
>
> Erik
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] State of the art of pattern matching

2008-03-28 Thread Mark Boon


On 28-mrt-08, at 09:43, Jacques Basaldúa wrote:


The first source code was just an example to see what kind of code
is generated. The second is useful, if you understand asm you  
should understand it.


Well, the only serious assembler I ever wrote was on a 6502 :-) And  
that was a very long time ago, so I'm sorry to say the asm is a bit  
too hard for me to see what it does. I suppose there's no reason why  
it has to be assembler, you could just as well generate C-code. Maybe  
C-compilers aren't good enough still compared to hand-crafted code in  
cases like these?


So although I start to understand some things, there's still a lot  
unclear to me. I think I see how you generate functions to  
efficiently update the hash-codes but I don't see yet how you go from  
there to finding patterns. I assume you allow some of the points to  
be undefined (also called 'don't care points') but I don't see how.  
And if you allow undefined points, why would you need masks of  
smaller manhatten distances?


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Erik van der Werf
On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote:
> So if I understand, at each node we need to play every possible action once
> at first, even many of these actions are surely non optimal. And this may be
> slow if the number of the possible action at this node is huge.

Well, as discussed in their ICML paper you could also initialize nodes
with prior knowledge.

> When you talk about FPU, does it mean that you give  a kind of default value
> for unvisited node and compare this value with (1-beta)*Q_uct + beta*Q_rave
> if we can compute it ?

Yes, you do the normal UCT-RAVE selection for the moves that have been
already been explored at least once, then if the highest upper
confidence bound (from the move you would normally select if there are
no unexplored nodes) does not exceed the FPU value you select an
unexplored node (FPU=infinity gives standard UCT).

Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] State of the art of pattern matching

2008-03-28 Thread Jacques Basaldúa

terry mcintyre wrote:


It is possible to get some remarkably high correlation
between the moves played by pros and a predictor - yet
still not have a good program. Why?



We have a random variable, the place at which a player
plays, and some variables that we can compute. The
distribution of the random variable, depends on this 
variables. Our variables are shape urgency and life.


Of course, I agree that it would be great to include
life in our statistical model. It would be a much better 
model. The problem is we can compute shape in few clockcycles

but we cannot compute life/death. (The revised Benson methods
are of no use in real games because they detect life/death 
when it way too late.)


All the pattern logic is not intended to determine what
move has to be played, but only to select candidates for 
later methods. Also, it is not about what a player did

once, but what is statistically sound in over 10 M positions.
We ignored life/death when learning and we ignore it
when we just suggest the best moves in terms of shape.
That is fair, but, of course, the interaction between 
shape an life would change things for better.


UCT methods are (among other reasons) strong in 9x9 
because a random move (not in the first 2 lines in the 
beginning) is a reasonable move, and then the stochastic

parts of the search finds good candidates (RAVE and similar).

But in real go, there are 361 legal moves for move 1!

And CrazyStone (as far as I know still the strongest 19x19 
program) is "still" 2 kyu. That is the state of computer go 
in 2008, not the remark made by a pro about how hard it was

to beat mogo by 9 stones.

Most people in this list seem to be against human learned 
shape, but the truth is that we don't know if it is useful
or not. That depends on the price we pay for urgency. In 
terms of RAM it is a couple of tenths Mb, that's nothing 
today, it was a lot years ago. Some people assume that you 
can predict what airplane society would be without building 
an airplane. Just move people in a bus, the airplane is the 
same only faster. That's not how things work. Unfortunately, 
you have to build an airplane to see what it can or cannot do.

Some call it overengineering, but if it is not fast, then
I agree it will be useless almost for sure.


Jacques.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Jason House

I use FPU for both values for precisely the reasons you describe.

Sent from my iPhone

On Mar 28, 2008, at 9:36 AM, "Jaonary Rabarisoa" <[EMAIL PROTECTED]>  
wrote:


So if I understand, at each node we need to play every possible  
action once at first, even many of these actions are surely non  
optimal. And this may be slow if the number of the possible action  
at this node is huge.


When you talk about FPU, does it mean that you give  a kind of  
default value for unvisited node and compare this value with (1-beta) 
*Q_uct + beta*Q_rave if we can compute it ?


Finally does it make sense to throw away all non visited node and  
only consider the node that have a rave value first. Precisely use  
the FPU but only for unvisited node that have Q_rave value.



On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED] 
> wrote:


On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:

> - its rave and uct value are defined ( in this case we can
> compute the above score)
> - only the rave value is defined (in this situation the n 
(s,a)

> = 0 and the uct value is not defined)
> - neiher rave nor uct value is defined
>
> So my question is how they handle these case when they traverse the
> tree ? Because their score are not always defined for every childs  
of

> a node.


I handled this by using FPU values.  FPU = first play urgency, the  
value
to initially assign to unvisited nodes.  Other Gelley papers  
recommended

a value of 1.1

>

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Jaonary Rabarisoa
So if I understand, at each node we need to play every possible action once
at first, even many of these actions are surely non optimal. And this may be
slow if the number of the possible action at this node is huge.
When you talk about FPU, does it mean that you give  a kind of default value
for unvisited node and compare this value with (1-beta)*Q_uct + beta*Q_rave
if we can compute it ?

Finally does it make sense to throw away all non visited node and only
consider the node that have a rave value first. Precisely use the FPU but
only for unvisited node that have Q_rave value.


On Fri, Mar 28, 2008 at 12:41 PM, Jason House <[EMAIL PROTECTED]>
wrote:

>
> On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:
>
> > - its rave and uct value are defined ( in this case we can
> > compute the above score)
> > - only the rave value is defined (in this situation the n(s,a)
> > = 0 and the uct value is not defined)
> > - neiher rave nor uct value is defined
> >
> > So my question is how they handle these case when they traverse the
> > tree ? Because their score are not always defined for every childs of
> > a node.
>
>
> I handled this by using FPU values.  FPU = first play urgency, the value
> to initially assign to unvisited nodes.  Other Gelley papers recommended
> a value of 1.1
>
> >
>
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] State of the art of pattern matching

2008-03-28 Thread Jacques Basaldúa

Mark Boon wrote:

Sorry, without a bit more explanation, the assembler code is very  
hard to understand. What exactly does it do?


The first source code was just an example to see what kind of code
is generated. The second is useful, if you understand asm you should 
understand it. The board has 4 modes, as far as patterns are concerned.
So the following applies for each mode and for each possible mapping 
scheme. When 40 neighbors are used, (i.e. Manhattan distance 4) there 
are 81 possible situations (all the cells of a 9x9 board are different 
as far as board limits are concerned). On bigger boards, each cell maps 
one of the 81 possible 9x9 cells. The board system cannot play on less 
than 9x9. And for each of these 4x(maximum)81 (some board modes have 
smaller distance) the generator writes 2 functions: one for creating 
the mask/hash from scratch and an other to update all masks/hashes 
in the neighborhood of a new stone.



Does it relate to a pattern? 


Yes the complete pattern is:

   +---+
   | 4 |
   +---+---+---+
   | 4 | 3 | 4 |
   +---+---+---+---+---+
   | 4 | 3 | 2 | 3 | 4 |
   +---+---+---+---+---+---+---+
   | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
+---+---+---+---+---+---+---+---+---+
| 4 | 3 | 2 | 1 | · | 1 | 2 | 3 | 4 |
+---+---+---+---+---+---+---+---+---+
   | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
   +---+---+---+---+---+---+---+
   | 4 | 3 | 2 | 3 | 4 |
   +---+---+---+---+---+
   | 4 | 3 | 4 |
   +---+---+---+
   | 4 |
   +---+

Justification of this shape:

1. It includes all standard nobi, iken tobi, niken tobi, kosumi, 
keima & ogeima relations (+ double iken tobi + double kosumi)
2. It detects if a pattern is at the 4-th line or the 4x4 corner 
(and below obviously). Less than that is too small.
The 4-th line is too different from the center. 
3. It is a nested structure, {dist12, dist12 + dist3} are both usable.

4. It has reasonable size for the information it contains.
5. The bit arrangement is optimized for 90 deg rotation.

Additionally, I keep urgency information for: normal, the pattern
shows up for the first time and urgency in a ko.



Did you write a paper or post explaining this in more detail?


Not yet.



Do I understand correctly that you generate this code automatically?


Yes. It would be a nightmare to write about 70K lines by hand and 
debugging would be hard as well. Although what the functions do is

simple enough to create a test that verifies 100% of the cases.


You are talking about updating 40 hashes. Does it mean your patterns  
have fixed size? 


Yes. 3 sizes: Manhattan distance 2, 3 and 4


In the 500 nanoseconds, how many patterns do you compare? 


That was updating (max) 40 hashes in the neighborhood of a newly 
placed stone. The precise number of hashes depends on the board
coordinate and the surrounding stones although that is achieved 
without a single conditional jump. (It is a very conservative 
estimation for approx 140 instructions in a jump free sequence. 
The typical case is probably more like half of that.) But, as 
mentioned, it does not include neither the board logic nor the

search that translates hash -> urgency.


How about rotations and color inversions? 


In the slowest mode the pattern is rotated using macros like this 
one (that rotates a 40 neighbor pattern)


   asm
   mov edx, eax// @jm
   mov eax, [edx + 8]  // jm.mask4
   rol eax, 8
   mov [edx + 8], eax  // jm.mask4

   mov eax, [edx + 4]  // jm.mask3
   shl eax, 6
   mov ecx, eax
   and eax, 0ffh
   rol ecx, 8
   or  al, cl
   mov [edx + 4], eax  // jm.mask3

   mov eax, [edx]  // jm.msk12
   mov ecx, eax
   shl eax, 4
   rol cl, 2
   mov al, cl
   mov ecx, eax
   shr ecx, 16
   and eax, 0ffF0FFh
   and ecx, F00h
   or  eax, ecx
   mov [edx], eax  // jm.msk12
   end

and converted to canonical. Color inversion is automatic
because the pattern is own/opponent instead of black/white.

In the fastest mode, the hash table has x8 size and includes
the hashes of the rotated patterns.


Jacques.



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Jason House

On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:

> - its rave and uct value are defined ( in this case we can
> compute the above score)
> - only the rave value is defined (in this situation the n(s,a)
> = 0 and the uct value is not defined)
> - neiher rave nor uct value is defined
> 
> So my question is how they handle these case when they traverse the
> tree ? Because their score are not always defined for every childs of
> a node.


I handled this by using FPU values.  FPU = first play urgency, the value
to initially assign to unvisited nodes.  Other Gelley papers recommended
a value of 1.1

> 

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Yet another question on uct and rave

2008-03-28 Thread Erik van der Werf
On Fri, Mar 28, 2008 at 11:20 AM, Jaonary Rabarisoa <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> After a long search on the computer go mailing list archive and reading and
> reading again the paper of Gelly and Silver (ICML 2007) I didn't find
> answers to my question.
> In this paper they introduce a way to select the next move, at a given
> state, using the rave and uct value of its childs. They do this by comparing
>
> (1-beta)*Q_uct + beta*Q_rave
>
>
> But, by the definition of the rave and uct value, for each child of a given
> node we may have the following situation :
>
>  - its rave and uct value are defined ( in this case we can compute the
> above score)
> - only the rave value is defined (in this situation the n(s,a) = 0 and the
> uct value is not defined)

Plain UCT always selects unvisited nodes first ( n=0 -> Q=infinite ).

> - neiher rave nor uct value is defined

This cannot happen if you always select unvisited nodes first (because
if you select a move it leads to an update for both Q_rave and Q_uct).

Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Yet another question on uct and rave

2008-03-28 Thread Jaonary Rabarisoa
Hi all,

After a long search on the computer go mailing list archive and reading and
reading again the paper of Gelly and Silver (ICML 2007) I didn't find
answers to my question.
In this paper they introduce a way to select the next move, at a given
state, using the rave and uct value of its childs. They do this by
comparing


(1-beta)*Q_uct + beta*Q_rave


But, by the definition of the rave and uct value, for each child of a given
node we may have the following situation :

- its rave and uct value are defined ( in this case we can compute the above
score)

- only the rave value is defined (in this situation the n(s,a) = 0 and the
uct value is not defined)

- neiher rave nor uct value is defined



So my question is how they handle these case when they traverse the tree ?
Because their score are not always defined for every childs of a node.

Cheers,


Jaonary
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/