Re: [computer-go] batch size and impact on strength

2008-05-25 Thread Sylvain Gelly

 This seems to be the case, and on some level I still do not really
 understand why. Since under Chinese go rules the board should be
 effectively stateless (ko information excepted), all the information
 should be contained in the current position. Why additional information,
 i.e. an annotation of the last played move, is required on some level
 is a mystery to me.


 One can construct nice examples of this for some artificial games: the
 knowledge of the last move helps *at finite computational cost*.
 Sylvain has shown interesting results around this, but as far as I know
 it was not in his Ph.D. thesis and it has not been published.


If I understand correctly what this is about, I do have something in my
thesis about it: p. 153, Section 4.4.3, Mathematical insights: strength
VS accuracy, and more precisely Theorem 4.4.1 (Accurate simulation
player without memory is strong).
It is certainly not of direct practical application, though.

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] batch size and impact on strength

2008-05-25 Thread Olivier Teytaud

If I understand correctly what this is about, I do have something in my
thesis about it: p. 153, Section 4.4.3, Mathematical insights: strength
VS accuracy, and more precisely Theorem 4.4.1 (Accurate simulation
player without memory is strong).
It is certainly not of direct practical application, though.


Sorry for having missed that :-) I'm happy that you finished writing
this (very interesting) part.
Olivier


Re: [computer-go] batch size and impact on strength

2008-05-24 Thread Carter Cheng
I think I was being a bit unclear here. By UCT I meant the combination of
UCT + simulation. I am just curious why simulation-based methods of
evaluation have so far been found superior (are they, on 19x19?) to
traditional bots, which use a different form of evaluation function for a
given position. It would seem that simulation-based methods of arriving at
a score for the current position (playing out to the end) might be quite
inefficient, especially on 19x19, where it seems to me the game lasts
about 400-450 moves on average. Heavy playouts, from this viewpoint, are
an attempt to improve the evaluation function over the uniformly random
light playout.

As for UCT itself, perhaps I still don't completely understand the
algorithm, but is a full episode or simulation required for its use? Or is
there a related family of algorithms which operate on trees as UCT does,
but grow them asymmetrically given confidence bounds on the values
returned by a heuristic positional evaluation function?
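[Editorial note: the asymmetric growth asked about above is, in fact, what UCT itself provides: each iteration descends the tree by an upper-confidence rule, so promising subtrees are visited and expanded far more often than weak ones. Below is a minimal, generic sketch of one UCT iteration; the `legal_moves` and `rollout` callbacks are placeholders for illustration, not anything taken from MoGo.]

```python
import math
import random

class Node:
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0
        self.children = {}  # move -> Node

def ucb1(parent, child, c=1.4):
    # Mean value plus an exploration bonus that shrinks
    # as the child accumulates visits.
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return mean + bonus

def uct_iteration(root, legal_moves, rollout):
    """One UCT iteration: select down the tree, expand one node,
    simulate to the end of the game, and backpropagate the result."""
    path = [root]
    node = root
    # Selection: descend while the current node is fully expanded.
    while node.children and len(node.children) == len(legal_moves(node)):
        move = max(node.children, key=lambda m: ucb1(node, node.children[m]))
        node = node.children[move]
        path.append(node)
    # Expansion: add one untried move, if any remain.
    untried = [m for m in legal_moves(node) if m not in node.children]
    if untried:
        move = random.choice(untried)
        child = Node()
        node.children[move] = child
        path.append(child)
    # Simulation: the rollout plays out the game and returns 0 or 1.
    reward = rollout(path[-1])
    # Backpropagation: update every node on the path.
    for n in path:
        n.visits += 1
        n.total_reward += reward
```

Because the selection rule favors children with high mean reward, the tree deepens only where the playouts report success, which is the asymmetric growth in question.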

Perhaps my understanding of MoGo from the thesis is incorrect. From a
certain standpoint it makes very limited use of heuristics and seems to
rely solely on several published techniques (in the thesis):

1) UCT+simulation

2) learned pattern weights via self-play using TD(lambda). 

3) Proximity heuristics (where I do not quite understand, at a deep
level, why they improve play).

4) RAVE knowledge recycling between trees.

5) The dragon heuristic.

The first two can be viewed as online and offline learning. To me, 3) and
5) incorporate some domain knowledge, and 4) perhaps some as well, but 4)
seems to work in games which have combinatorial properties and is perhaps
more broadly applicable.
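[Editorial note: for point 4), the published form of RAVE (Gelly and Silver's "all-moves-as-first" estimate) blends two statistics per move: the true Monte-Carlo mean and the AMAF mean, with a weight that fades as real visits accumulate. A sketch follows; the schedule and the constant `k` are one published illustration, not MoGo's tuned values.]

```python
import math

def rave_value(mc_visits, mc_wins, amaf_visits, amaf_wins, k=1000.0):
    """Blend Monte-Carlo and AMAF (all-moves-as-first) statistics.

    beta weights the AMAF estimate heavily when the node has few real
    visits and fades it out as direct evidence accumulates. The schedule
    beta = sqrt(k / (3*n + k)) is one published choice; k is a tuning
    parameter chosen here for illustration only."""
    mc_mean = mc_wins / mc_visits if mc_visits else 0.0
    amaf_mean = amaf_wins / amaf_visits if amaf_visits else 0.0
    beta = math.sqrt(k / (3.0 * mc_visits + k))
    return (1.0 - beta) * mc_mean + beta * amaf_mean
```

With zero real visits the value is pure AMAF; with many visits the AMAF term becomes negligible, which is why the heuristic mainly helps early, when statistics are scarce.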

Obviously there are probably a lot of unpublished details, such as the use
of a simple ladder search (as you indicated your program Valkyria does)
and perhaps many other enhancements we do not know about. But from
Sylvain's description it seems that the amount of domain knowledge is
limited and that the statistical learning procedures dominate. Is this a
misinterpretation/misrepresentation?


  


Re: [computer-go] batch size and impact on strength

2008-05-24 Thread Olivier Teytaud
1) does not hold in MoGo anymore, because there are no upper-confidence
terms in MoGo anymore.


I just want to add that this is also true of many programs (either no
upper-confidence term, or a very small term with no statistical advantage
over the null case).
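[Editorial note: concretely, when the exploration coefficient is zero (or negligible) the UCB selection rule degenerates to picking the child with the best observed mean, and exploration must come from elsewhere, e.g. priors or RAVE. A generic illustration, not code from any of the programs mentioned:]

```python
import math

def select(children, c):
    """children: list of (visits, wins) pairs.
    Return the index maximizing mean + c * sqrt(log(N) / visits)."""
    total = sum(v for v, _ in children)

    def score(item):
        v, w = item
        if v == 0:
            return float("inf")
        exploration = c * math.sqrt(math.log(total) / v) if c > 0.0 else 0.0
        return w / v + exploration

    return max(range(len(children)), key=lambda i: score(children[i]))

# With c = 0 the choice is purely greedy on the observed mean value;
# with a large c the rarely-visited arm gets a big bonus.
stats = [(100, 60), (100, 55), (10, 2)]
```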


Re: [computer-go] batch size and impact on strength

2008-05-24 Thread Carter Cheng
Thanks for the reply.

 2) learned pattern weights are not learnt through TD(lambda). RLGO is
 not used in MoGo. It was used a long time ago. Hand-designed heuristics
 are much more efficient (in particular after heavy cluster-based tuning
 of coefficients).
I am not entirely sure what you mean here by tuning coefficients. Do the
heuristics in question require some form of parameterization? How are
these parameters tuned?


 
 3) holds in MoGo, and in all efficient Monte-Carlo based techniques.
 This is particularly important, as knowing the last point is seemingly
 important for a correct evaluation of a position in the case of bounded
 computational resources. By the way, this might be true for humans,
 also. I'm afraid it's difficult to find sufficiently many human players
 and situations for organizing an experiment about that :-)
 
This seems to be the case, and on some level I still do not really
understand why. Since under Chinese go rules the board should be
effectively stateless (ko information excepted), all the information
should be contained in the current position. Why additional information,
i.e. an annotation of the last played move, is required on some level is
a mystery to me.


 4) The RAVE heuristic only migrates information from one node to nodes
 above that node - never to brothers, cousins, sons, and so on. Many
 attempts have been made here and elsewhere around that, without
 success.

 By the way, parallelizations (both multi-core and MPI) are *very*
 efficient. The use of huge clusters could probably give much better
 results than the current limit of MoGo (between 1k and 1d on KGS with
 64 quad-cores and a Myrinet switch).
This is obviously quite an impressive feat. However, it is also on some
level a bit disappointing to me that it will be some time before we see
strong desktop programs, since the computational resources required are
somewhat prohibitive.
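[Editorial note: the thread does not say how MoGo's cluster parallelization works, but the simplest published scheme, root parallelization, is easy to sketch: each worker runs an independent search from the root and the master sums the per-move visit counts. This is an assumption for illustration, not necessarily MoGo's scheme.]

```python
from collections import Counter

def merge_root_stats(worker_stats):
    """Root parallelization: each worker runs an independent search and
    reports per-move visit counts (a dict of move -> visits); the master
    sums them and plays the move with the highest combined count."""
    combined = Counter()
    for stats in worker_stats:
        combined.update(stats)
    return combined.most_common(1)[0][0]
```

The appeal is that workers need almost no communication; the cost is that they duplicate each other's exploration, so the speedup is sublinear.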


 
 Another point which was not in Sylvain's thesis and which is also
 central is the use of patterns in the tree. This has been used for a
 long time in CrazyStone and Mango; we are currently developing this
 part in MoGo, and it is quite efficient (but involves tedious tuning of
 several parameters).
 
I am not sure I understand the distinction here between patterns in the
tree and patterns used in the heavy playouts. I guess by this you mean
they are not the same thing.

I am in general curious how the tuning of parameters occurs, i.e. how you
fit the parameters in a given situation if things like reinforcement
learning are not used (which, as I understand it, is a sort of automated
procedure for fine-tuning various parameters in the system).
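[Editorial note: "cluster-based tuning of coefficients" is not spelled out in the thread; one plausible reading, sketched here purely as an assumption, is brute-force self-play: estimate the win rate of each candidate coefficient value over many games and keep the best. The `play_game` callback stands in for a full engine-vs-engine game.]

```python
import random

def tune_coefficient(candidates, play_game, games_per_candidate=200, seed=0):
    """Naive tuning loop: estimate each candidate value's win rate
    against a fixed baseline and return the best one.
    play_game(value, rng) must return 1 on a win and 0 on a loss;
    it is a hypothetical placeholder for a real game."""
    rng = random.Random(seed)
    best_value, best_rate = None, -1.0
    for value in candidates:
        wins = sum(play_game(value, rng) for _ in range(games_per_candidate))
        rate = wins / games_per_candidate
        if rate > best_rate:
            best_value, best_rate = value, rate
    return best_value, best_rate
```

This is embarrassingly parallel (each candidate's games can run on separate machines), which may be why a cluster makes such tuning practical even though no gradient-based learning is involved.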

Regards,

Carter.


  