Re: [computer-go] batch size and impact on strength
>> This seems to be the case, and on some level I still do not really understand why. Since under the Chinese go rules the board should be effectively stateless (except for ko information), all the information should be contained in the current position. Why additional information, i.e. an annotation of the last played move, is required is on some level a mystery to me.

> One can build nice examples of that for some artificial games: the knowledge of the last move helps *for finite computational cost*. Sylvain has shown interesting elements around that, but as far as I know this was not in his Ph.D. thesis and this is not published.

If I understand correctly what it is about, I do have something in my thesis about that: p. 153, section 4.4.3, "Mathematical insights: strength vs. accuracy", and more precisely Theorem 4.4.1 (an accurate simulation player without memory is strong). It is certainly not of direct practical application, though.

Sylvain

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] batch size and impact on strength
> If I understand correctly what it is about, I do have something in my thesis about that: p. 153, section 4.4.3, "Mathematical insights: strength vs. accuracy", and more precisely Theorem 4.4.1 (an accurate simulation player without memory is strong). It is certainly not of direct practical application, though.

Sorry for having missed that :-) I'm happy that you finished the writing of this (very interesting) part.

Olivier
Re: [computer-go] batch size and impact on strength
I think I was being a bit unclear here. By UCT I meant the combination of UCT + simulation. I am just curious why simulation-based methods of evaluation have so far been found to be superior (are they, on 19x19?) to traditional bots, which use a different form of evaluation function for a given position. It would seem that simulation-based methods of arriving at a score for the current position (playing out to the end) might be quite inefficient, especially on 19x19, where it seems to me the game lasts about 400-450 moves on average. Heavy playouts, from this viewpoint, are an attempt to improve the evaluation function over the uniformly random light playout.

As for UCT itself, perhaps I still don't completely understand the algorithm, but is a full episode or simulation required for its use? Or is there a related family of algorithms which operate on trees as UCT does, but grow them asymmetrically given some confidence bounds on the values returned by a heuristic positional evaluation function?

Perhaps my understanding of MoGo from the thesis is incorrect. From a certain standpoint it makes very limited use of heuristics and seems to rely solely on several published details (in the thesis):

1) UCT + simulation.
2) Learned pattern weights via self-play using TD(lambda).
3) Proximity heuristics (which is something I do not quite understand on a deep level, as to why it improves play).
4) RAVE knowledge recycling between trees.
5) The dragon heuristic.

The first two can be viewed as online and offline learning. 3) and 5), to me, incorporate some domain knowledge, and 4) perhaps some as well, but it seems to work in games which have combinatorial properties and is perhaps more broadly applicable. Obviously there are probably a lot of unpublished details, such as the use of a simple ladder search (as you indicated your program Valkyria does) and perhaps many other enhancements which we do not know about.
But from Sylvain's description it seems that the amount of domain knowledge is limited and that the statistical learning procedures dominate. Is this a misinterpretation/misrepresentation?
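For readers following the thread: the "light playout" evaluation being discussed can be sketched as below. This is an illustrative sketch with a hypothetical game interface (`legal_moves`/`play`/`is_terminal`/`score`) and a toy game, not code from MoGo or any actual Go engine.

```python
import random

def random_playout(state, legal_moves, play, is_terminal, score, rng):
    """Play uniformly random moves until the game ends; return the final score."""
    while not is_terminal(state):
        state = play(state, rng.choice(legal_moves(state)))
    return score(state)

def monte_carlo_eval(state, legal_moves, play, is_terminal, score,
                     n_playouts=1000, seed=0):
    """Estimate the value of `state` as the mean score of random playouts."""
    rng = random.Random(seed)
    total = sum(random_playout(state, legal_moves, play, is_terminal, score, rng)
                for _ in range(n_playouts))
    return total / n_playouts

# Toy stand-in for a game: a symmetric random walk that ends at +/-3.
# By symmetry, the true value of the starting position 0 is 0.5.
if __name__ == "__main__":
    v = monte_carlo_eval(
        0,
        legal_moves=lambda s: [+1, -1],
        play=lambda s, m: s + m,
        is_terminal=lambda s: abs(s) >= 3,
        score=lambda s: 1.0 if s >= 3 else 0.0,
    )
    print(v)  # close to 0.5
```

The inefficiency question raised above is visible here: every single evaluation costs a full game played to the end, so the quality of the estimate is bought with many complete playouts.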
Re: [computer-go] batch size and impact on strength
> 1) does not hold in mogo anymore, because there are no upper-confidence terms in MoGo anymore.

I just want to add that this is also true in many programs (either no upper-confidence term, or a very small term with no statistical advantage over the null case).
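The upper-confidence term in question is the UCB1 exploration bonus in the UCT child-selection rule; the remark above corresponds to setting its constant to zero (or very close to it), so that selection is driven by the observed mean alone. A minimal sketch (the `(wins, visits)` node representation is illustrative, not MoGo's):

```python
import math

def select_child(children, parent_visits, c):
    """children: list of (wins, visits) pairs; returns the index of the chosen child.
    With c > 0 this is classic UCB1; with c = 0 it is pure greedy selection."""
    def ucb(wins, visits):
        if visits == 0:
            return float("inf")  # always try unvisited moves first
        mean = wins / visits
        bonus = c * math.sqrt(math.log(parent_visits) / visits)
        return mean + bonus
    scores = [ucb(w, n) for (w, n) in children]
    return max(range(len(children)), key=lambda i: scores[i])
```

With `children = [(16, 20), (3, 4)]` and `parent_visits = 24`, `c = 0` picks the first child (higher mean, 0.8 vs 0.75), while `c = 1` picks the second (its smaller visit count earns a larger exploration bonus).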
Re: [computer-go] batch size and impact on strength
Thanks for the reply.

> 2) learned pattern weights are not learnt through TD(lambda). RLGO is not used in mogo. It was used a long time ago. Hand-designed heuristics are much more efficient (in particular after heavy cluster-based tuning of coefficients).

I am not entirely sure what you mean here by tuning coefficients: do the heuristics in question require some form of parameterization? How are these parameters tuned?

> 3) holds in mogo, and in all efficient Monte-Carlo based techniques. This is particularly important, as seemingly knowing the last point is important for a correct evaluation of a position in the case of bounded computational resources. By the way, this might be true for humans also. I'm afraid it's difficult to find sufficiently many human players and situations for organizing an experiment about that :-)

This seems to be the case, and on some level I still do not really understand why. Since under the Chinese go rules the board should be effectively stateless (except for ko information), all the information should be contained in the current position. Why additional information, i.e. an annotation of the last played move, is required is on some level a mystery to me.

> 4) The rave heuristic only migrates information from one node to the nodes above it, never to brothers, cousins, sons, and so on. Many attempts have been made here and elsewhere around that, without success. By the way, parallelizations (both multi-core and MPI) are *very* efficient. The use of huge clusters could probably give much better results than the current limit of mogo (between 1k and 1d on KGS with 64 quad-cores and a Myrinet switch).

This is obviously quite an impressive feat. However, it is also on some level a bit disappointing to me that it will be some time before we see strong desktop programs, since the required computational resources are somewhat prohibitive.
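The point about RAVE only migrating information upward can be seen in the backpropagation step: a simulation's result updates the nodes on the path actually traversed, never sibling subtrees. The `Node` structure and AMAF-style update below are an illustrative sketch, not MoGo's actual code:

```python
class Node:
    def __init__(self, parent=None, move=None):
        self.parent, self.move = parent, move
        self.visits = 0
        self.wins = 0
        self.rave_visits = {}  # move -> count of simulations containing it
        self.rave_wins = {}    # move -> total outcome of those simulations

def backpropagate(leaf, moves_played, outcome):
    """Update UCT stats and RAVE stats on `leaf` and every ancestor of it."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.wins += outcome
        for m in moves_played:  # moves seen later in this simulation
            node.rave_visits[m] = node.rave_visits.get(m, 0) + 1
            node.rave_wins[m] = node.rave_wins.get(m, 0) + outcome
        node = node.parent      # upward only: siblings are never touched
    # (a full implementation would also track which player made each move)
```

Note that a sibling of the traversed leaf receives no update at all, which is exactly the restriction described above.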
> Another point, which was not in the thesis of Sylvain and which is also central, is the use of patterns in the tree. This has been used for a long time in CrazyStone and Mango; we are currently developing this part in MoGo, and it is quite efficient (but involves tedious tuning of several parameters).

I am not sure I understand the distinction here between patterns in the tree and patterns used in the heavy playouts; I guess by this you mean they are not the same thing. I am in general basically curious how the tuning of parameters occurs, i.e. how you fit the parameters in a given situation if things like reinforcement learning are not used (which, as I understand it, is a sort of automated procedure for fine-tuning the various parameters in the system).

Regards,
Carter.
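One way pattern knowledge can be used *in the tree* (as opposed to in the playouts) is to inject the pattern score of a move as a prior on its node value, e.g. as "virtual" wins and visits that real simulations then wash out. This is a generic sketch with illustrative constants, not MoGo's or CrazyStone's actual scheme:

```python
def node_value(wins, visits, pattern_prior, prior_strength=10):
    """Blend real simulation statistics with a pattern-based prior.

    pattern_prior in [0, 1]: how good the pattern heuristic thinks the move is.
    prior_strength: how many "virtual" simulations the prior is worth; this is
    one of the constants that would need the tedious tuning mentioned above.
    """
    virtual_wins = prior_strength * pattern_prior
    return (wins + virtual_wins) / (visits + prior_strength)
```

With no real simulations, `node_value(0, 0, 0.8)` is just the prior, 0.8; after 100 simulations with 50 wins, the estimate has moved most of the way toward the observed mean of 0.5.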