Re: [computer-go] How to improve Orego
Hi, here is an explanation along with the code: First, my eyechecking is very fast: bool is_eyelike (color::t color, v::t v) { int diag_color_cnt[color::cnt]; assertc (board_ac, is_ready ()); if (((nbr_player_cnt[v] (color * 4)) 0xf) != 4) { assertc (board_ac slow_ac, slow_eyetest(color, v) != 4); return false; } assertc (board_ac slow_ac, slow_eyetest(color, v) == 4); rep (c, color::cnt) diag_color_cnt[c] = 0; diag_color_cnt[color_at[v::NW (v)]]++; diag_color_cnt[color_at[v::NE (v)]]++; diag_color_cnt[color_at[v::SW (v)]]++; diag_color_cnt[color_at[v::SE (v)]]++; if (diag_color_cnt[color::edge] 0) diag_color_cnt[color::opponent(color)]++; if (diag_color_cnt[color::opponent(color)] = 2) return false; return true; } Basicly it's all about this condition ((nbr_player_cnt[v] (color * 4)) 0xf) != 4 color is 0 (black) or 1(white) . nbr_player_cnt[v] is two 4bit counters in one int. Updating it is fast, as I add 0x01 or 0x10 depending on new neighbour. nbr_player_cnt[v] += (1 (new_nbr_color * 4)); So I do not allow for self eye filling. What I did is I was generating pass when I made full loop on my circular list (array), without finding legal move. What I do now (from yesterday), I move all rejected moves (for any of the players) to rejected array. When the main list gets empty, I just memcpy rejected onto main and try again. I generate pass if after memcpy, no legal move is found. This improved 30.1 - 33.3pps. If I allow for only one memcpy, then sometimes one-eye-only-no-other-liberties dead stones remain after the playout, but there is no to much thrashing on the end of the playout and I get 37 pps Hope this helps. Lukasz. On 12/9/06, Don Dailey [EMAIL PROTECTED] wrote: Lukasz, I have an implementation of your circular list in place, but I'm having a difficult time dealing with the single point eyes which my program does not allow as well as suicide which I don't allow either. I built a little circular list and I put captured points back into the circular list in random locations. That part is fine. My implementation does not allow suicide or moves to certain kinds of 1 point eyes. When I reach those points in the list of empty intersections, I cannot play them, but I cannot dismiss them either. They may be legal for the opponent and they may be legal later for the computer. Just for timing purposes I wanted to see what would happen if I just pushed those to the end of the list and continued until I tried all moves. This is clearly not random but it does work. Unfortunately, near the end of games I'm thrashing around in the list a lot and it's actually a slowdown. There are too many unplayable moves that I have to skip over but I cannot dismiss them. These moves dominate the list near the end of the game. The problem is more with the eyes than with suicide. You solved the suicide problem by making it legal, but I don't know how you solved the eye problem. I'm not sure what you are doing, but it sounds like you basically allow ANY move to an empty intersection except a single stone suicide, in which case you dismiss this intersection from the empty list forever. This would probably cause your loop to terminate before going to far even without an eye rule. But is this understanding correct? Can you clarify? - Don On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: I really do randomize a whole vector of empty intersections before the playout. When I get new empty intersection I just pick random place in vector, move it to the end, and use the place for new intersection. Here is the code: void add_rnd (v::t v) { int ii; empty_v_cnt++; ii = pm::rand () % empty_v_cnt; // TODO improve speed % tab[empty_v_cnt-1] = tab[ii]; tab[ii] = v; assertc (mc_legal_ac, empty_v_cnt = max_size*max_size); } I play the playout until there are no more legal moves. I allow ko recapture and large suicides. I have problem with signle stone suicides. So when one happens (I can only detect it by actually playing the stone), I just take next empty intersection. Circular reduces frequency of such wasteful events. It means that I check next empty intersection in the vector starting from the place I finished last time. I hope this is clear now. If not, just ask :) Lukasz Lew On 12/6/06, Don Dailey [EMAIL PROTECTED] wrote: On Mon, 2006-12-04 at 18:32 +0100, Łukasz Lew wrote: I deal with eyes by randomizing list of empty intersections before the playout, and while searching non-eye I go through them in circular manner. What do you mean by circular and what does this have to do with eyes? I'm looking to speed my random player up a little bit more. Here is what I've been doing for the last year or two: I collect all the empty intersections into an array and I randomize them as I go. When a capture happens, it screws up your list - you
Re: [computer-go] How to improve Orego
I really do randomize a whole vector of empty intersections before the playout. When I get new empty intersection I just pick random place in vector, move it to the end, and use the place for new intersection. Here is the code: void add_rnd (v::t v) { int ii; empty_v_cnt++; ii = pm::rand () % empty_v_cnt; // TODO improve speed % tab[empty_v_cnt-1] = tab[ii]; tab[ii] = v; assertc (mc_legal_ac, empty_v_cnt = max_size*max_size); } I play the playout until there are no more legal moves. I allow ko recapture and large suicides. I have problem with signle stone suicides. So when one happens (I can only detect it by actually playing the stone), I just take next empty intersection. Circular reduces frequency of such wasteful events. It means that I check next empty intersection in the vector starting from the place I finished last time. I hope this is clear now. If not, just ask :) Lukasz Lew On 12/6/06, Don Dailey [EMAIL PROTECTED] wrote: On Mon, 2006-12-04 at 18:32 +0100, Łukasz Lew wrote: I deal with eyes by randomizing list of empty intersections before the playout, and while searching non-eye I go through them in circular manner. What do you mean by circular and what does this have to do with eyes? I'm looking to speed my random player up a little bit more. Here is what I've been doing for the last year or two: I collect all the empty intersections into an array and I randomize them as I go. When a capture happens, it screws up your list - you must either add those intersections back to the pool of empty intersections or just reinitialize the list. Right now I'm just reinitializing the list from scratch again - probably not the fastest way. I could of course randomize the list of empty intersections with a Fisher-Yates shuffle FIRST and then traverse the list in order till there is a capture but I think this is more work on the average because if you scramble a big list and the next move is a capture, you have to rework the list anyway and randomize again.I guess you might call what I do a lazy shuffle - you don't do the work unless you need to. (However, if I knew I would always need the whole list, it probably produces a little bit faster code to scramble them all at once in a tight little shuffle loop.) The actually procedure for incremental randomization is that I pick randomly from the empty list and then fix the list back up by swapping this out of the way with the first element in the list (which gets moved out of the picture because I bump up the list start pointer.) - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
Yes, it's clear how you do it. I suspect this is better than what I do. My list of intersections include ALL the intersections not just the empty ones and use a fixed or static array. But this is all easy to change. I will give this a try at some point and let you know if it's faster. - Don On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: I really do randomize a whole vector of empty intersections before the playout. When I get new empty intersection I just pick random place in vector, move it to the end, and use the place for new intersection. Here is the code: void add_rnd (v::t v) { int ii; empty_v_cnt++; ii = pm::rand () % empty_v_cnt; // TODO improve speed % tab[empty_v_cnt-1] = tab[ii]; tab[ii] = v; assertc (mc_legal_ac, empty_v_cnt = max_size*max_size); } I play the playout until there are no more legal moves. I allow ko recapture and large suicides. I have problem with signle stone suicides. So when one happens (I can only detect it by actually playing the stone), I just take next empty intersection. Circular reduces frequency of such wasteful events. It means that I check next empty intersection in the vector starting from the place I finished last time. I hope this is clear now. If not, just ask :) Lukasz Lew On 12/6/06, Don Dailey [EMAIL PROTECTED] wrote: On Mon, 2006-12-04 at 18:32 +0100, Łukasz Lew wrote: I deal with eyes by randomizing list of empty intersections before the playout, and while searching non-eye I go through them in circular manner. What do you mean by circular and what does this have to do with eyes? I'm looking to speed my random player up a little bit more. Here is what I've been doing for the last year or two: I collect all the empty intersections into an array and I randomize them as I go. When a capture happens, it screws up your list - you must either add those intersections back to the pool of empty intersections or just reinitialize the list. Right now I'm just reinitializing the list from scratch again - probably not the fastest way. I could of course randomize the list of empty intersections with a Fisher-Yates shuffle FIRST and then traverse the list in order till there is a capture but I think this is more work on the average because if you scramble a big list and the next move is a capture, you have to rework the list anyway and randomize again.I guess you might call what I do a lazy shuffle - you don't do the work unless you need to. (However, if I knew I would always need the whole list, it probably produces a little bit faster code to scramble them all at once in a tight little shuffle loop.) The actually procedure for incremental randomization is that I pick randomly from the empty list and then fix the list back up by swapping this out of the way with the first element in the list (which gets moved out of the picture because I bump up the list start pointer.) - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
If this randint routine is critical, you can save some calls to rand() when you know that n is always below some value (see my previous post about bitmap go). For instance if n 128 (probably true for 9x9 go), you could try: while (true) { r = rand(); if ((r v) n) return (r v); r = 7; if ((r v) n) return (r v); r = 7; ... // 2 more times } Also it could be better to read the mask (v) from a table (provided the upper limit of n is small enough). At least on a P4, it will avoid a long data dependency chain and might give a (probably small) improvment. It might even be benefical on other processors. Antoine On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: ii = pm::rand () % empty_v_cnt; // TODO improve speed % Try this, I think it could be faster, not sure, but has the advantage that it's slightly more correct. // returns an integer between 0 and n-1 inclusive // unsigned long randint(unsigned long n) { unsigned long v = n; unsigned long r; v--; v |= v 1; v |= v 2; v |= v 4; v |= v 8; v |= v 16; do { r = rand(); } while ( (r v) = n ); return( r v ); } - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
I'm pretty sure the time of this function is dominated by the call to rand(), but it never occurred to do a table lookup for the mask, interesting idea. - Don On Thu, 2006-12-07 at 22:36 +0100, Antoine de Maricourt wrote: If this randint routine is critical, you can save some calls to rand() when you know that n is always below some value (see my previous post about bitmap go). For instance if n 128 (probably true for 9x9 go), you could try: while (true) { r = rand(); if ((r v) n) return (r v); r = 7; if ((r v) n) return (r v); r = 7; ... // 2 more times } Also it could be better to read the mask (v) from a table (provided the upper limit of n is small enough). At least on a P4, it will avoid a long data dependency chain and might give a (probably small) improvment. It might even be benefical on other processors. Antoine On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: ii = pm::rand () % empty_v_cnt; // TODO improve speed % Try this, I think it could be faster, not sure, but has the advantage that it's slightly more correct. // returns an integer between 0 and n-1 inclusive // unsigned long randint(unsigned long n) { unsigned long v = n; unsigned long r; v--; v |= v 1; v |= v 2; v |= v 4; v |= v 8; v |= v 16; do { r = rand(); } while ( (r v) = n ); return( r v ); } - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
By the way, I'm amazed that the code for playing random games is fast enough that getting random numbers is actually a bottleneck and it's worthy of a discussion on optimization. One of the fastest chess programs a few years ago in terms of nodes per second was Fritz. It heavily used a common technique of accessing piece square tables to get an evaluation instantly - basically keeping a score incrementally updated in a table driven way. This was partly why it was so incredibly fast. But the programmer would joke about those expensive memory references being a bottleneck probably because memory tables are such a well known technique for speeding up expensive operations. - Don On Fri, 2006-12-08 at 00:39 +0100, Antoine de Maricourt wrote: Last time I checked MT was under 20cc per call (I assume it's inlined). On a P4, the shift operator is supposed to have a 4 cc latency. The dependency chain to get the mask yields to 25 cc latency before the mask can be used (4 per shift + 1 for OR) * 5. So it's quite possible that this sequence dominates the call to rand(). Just give a try... unless you use another processor with a better latency for shift operations. But even with those processors it might be benefic to replace 15 instructions by only one. Of course if you make a specific implementation for small n (below 256), you can remove the 2 last shift as well. Antoine I'm pretty sure the time of this function is dominated by the call to rand(), but it never occurred to do a table lookup for the mask, interesting idea. - Don On Thu, 2006-12-07 at 22:36 +0100, Antoine de Maricourt wrote: If this randint routine is critical, you can save some calls to rand() when you know that n is always below some value (see my previous post about bitmap go). For instance if n 128 (probably true for 9x9 go), you could try: while (true) { r = rand(); if ((r v) n) return (r v); r = 7; if ((r v) n) return (r v); r = 7; ... // 2 more times } Also it could be better to read the mask (v) from a table (provided the upper limit of n is small enough). At least on a P4, it will avoid a long data dependency chain and might give a (probably small) improvment. It might even be benefical on other processors. Antoine On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: ii = pm::rand () % empty_v_cnt; // TODO improve speed % Try this, I think it could be faster, not sure, but has the advantage that it's slightly more correct. // returns an integer between 0 and n-1 inclusive // unsigned long randint(unsigned long n) { unsigned long v = n; unsigned long r; v--; v |= v 1; v |= v 2; v |= v 4; v |= v 8; v |= v 16; do { r = rand(); } while ( (r v) = n ); return( r v ); } - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
So it's quite possible that this sequence dominates the call to rand(). on another note, if the only reason that you need random numbers is to choose a number from a list (82, or 362), and the depth is being constrained to something reasonable, then what you need is not a super-duper random number generator, but just one that is good enough for the purposes at hand. and there are much, much faster ways to get at those beasties if you don't need for all n-length subsets of generated numbers to be uniformally distributed as vectors in n-dim space, for instance. s. Cheap talk? Check out Yahoo! Messenger's low PC-to-Phone call rates. http://voice.yahoo.com ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
In the Orego paper the problem of selecting a move when the board is filled with stones is mentioned. Orego uses a sort of double-hashing scheme. But isn't it trivial to keep a list of empty intersections? Before the MC-run is started, one builds up this list. If a stone is placed now on the board, the entry is removed from the list and the last entry is copied into this slot. In this way the list is always dense. One can of course not run linearly trough the list to find the entry which should be removed. Instead one builds at the beginning another array which is indexed by the board-point and which contains the index of the point in the empy-point-list. This second array has to be changed too when the last entry is copied into the removed slot. With a few operations one gets the big advantage to sample all the time only the empty points. I think this solution is much simpler and more efficient than the double-hashing scheme of Orego. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] How to improve Orego
It's not clear whether this is faster. Determining that a move is illegal because the point is occupied is very fast; determining that a move IS legal (i.e., not a suicide or ko violation) is the slow part, and I avoid that by taking the first legal move I find. (Of course, once all moves have been tried from a given move, UCT takes over.) In any case, my profiling data indicates that choosing the random move per se is not the expensive part; playing the move is. Thanks for the suggestion, Peter Drake Assistant Professor of Computer Science Lewis Clark College http://www.lclark.edu/~drake/ On Dec 4, 2006, at 7:52 AM, Chrilly wrote: In the Orego paper the problem of selecting a move when the board is filled with stones is mentioned. Orego uses a sort of double- hashing scheme. But isn't it trivial to keep a list of empty intersections? Before the MC-run is started, one builds up this list. If a stone is placed now on the board, the entry is removed from the list and the last entry is copied into this slot. In this way the list is always dense. One can of course not run linearly trough the list to find the entry which should be removed. Instead one builds at the beginning another array which is indexed by the board-point and which contains the index of the point in the empy-point-list. This second array has to be changed too when the last entry is copied into the removed slot. With a few operations one gets the big advantage to sample all the time only the empty points. I think this solution is much simpler and more efficient than the double-hashing scheme of Orego. Chrilly ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/