RE: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread David Fotland
Getting similar speed to Libego is very impressive.  It’s important to start 
with a fast basic engine, since it will only get much slower as you add more 
knowledge.  Currently I only get about 19k playouts per second on a 2 core 
Core2Duo 2.3 GHz, on 9 line empty board.  I started at about 55k playouts per 
second on one core (for a UCT search).

 

The original one only beat gnugo 17% of the time, and the current one beats 
gnugo 93%, both with 5000 playouts, so in this case at least much slower is 
much stronger (testing on 9x9, against gnugo 3.7.10,

gnugo --mode gtp --chinese-rules --capture-all-dead --level 10 
--positional-superko).

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Monday, August 24, 2009 9:23 AM
To: computer-go
Subject: Re: [computer-go] Bitmap Go revisited and mockup implementation

 

David,

 

Confession: I have not tested 19x19. As you have noted, and others before you 
over the years, a 19x19 board does not fit in one but three 128-bit registers 
and there would be a rather big penalty as a result, perhaps (likely?) wiping 
out all of the benefits of bitmaps. Antoine voiced his disappointment about the 
speed advantage even on 9x9 in the e-mail to the list that served as my basis. 
My hope, as Hideko Kato points out, is in longer registers or perhaps being 
able to port this to a GPU. A 64-bit OS with twice the number registers would 
also surely help out. Nevertheless, I was surprised that I was able to get to 
almost the same level of speed as Libego provides.

 

The goals for this mockup were very humble: (1) to provide a basic 
implementation to see what it can do (and if I could do it), and (2) learn how 
to work with assembly and/or intrinsic functions (I had never done that 
before). It may not be too hard to try 19x19 and just see what happens. I may 
attempt this as a follow-up.

 

René van de Veerdonk

 

2009/8/23 David Fotland 

How much would you lose for 19x19 board?  A board representation is not very 
interesting unless it scales to 19 line boards.

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Sunday, August 23, 2009 1:11 PM
To: computer-go@computer-go.org
Subject: [computer-go] Bitmap Go revisited and mockup implementation

 

Hi all,

 

After years of reading this very interesting list, I decided to make my first 
contribution this week after reading once again about bitmap go. There is no 
freely available implementation of a bitmap based engine that I know of, so 
here is a start. Note that I am not a professional programmer, self-taught, and 
this is a part-time hobby, so do not expect perfect form.

 

The attachment is an attempt to create an efficient implementation of a bitmap 
based light playout engine, following mostly the recommendations as spelled out 
by Antoine de Maricourt in his November 2006 message to this list. The 
application is single threaded and achieves around 75-80 kpps on my Centrino 
Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++ using the Microsoft 
Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly suffers from 
portability issues as a result. The 128-bit SSE instructions are called through 
intrinsic functions, hopefully at least these interfaces are the same for other 
compilers. The single goal of this mockup was to do some benchmarking, so there 
is no gtp-interface, but that would be rather easy to add for those skilled in 
the art. The standard rules are used with one exception, the Brown eye-rule is 
used. Multi-stone suicide is explicitly forbidden. No superko checks are 
performed during playouts, and there is no mercy-rule either.

 

I would be particularly interested to know if a 64-bit OS will improve the 
performance significantly, as it does have a lot less register pressure. Moving 
to a 64-bit OS will also make some instructions available that allow for 
further simplifications/efficiency improvements in the code. As will enabling 
SSE3 or SSE4.1 instructions.

 

One note, the random number generator used is from Agner Fog's public library 
and was not included in the attachment. You will have to provide a random 
number generator yourself or download the appropriate version from his 
well-known website (www.agner.org/optimize)

 

I hope this is the start of a discussion about further improvements that can be 
made, but you may use this code as you wish with no restrictions or guarantees. 
See the readme.txt file included in the attachment for more details (also 
partially included below).

 

Comments and feedback welcome,

 

René van de Veerdonk

 

Excerpts from the readme.txt file:

 

Introduction:

=

 

Mockup of a high performance implementation of the game of Go using bitmaps

as the principle datastructures. This implementation is limited to measuring

the performance of the

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Michael Williams
For many people on this list, Computer Go is a hobby and that means that it is important to do whatever you find interesting and motivating, event if it may not 
be the best or most promising direction in order to become the next world champion program.  There is room for pushing limits in all directions, IMO.



Brian Sheppard wrote:
Maven was fast enough, but it was clearly the slowest 
of all major Scrabble programs. Some programs were 5x 
faster than Maven. Maven was slow because it was a 
lot more than a move generator: it played the full 
game with competent positional scoring. Scoring 
requires much more time than move generation. So why 
bother optimizing move generation?


I am finding that the same lesson applies to Go. But 
more so because Scrabble is largely first-order 
whereas Go is recursive. Pebbles averages ~10K trials 
per turn. Aya is clearly stronger at 10K trials than 
Pebbles. Fuego is stronger than Aya. Valkyria is 
stronger than Fuego, and my impression is that Zen 
trumps us all. Zen could be better than Pebbles + 400 
rating points at 10K trials.


Now look at scaling. When Fuego runs a machine that 
is ~20x as fast as mine, then Fuego is comparable to 
Zen. Valkyria plays in 19x19 tournaments running on a 
Pentium M, and scores 50% against much more powerful 
computers. My conclusion is that the analytic skills 
of Zen and Valkyria are worth a factor of 20, and I 
expect the advantage will increase with computer 
speeds.


The crucial issue is to support intelligence. Check 
out David’s description of MFoG, and Remi’s papers 
about CrazyStone. I am sure those programs pay 
attention to speed, but the real issue is whether the 
engine can do the necessary analysis. Magnus’s posts 
to this list lament that Valkyria is so slow, but 
Magnus is clearly excited that his engine supports 
great analysis.


So de-emphasize speed. Can you build a great pattern 
matcher using bit boards? Can you solve life & death 
positions better? Can you read ladders faster or more 
accurately? Do bit boards make the program easier to 
program and debug?


Pebbles started with a bit board move generator, but 
I have been adding other knowledge entities, like 
liberty counts, string iterators, and eyespace 
detectors. I am convinced that emphasizing domain 
knowledge structures will pay off.  (That being said, 
I will borrow your intrinsic functions for my 9x9 
template specialization. ☺ Every little bit helps.)


Best,
Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Brian Sheppard
Maven was fast enough, but it was clearly the slowest 
of all major Scrabble programs. Some programs were 5x 
faster than Maven. Maven was slow because it was a 
lot more than a move generator: it played the full 
game with competent positional scoring. Scoring 
requires much more time than move generation. So why 
bother optimizing move generation?

I am finding that the same lesson applies to Go. But 
more so because Scrabble is largely first-order 
whereas Go is recursive. Pebbles averages ~10K trials 
per turn. Aya is clearly stronger at 10K trials than 
Pebbles. Fuego is stronger than Aya. Valkyria is 
stronger than Fuego, and my impression is that Zen 
trumps us all. Zen could be better than Pebbles + 400 
rating points at 10K trials.

Now look at scaling. When Fuego runs a machine that 
is ~20x as fast as mine, then Fuego is comparable to 
Zen. Valkyria plays in 19x19 tournaments running on a 
Pentium M, and scores 50% against much more powerful 
computers. My conclusion is that the analytic skills 
of Zen and Valkyria are worth a factor of 20, and I 
expect the advantage will increase with computer 
speeds.

The crucial issue is to support intelligence. Check 
out David’s description of MFoG, and Remi’s papers 
about CrazyStone. I am sure those programs pay 
attention to speed, but the real issue is whether the 
engine can do the necessary analysis. Magnus’s posts 
to this list lament that Valkyria is so slow, but 
Magnus is clearly excited that his engine supports 
great analysis.

So de-emphasize speed. Can you build a great pattern 
matcher using bit boards? Can you solve life & death 
positions better? Can you read ladders faster or more 
accurately? Do bit boards make the program easier to 
program and debug?

Pebbles started with a bit board move generator, but 
I have been adding other knowledge entities, like 
liberty counts, string iterators, and eyespace 
detectors. I am convinced that emphasizing domain 
knowledge structures will pay off.  (That being said, 
I will borrow your intrinsic functions for my 9x9 
template specialization. ☺ Every little bit helps.)

Best,
Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread René van de Veerdonk
Oops,

Speed drops more that the (per move) estimate below due to the increase in
length of the game as well, of course.

René

On Mon, Aug 24, 2009 at 5:36 PM, René van de Veerdonk <
rene.vandeveerd...@gmail.com> wrote:

> Lukasz,
> I tested Libego version 0.125 on my labtop, compiled using Visual Studio
> 2008 under Windows XP with slight modifications:
>
> #ifdef NDEBUG section commented out in "testing.h"
> #include  added to "fast_random.cpp"
>
> From the command-line with "engine -b" I observed 67-71 kpps on the same
> labtop as I used to benchmark the 9x9 bitmap mockup achieving 75-80 kpps.
>
> It seems Libego performs much better (going from 70 to 105 kpps is an
> impressive 50% improvement) with other compilers and OS. For sure it was not
> developed for MS Visual Studio, so my numbers are not an apples to apples
> comparison. But I think it is fair to say that both implementations are at
> least in the same ball-park when compiled as is under Visual Studio in
> Windows XP. I do not have an explanation for why my results are 50% below
> the numbers you quoted other than the difference in compiler (inlining?) and
> OS (32 vs 64-bit?). It would be really interesting to have someone else do a
> comparison. Given that I was natively working and optimizing in Visual
> Studio, I am not expecting a 50% improvement for the bitmap mockup (but
> hopefully it will not decrease 50% either).
>
> As for testing on a 19x19 board, this is not as straightforward for me as
> modifying a single parameter in the program. The basic data-type needs to be
> changed from a single 128-bit variable to an array of three such elements.
> This requires a redefinition of most basic operations, in particular the
> shift operations will take some work as these go across array-elements. In
> the generated assembly I could already see that there are not enough 128-bit
> registers for 9x9 in some functions, and this will only get worse with
> arrays. Dereferencing array elements potentially also add overhead. I am
> going to give it a try, but do not expect results overnight, as this is a
> low intensity hobby for me (note that Antoine's e-mail was in November
> 2006). Optimistically, a 3x speed reduction may be expected, but the
> additional overhead and and switch-logic could easily push the drop to over
> 10x unless branching can be minimized.
>
> If someone could look over the mockup and suggest ways to reduce the amount
> of branching there I would really appreciate it. If someone would eventually
> try to port something like this to a GPU, that would also benefit
> tremendously from a reduction in branching. I find that I need to be very
> conscious about branching during coding, as I have observed a tendency to
> put ever more "if", "do" and "while" statements in my code.
>
> Also, does anybody know if gcc allows the same interface to the
> SSE-instructions as I have adopted in bitmap.cpp?
>
> Thanks,
>
> René
>
> On Mon, Aug 24, 2009 at 4:18 PM, Łukasz Lew  wrote:
>
>> 2009/8/24 René van de Veerdonk :
>> > David,
>> > Confession: I have not tested 19x19. As you have noted, and others
>> before
>> > you over the years, a 19x19 board does not fit in one but three 128-bit
>> > registers and there would be a rather big penalty as a result, perhaps
>> > (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
>> > disappointment about the speed advantage even on 9x9 in the e-mail to
>> the
>> > list that served as my basis. My hope, as Hideko Kato points out, is in
>> > longer registers or perhaps being able to port this to a GPU. A 64-bit
>> OS
>> > with twice the number registers would also surely help out.
>> Nevertheless, I
>> > was surprised that I was able to get to almost the same level of speed
>> as
>> > Libego provides.
>>
>> Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
>> and 22-23 kpps on 19x19
>>
>> Can you test your bitmap implementation on 19x19?
>>
>> Lukasz
>>
>> > The goals for this mockup were very humble: (1) to provide a basic
>> > implementation to see what it can do (and if I could do it), and (2)
>> learn
>> > how to work with assembly and/or intrinsic functions (I had never done
>> that
>> > before). It may not be too hard to try 19x19 and just see what happens.
>> I
>> > may attempt this as a follow-up.
>> > René van de Veerdonk
>> > 2009/8/23 David Fotland 
>> >>
>> >> How much would you lose for 19x19 board?  A board representation is not
>> >> very interesting unless it scales to 19 line boards.
>> >>
>> >>
>> >>
>> >> David
>> >>
>> >>
>> >>
>> >> From: computer-go-boun...@computer-go.org
>> >> [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
>> >> Veerdonk
>> >> Sent: Sunday, August 23, 2009 1:11 PM
>> >> To: computer-go@computer-go.org
>> >> Subject: [computer-go] Bitmap Go revisited and mockup implementation
>> >>
>> >>
>> >>
>> >> Hi all,
>> >>
>> >>
>> >>
>> >> After years of reading this very interesting list, I decided to make my
>> >> fir

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread René van de Veerdonk
Lukasz,
I tested Libego version 0.125 on my labtop, compiled using Visual Studio
2008 under Windows XP with slight modifications:

#ifdef NDEBUG section commented out in "testing.h"
#include  added to "fast_random.cpp"

>From the command-line with "engine -b" I observed 67-71 kpps on the same
labtop as I used to benchmark the 9x9 bitmap mockup achieving 75-80 kpps.

It seems Libego performs much better (going from 70 to 105 kpps is an
impressive 50% improvement) with other compilers and OS. For sure it was not
developed for MS Visual Studio, so my numbers are not an apples to apples
comparison. But I think it is fair to say that both implementations are at
least in the same ball-park when compiled as is under Visual Studio in
Windows XP. I do not have an explanation for why my results are 50% below
the numbers you quoted other than the difference in compiler (inlining?) and
OS (32 vs 64-bit?). It would be really interesting to have someone else do a
comparison. Given that I was natively working and optimizing in Visual
Studio, I am not expecting a 50% improvement for the bitmap mockup (but
hopefully it will not decrease 50% either).

As for testing on a 19x19 board, this is not as straightforward for me as
modifying a single parameter in the program. The basic data-type needs to be
changed from a single 128-bit variable to an array of three such elements.
This requires a redefinition of most basic operations, in particular the
shift operations will take some work as these go across array-elements. In
the generated assembly I could already see that there are not enough 128-bit
registers for 9x9 in some functions, and this will only get worse with
arrays. Dereferencing array elements potentially also add overhead. I am
going to give it a try, but do not expect results overnight, as this is a
low intensity hobby for me (note that Antoine's e-mail was in November
2006). Optimistically, a 3x speed reduction may be expected, but the
additional overhead and and switch-logic could easily push the drop to over
10x unless branching can be minimized.

If someone could look over the mockup and suggest ways to reduce the amount
of branching there I would really appreciate it. If someone would eventually
try to port something like this to a GPU, that would also benefit
tremendously from a reduction in branching. I find that I need to be very
conscious about branching during coding, as I have observed a tendency to
put ever more "if", "do" and "while" statements in my code.

Also, does anybody know if gcc allows the same interface to the
SSE-instructions as I have adopted in bitmap.cpp?

Thanks,

René

On Mon, Aug 24, 2009 at 4:18 PM, Łukasz Lew  wrote:

> 2009/8/24 René van de Veerdonk :
> > David,
> > Confession: I have not tested 19x19. As you have noted, and others before
> > you over the years, a 19x19 board does not fit in one but three 128-bit
> > registers and there would be a rather big penalty as a result, perhaps
> > (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
> > disappointment about the speed advantage even on 9x9 in the e-mail to the
> > list that served as my basis. My hope, as Hideko Kato points out, is in
> > longer registers or perhaps being able to port this to a GPU. A 64-bit OS
> > with twice the number registers would also surely help out. Nevertheless,
> I
> > was surprised that I was able to get to almost the same level of speed as
> > Libego provides.
>
> Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
> and 22-23 kpps on 19x19
>
> Can you test your bitmap implementation on 19x19?
>
> Lukasz
>
> > The goals for this mockup were very humble: (1) to provide a basic
> > implementation to see what it can do (and if I could do it), and (2)
> learn
> > how to work with assembly and/or intrinsic functions (I had never done
> that
> > before). It may not be too hard to try 19x19 and just see what happens. I
> > may attempt this as a follow-up.
> > René van de Veerdonk
> > 2009/8/23 David Fotland 
> >>
> >> How much would you lose for 19x19 board?  A board representation is not
> >> very interesting unless it scales to 19 line boards.
> >>
> >>
> >>
> >> David
> >>
> >>
> >>
> >> From: computer-go-boun...@computer-go.org
> >> [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
> >> Veerdonk
> >> Sent: Sunday, August 23, 2009 1:11 PM
> >> To: computer-go@computer-go.org
> >> Subject: [computer-go] Bitmap Go revisited and mockup implementation
> >>
> >>
> >>
> >> Hi all,
> >>
> >>
> >>
> >> After years of reading this very interesting list, I decided to make my
> >> first contribution this week after reading once again about bitmap go.
> There
> >> is no freely available implementation of a bitmap based engine that I
> know
> >> of, so here is a start. Note that I am not a professional programmer,
> >> self-taught, and this is a part-time hobby, so do not expect perfect
> form.
> >>
> >>
> >>
> >> The attachment is an attempt to create an efficien

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Łukasz Lew
2009/8/24 René van de Veerdonk :
> David,
> Confession: I have not tested 19x19. As you have noted, and others before
> you over the years, a 19x19 board does not fit in one but three 128-bit
> registers and there would be a rather big penalty as a result, perhaps
> (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
> disappointment about the speed advantage even on 9x9 in the e-mail to the
> list that served as my basis. My hope, as Hideko Kato points out, is in
> longer registers or perhaps being able to port this to a GPU. A 64-bit OS
> with twice the number registers would also surely help out. Nevertheless, I
> was surprised that I was able to get to almost the same level of speed as
> Libego provides.

Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
and 22-23 kpps on 19x19

Can you test your bitmap implementation on 19x19?

Lukasz

> The goals for this mockup were very humble: (1) to provide a basic
> implementation to see what it can do (and if I could do it), and (2) learn
> how to work with assembly and/or intrinsic functions (I had never done that
> before). It may not be too hard to try 19x19 and just see what happens. I
> may attempt this as a follow-up.
> René van de Veerdonk
> 2009/8/23 David Fotland 
>>
>> How much would you lose for 19x19 board?  A board representation is not
>> very interesting unless it scales to 19 line boards.
>>
>>
>>
>> David
>>
>>
>>
>> From: computer-go-boun...@computer-go.org
>> [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
>> Veerdonk
>> Sent: Sunday, August 23, 2009 1:11 PM
>> To: computer-go@computer-go.org
>> Subject: [computer-go] Bitmap Go revisited and mockup implementation
>>
>>
>>
>> Hi all,
>>
>>
>>
>> After years of reading this very interesting list, I decided to make my
>> first contribution this week after reading once again about bitmap go. There
>> is no freely available implementation of a bitmap based engine that I know
>> of, so here is a start. Note that I am not a professional programmer,
>> self-taught, and this is a part-time hobby, so do not expect perfect form.
>>
>>
>>
>> The attachment is an attempt to create an efficient implementation of a
>> bitmap based light playout engine, following mostly the recommendations as
>> spelled out by Antoine de Maricourt in his November 2006 message to this
>> list. The application is single threaded and achieves around 75-80 kpps on
>> my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
>> using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
>> undoubtedly suffers from portability issues as a result. The 128-bit SSE
>> instructions are called through intrinsic functions, hopefully at least
>> these interfaces are the same for other compilers. The single goal of this
>> mockup was to do some benchmarking, so there is no gtp-interface, but that
>> would be rather easy to add for those skilled in the art. The standard rules
>> are used with one exception, the Brown eye-rule is used. Multi-stone suicide
>> is explicitly forbidden. No superko checks are performed during playouts,
>> and there is no mercy-rule either.
>>
>>
>>
>> I would be particularly interested to know if a 64-bit OS will improve the
>> performance significantly, as it does have a lot less register pressure.
>> Moving to a 64-bit OS will also make some instructions available that allow
>> for further simplifications/efficiency improvements in the code. As will
>> enabling SSE3 or SSE4.1 instructions.
>>
>>
>>
>> One note, the random number generator used is from Agner Fog's public
>> library and was not included in the attachment. You will have to provide a
>> random number generator yourself or download the appropriate version from
>> his well-known website (www.agner.org/optimize)
>>
>>
>>
>> I hope this is the start of a discussion about further improvements that
>> can be made, but you may use this code as you wish with no restrictions or
>> guarantees. See the readme.txt file included in the attachment for more
>> details (also partially included below).
>>
>>
>>
>> Comments and feedback welcome,
>>
>>
>>
>> René van de Veerdonk
>>
>>
>>
>> Excerpts from the readme.txt file:
>>
>>
>>
>> Introduction:
>>
>> =
>>
>>
>>
>> Mockup of a high performance implementation of the game of Go using
>> bitmaps
>>
>> as the principle datastructures. This implementation is limited to
>> measuring
>>
>> the performance of the principle components by playing random playouts
>> from
>>
>> an empty 9x9 board, using Chinese scoring, not allowing pass moves until
>>
>> no other moves are available. Suicide is not allowed, neither for single
>>
>> or multi-stone groups. No checks are implemented for superko, but simple
>>
>> ko's are disallowed explicitly.
>>
>>
>>
>> Some implementation details:
>>
>> 
>>
>>
>>
>> Benchmarking is done in test_board.cpp, where the number of playouts (n)
>> and
>>
>> repeats (j) for each

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Michael Williams

If an engine (or an engine's playout policy) needs to check the legality of 
every move before making a selection, this could still be a benefit.

René van de Veerdonk wrote:

David,

Confession: I have not tested 19x19. As you have noted, and others 
before you over the years, a 19x19 board does not fit in one but three 
128-bit registers and there would be a rather big penalty as a result, 
perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine 
voiced his disappointment about the speed advantage even on 9x9 in the 
e-mail to the list that served as my basis. My hope, as Hideko Kato 
points out, is in longer registers or perhaps being able to port this to 
a GPU. A 64-bit OS with twice the number registers would also surely 
help out. Nevertheless, I was surprised that I was able to get to almost 
the same level of speed as Libego provides.


The goals for this mockup were very humble: (1) to provide a basic 
implementation to see what it can do (and if I could do it), and (2) 
learn how to work with assembly and/or intrinsic functions (I had never 
done that before). It may not be too hard to try 19x19 and just see what 
happens. I may attempt this as a follow-up.


René van de Veerdonk

2009/8/23 David Fotland >


How much would you lose for 19x19 board?  A board representation is
not very interesting unless it scales to 19 line boards.

 


David

 


*From:* computer-go-boun...@computer-go.org

[mailto:computer-go-boun...@computer-go.org
] *On Behalf Of *René
van de Veerdonk
*Sent:* Sunday, August 23, 2009 1:11 PM
*To:* computer-go@computer-go.org 
*Subject:* [computer-go] Bitmap Go revisited and mockup implementation

 


Hi all,

 


After years of reading this very interesting list, I decided to make
my first contribution this week after reading once again about
bitmap go. There is no freely available implementation of a bitmap
based engine that I know of, so here is a start. Note that I am not
a professional programmer, self-taught, and this is a part-time
hobby, so do not expect perfect form.

 


The attachment is an attempt to create an efficient implementation
of a bitmap based light playout engine, following mostly the
recommendations as spelled out by Antoine de Maricourt in his
November 2006 message to this list. The application is single
threaded and achieves around 75-80 kpps on my Centrino Duo 2.4 GHz
Intel labtop. The mockup was developed in C/C++ using the Microsoft
Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly
suffers from portability issues as a result. The 128-bit SSE
instructions are called through intrinsic functions, hopefully at
least these interfaces are the same for other compilers. The single
goal of this mockup was to do some benchmarking, so there is no
gtp-interface, but that would be rather easy to add for those
skilled in the art. The standard rules are used with one exception,
the Brown eye-rule is used. Multi-stone suicide is explicitly
forbidden. No superko checks are performed during playouts, and
there is no mercy-rule either.

 


I would be particularly interested to know if a 64-bit OS will
improve the performance significantly, as it does have a lot less
register pressure. Moving to a 64-bit OS will also make some
instructions available that allow for further
simplifications/efficiency improvements in the code. As will
enabling SSE3 or SSE4.1 instructions.

 


One note, the random number generator used is from Agner Fog's
public library and was not included in the attachment. You will have
to provide a random number generator yourself or download the
appropriate version from his well-known website
(www.agner.org/optimize )

 


I hope this is the start of a discussion about further improvements
that can be made, but you may use this code as you wish with no
restrictions or guarantees. See the readme.txt file included in the
attachment for more details (also partially included below).

 


Comments and feedback welcome,

 


René van de Veerdonk

 


Excerpts from the readme.txt file:

 


Introduction:

=

 


Mockup of a high performance implementation of the game of Go using
bitmaps

as the principle datastructures. This implementation is limited to
measuring

the performance of the principle components by playing random
playouts from

an empty 9x9 board, using Chinese scoring, not allowing pass moves until

no other moves are available. Suicide is not allowed, neither for single

or multi-stone groups. No checks are implemented for 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread René van de Veerdonk
David,
Confession: I have not tested 19x19. As you have noted, and others before
you over the years, a 19x19 board does not fit in one but three 128-bit
registers and there would be a rather big penalty as a result, perhaps
(likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
disappointment about the speed advantage even on 9x9 in the e-mail to the
list that served as my basis. My hope, as Hideko Kato points out, is in
longer registers or perhaps being able to port this to a GPU. A 64-bit OS
with twice the number registers would also surely help out. Nevertheless, I
was surprised that I was able to get to almost the same level of speed as
Libego provides.

The goals for this mockup were very humble: (1) to provide a basic
implementation to see what it can do (and if I could do it), and (2) learn
how to work with assembly and/or intrinsic functions (I had never done that
before). It may not be too hard to try 19x19 and just see what happens. I
may attempt this as a follow-up.

René van de Veerdonk

2009/8/23 David Fotland 

>  How much would you lose for 19x19 board?  A board representation is not
> very interesting unless it scales to 19 line boards.
>
>
>
> David
>
>
>
> *From:* computer-go-boun...@computer-go.org [mailto:
> computer-go-boun...@computer-go.org] *On Behalf Of *René van de Veerdonk
> *Sent:* Sunday, August 23, 2009 1:11 PM
> *To:* computer-go@computer-go.org
> *Subject:* [computer-go] Bitmap Go revisited and mockup implementation
>
>
>
> Hi all,
>
>
>
> After years of reading this very interesting list, I decided to make my
> first contribution this week after reading once again about bitmap go. There
> is no freely available implementation of a bitmap based engine that I know
> of, so here is a start. Note that I am not a professional programmer,
> self-taught, and this is a part-time hobby, so do not expect perfect form.
>
>
>
> The attachment is an attempt to create an efficient implementation of a
> bitmap based light playout engine, following mostly the recommendations as
> spelled out by Antoine de Maricourt in his November 2006 message to this
> list. The application is single threaded and achieves around 75-80 kpps on
> my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
> using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
> undoubtedly suffers from portability issues as a result. The 128-bit SSE
> instructions are called through intrinsic functions, hopefully at least
> these interfaces are the same for other compilers. The single goal of this
> mockup was to do some benchmarking, so there is no gtp-interface, but that
> would be rather easy to add for those skilled in the art. The standard rules
> are used with one exception, the Brown eye-rule is used. Multi-stone suicide
> is explicitly forbidden. No superko checks are performed during playouts,
> and there is no mercy-rule either.
>
>
>
> I would be particularly interested to know if a 64-bit OS will improve the
> performance significantly, as it does have a lot less register pressure.
> Moving to a 64-bit OS will also make some instructions available that allow
> for further simplifications/efficiency improvements in the code. As will
> enabling SSE3 or SSE4.1 instructions.
>
>
>
> One note, the random number generator used is from Agner Fog's public
> library and was not included in the attachment. You will have to provide a
> random number generator yourself or download the appropriate version from
> his well-known website (www.agner.org/optimize)
>
>
>
> I hope this is the start of a discussion about further improvements that
> can be made, but you may use this code as you wish with no restrictions or
> guarantees. See the readme.txt file included in the attachment for more
> details (also partially included below).
>
>
>
> Comments and feedback welcome,
>
>
>
> René van de Veerdonk
>
>
>
> Excerpts from the readme.txt file:
>
>
>
> Introduction:
>
> =
>
>
>
> Mockup of a high performance implementation of the game of Go using bitmaps
>
> as the principle datastructures. This implementation is limited to
> measuring
>
> the performance of the principle components by playing random playouts from
>
> an empty 9x9 board, using Chinese scoring, not allowing pass moves until
>
> no other moves are available. Suicide is not allowed, neither for single
>
> or multi-stone groups. No checks are implemented for superko, but simple
>
> ko's are disallowed explicitly.
>
>
>
> Some implementation details:
>
> 
>
>
>
> Benchmarking is done in test_board.cpp, where the number of playouts (n)
> and
>
> repeats (j) for each test is set. Benchmarking is done starting from an
> empty
>
> 9x9 board until two (three with ko) passes are played in sequence. The
>
> benchmark playout follow the same scheme as method
> board_t::play_random_game,
>
> but with timing instrumentation added.
>
>
>
> Clock counts (cc) are measured using the cc_time_of macr