Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-26 Thread Michael Williams

I only showed the first line.  There were many more.  Gotta go to work though.

René van de Veerdonk wrote:

Micheal,

Is that all the output you get? Did you notice that the first line with 
test-error changed ... the second number went from 31 to 32. We did 
something! But not what I expected. BTW. The program does not normally 
crash, it just stops after its initial testing is completed and errors 
were detected. If you actually went from a gazillion line to just one, 
we have hit a nerve. I you just didn't bother to copy the rest of the 
output, we made no progress. But, since the output actually changed, it 
would still be nice to see a few more lines and see the trend.


Thanks for helping ... debugging for someone else must not be the best 
use of your time,


René

On Tue, Aug 25, 2009 at 8:03 PM, Michael Williams 
michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com wrote:


No.  The code is now:


__forceinline
unsigned int bitmap_t::lowest_index () const {
 const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
 unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
 if (!_BitScanForward (m, m)) return 127;
 /* offending instruction */
 //return 8*m + __lzcnt (board.m128i_u32[m/4]);
 /* new code follows */
 unsigned long n;
 _BitScanForward (n, board.m128i_u32[m/4]);
 return 8*m + (32-n);
}


Debug:
testing _BitScanForward (m, 0): 0 1099227377

testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 1103042705

testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32

Release:

testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 4
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32



René van de Veerdonk wrote:

Micheal,

Your output is actually correct. The output is undefined when
the second argument to _BitScanForward is zero and that is what
I am explicitly testing for in the method
bitmap_t::lowest_index, which is supposed to return an index to
any set bit (despite the name). Your output shows that it
returns a number that is off in a very predictable way (it seems
that replacing 8*m + n with 8*m + (32-n) on the last line
may do the trick for you, but it breaks my local copy). So, now
I am positively baffled.

Help?

René

On Tue, Aug 25, 2009 at 7:25 PM, Michael Williams
michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com wrote:

   Debug build:
   testing _BitScanForward (m, 0): 0 842032512

   testing _BitScanForward (m, 1): 1 0
   testing _BitScanForward (m, 256): 1 8
   testing _BitScanReverse (m, 0): 0 840277132

   testing _BitScanReverse (m, 1): 1 0
   testing _BitScanReverse (m, 256): 1 8

   Release build:

   testing _BitScanForward (m, 0): 0 16843009
   testing _BitScanForward (m, 1): 1 0
   testing _BitScanForward (m, 256): 1 8
   testing _BitScanReverse (m, 0): 0 4

   testing _BitScanReverse (m, 1): 1 0
   testing _BitScanReverse (m, 256): 1 8

   Sucks when they produce different results.



   René van de Veerdonk wrote:

   Micheal,

   Thanks for trying. Could you try one or two more things?

   (1) Replace the _BitScanForward in the new code to
   _BitScanReverse. This probably won't help.

   (2) Add this to bitmapgo.cpp:

   /* sandbox */
   const bool sandbox = true;
   void test_sandbox () {
unsigned long m = 220;
unsigned char test_it = _BitScanForward (m, 0);
std::cout  testing _BitScanForward (m, 0): 
   (int) test_it m  \n;
test_it = _BitScanForward (m, 1);
std::cout  testing _BitScanForward (m, 1): 
   (int) test_it m  \n;
test_it = _BitScanForward (m, 256);
std::cout  testing _BitScanForward (m, 256): 
   (int) test_it m  \n;
test_it = _BitScanReverse (m, 0);
std::cout  testing _BitScanReverse (m, 0): 
   (int) test_it m  \n;
test_it = _BitScanReverse (m, 1);
std::cout  testing _BitScanReverse (m, 1): 
   (int) test_it m  \n;
test_it = _BitScanReverse (m, 256);
std::cout  testing _BitScanReverse (m, 256): 
   

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Antoine de Maricourt

Hi René,

David,

Confession: I have not tested 19x19. As you have noted, and others 
before you over the years, a 19x19 board does not fit in one but three 
128-bit registers and there would be a rather big penalty as a result, 
perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine 
voiced his disappointment about the speed advantage even on 9x9 in the 
e-mail to the list that served as my basis. My hope, as Hideko Kato 
points out, is in longer registers or perhaps being able to port this 
to a GPU. A 64-bit OS with twice the number registers would also 
surely help out. Nevertheless, I was surprised that I was able to get 
to almost the same level of speed as Libego provides.




As far as I remember, I was not disappointed by the speed itself on 9x9 
boards, but mainly with the following 2 points:


a) my feeling is that, as you say, it does not scale very well on 19x19 
(on current hardware).
I don't think other classical implementations suffer such a big 
penalty when scaling from 9x9 to 19x19.


b) I also felt that this kind of implementation was not very flexible.
For instance, I had another classical implementation, running at 
equivalent speed, but in which local 3x3 pattern matching was almost for 
free, as well as other more elaborated information.
When I started to think about introducing 3x3 local patterns in the 
bitmap only implementation, it was clear it would not be for free.


At that time, my conclusion was that if one only needs pure random play 
with no intelligence at all during playouts, then bitmap implementation 
could compete (at least on 9x9).
If one needs more elaborated knowledge (such as small patterns matching, 
or even knowledge about blocks of stones), then pure bitmap 
implementation is probably not so competitive.

I thus gave up with the idea and jumped to more promising ones.

Anyway, I'm glad my post has been usefull to you. And I encourage you to 
improve your implementation and let us know, especially if you have fun. 
Starting with something and playing with it is a good way to have new 
ideas, even if this makes your initial ones look less interesting a 
while after.


Best regards,

Antoine

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

(AMD Phenom CPU)

Michael Williams wrote:
It came through fine the first time.  Gmail probably hid the duplicate 
text from you for convenience since it saw that it was text that you 
already sent out. Or something.


I can compile the original (9x9) bitmap Go files in Windows 7 x64, but 
when I run it I get this:


test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]: 68 91
test-error [bitmap_t::lowest_index]: 69 90
test-error [bitmap_t::lowest_index]: 70 89
test-error [bitmap_t::lowest_index]: 71 88
test-error [bitmap_t::lowest_index]: 72 87
test-error [bitmap_t::lowest_index]: 73 86
test-error [bitmap_t::lowest_index]: 74 85
test-error [bitmap_t::lowest_index]: 75 84
test-error [bitmap_t::lowest_index]: 76 83
test-error [bitmap_t::lowest_index]: 77 82
test-error [bitmap_t::lowest_index]: 78 81
test-error [bitmap_t::lowest_index]: 79 80
test-error [bitmap_t::lowest_index]: 80 79
test-error-1 [bitmap_t::is_one_bit]: 0
test-error-1 [bitmap_t::is_one_bit]: 1
test-error-1 [bitmap_t::is_one_bit]: 2
test-error-1 [bitmap_t::is_one_bit]: 3
test-error-1 [bitmap_t::is_one_bit]: 4
test-error-1 [bitmap_t::is_one_bit]: 5
test-error-1 [bitmap_t::is_one_bit]: 6
test-error-1 [bitmap_t::is_one_bit]: 7
test-error-1 [bitmap_t::is_one_bit]: 8
test-error-1 [bitmap_t::is_one_bit]: 9
test-error-1 [bitmap_t::is_one_bit]: 10
test-error-1 [bitmap_t::is_one_bit]: 11
test-error-1 [bitmap_t::is_one_bit]: 12
test-error-1 [bitmap_t::is_one_bit]: 13
test-error-1 [bitmap_t::is_one_bit]: 14
test-error-1 [bitmap_t::is_one_bit]: 15
test-error-1 [bitmap_t::is_one_bit]: 16
test-error-1 [bitmap_t::is_one_bit]: 17
test-error-1 [bitmap_t::is_one_bit]: 18
test-error-1 [bitmap_t::is_one_bit]: 19
test-error-1 [bitmap_t::is_one_bit]: 20
test-error-1 [bitmap_t::is_one_bit]: 21
test-error-1 [bitmap_t::is_one_bit]: 22
test-error-1 [bitmap_t::is_one_bit]: 23
test-error-1 [bitmap_t::is_one_bit]: 24
test-error-1 [bitmap_t::is_one_bit]: 25
test-error-1 [bitmap_t::is_one_bit]: 26
test-error-1 [bitmap_t::is_one_bit]: 27
test-error-1 [bitmap_t::is_one_bit]: 28
test-error-1 [bitmap_t::is_one_bit]: 29
test-error-1 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Hi Micheal,

Thanks for the test (Intel - AMD, Windows XP - Windows 7, x32 - x64,
great, three birds with one stone!). So much for portability. But, hooray
for test-cases!

This may be related to the __lzcnt intrinsic, giving you a completely
different result, i.e., 32 - my result. Could you try replacing this
function in bitmap.cpp? I believe I tested this alternative during
development and it generates only slightly worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things to
try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams 
michaelwilliam...@gmail.com wrote:

 (AMD Phenom CPU)


 Michael Williams wrote:

 It came through fine the first time.  Gmail probably hid the duplicate
 text from you for convenience since it saw that it was text that you already
 sent out. Or something.

 I can compile the original (9x9) bitmap Go files in Windows 7 x64, but
 when I run it I get this:

 test-error [bitmap_t::lowest_index]: 0 31
 test-error [bitmap_t::lowest_index]: 1 30
 test-error [bitmap_t::lowest_index]: 2 29
 test-error [bitmap_t::lowest_index]: 3 28
 test-error [bitmap_t::lowest_index]: 4 27
 test-error [bitmap_t::lowest_index]: 5 26
 test-error [bitmap_t::lowest_index]: 6 25
 test-error [bitmap_t::lowest_index]: 7 24
 test-error [bitmap_t::lowest_index]: 8 23
 test-error [bitmap_t::lowest_index]: 9 22
 test-error [bitmap_t::lowest_index]: 10 21
 test-error [bitmap_t::lowest_index]: 11 20
 test-error [bitmap_t::lowest_index]: 12 19
 test-error [bitmap_t::lowest_index]: 13 18
 test-error [bitmap_t::lowest_index]: 14 17
 test-error [bitmap_t::lowest_index]: 15 16
 test-error [bitmap_t::lowest_index]: 16 15
 test-error [bitmap_t::lowest_index]: 17 14
 test-error [bitmap_t::lowest_index]: 18 13
 test-error [bitmap_t::lowest_index]: 19 12
 test-error [bitmap_t::lowest_index]: 20 11
 test-error [bitmap_t::lowest_index]: 21 10
 test-error [bitmap_t::lowest_index]: 22 9
 test-error [bitmap_t::lowest_index]: 23 8
 test-error [bitmap_t::lowest_index]: 24 7
 test-error [bitmap_t::lowest_index]: 25 6
 test-error [bitmap_t::lowest_index]: 26 5
 test-error [bitmap_t::lowest_index]: 27 4
 test-error [bitmap_t::lowest_index]: 28 3
 test-error [bitmap_t::lowest_index]: 29 2
 test-error [bitmap_t::lowest_index]: 30 1
 test-error [bitmap_t::lowest_index]: 31 0
 test-error [bitmap_t::lowest_index]: 32 63
 test-error [bitmap_t::lowest_index]: 33 62
 test-error [bitmap_t::lowest_index]: 34 61
 test-error [bitmap_t::lowest_index]: 35 60
 test-error [bitmap_t::lowest_index]: 36 59
 test-error [bitmap_t::lowest_index]: 37 58
 test-error [bitmap_t::lowest_index]: 38 57
 test-error [bitmap_t::lowest_index]: 39 56
 test-error [bitmap_t::lowest_index]: 40 55
 test-error [bitmap_t::lowest_index]: 41 54
 test-error [bitmap_t::lowest_index]: 42 53
 test-error [bitmap_t::lowest_index]: 43 52
 test-error [bitmap_t::lowest_index]: 44 51
 test-error [bitmap_t::lowest_index]: 45 50
 test-error [bitmap_t::lowest_index]: 46 49
 test-error [bitmap_t::lowest_index]: 47 48
 test-error [bitmap_t::lowest_index]: 48 47
 test-error [bitmap_t::lowest_index]: 49 46
 test-error [bitmap_t::lowest_index]: 50 45
 test-error [bitmap_t::lowest_index]: 51 44
 test-error [bitmap_t::lowest_index]: 52 43
 test-error [bitmap_t::lowest_index]: 53 42
 test-error [bitmap_t::lowest_index]: 54 41
 test-error [bitmap_t::lowest_index]: 55 40
 test-error [bitmap_t::lowest_index]: 56 39
 test-error [bitmap_t::lowest_index]: 57 38
 test-error [bitmap_t::lowest_index]: 58 37
 test-error [bitmap_t::lowest_index]: 59 36
 test-error [bitmap_t::lowest_index]: 60 35
 test-error [bitmap_t::lowest_index]: 61 34
 test-error [bitmap_t::lowest_index]: 62 33
 test-error [bitmap_t::lowest_index]: 63 32
 test-error [bitmap_t::lowest_index]: 64 95
 test-error [bitmap_t::lowest_index]: 65 94
 test-error [bitmap_t::lowest_index]: 66 93
 test-error [bitmap_t::lowest_index]: 67 92
 test-error [bitmap_t::lowest_index]: 68 91
 test-error [bitmap_t::lowest_index]: 69 90
 test-error [bitmap_t::lowest_index]: 70 89
 test-error [bitmap_t::lowest_index]: 71 88
 test-error [bitmap_t::lowest_index]: 72 87
 test-error [bitmap_t::lowest_index]: 73 86
 test-error [bitmap_t::lowest_index]: 74 85
 test-error [bitmap_t::lowest_index]: 75 84
 test-error [bitmap_t::lowest_index]: 76 83
 test-error [bitmap_t::lowest_index]: 77 82
 test-error [bitmap_t::lowest_index]: 78 81
 test-error [bitmap_t::lowest_index]: 79 80
 test-error [bitmap_t::lowest_index]: 80 79
 test-error-1 [bitmap_t::is_one_bit]: 0
 test-error-1 [bitmap_t::is_one_bit]: 1
 test-error-1 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

That doesn't seem to have any effect on the results.

René van de Veerdonk wrote:

Hi Micheal,

Thanks for the test (Intel - AMD, Windows XP - Windows 7, x32 - x64, 
great, three birds with one stone!). So much for portability. But, 
hooray for test-cases!


This may be related to the __lzcnt intrinsic, giving you a completely 
different result, i.e., 32 - my result. Could you try replacing this 
function in bitmap.cpp? I believe I tested this alternative during 
development and it generates only slightly worse performing code.


__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things 
to try!


On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams 
michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com wrote:


(AMD Phenom CPU)


Michael Williams wrote:

It came through fine the first time.  Gmail probably hid the
duplicate text from you for convenience since it saw that it was
text that you already sent out. Or something.

I can compile the original (9x9) bitmap Go files in Windows 7
x64, but when I run it I get this:

test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Zach,
I can't help the windowsiness, but thanks for taking a look already.

Popcounting I believe to be optimal for 32-bit only instructions, especially
since I would like to keep the intermediate byte-count to help speed up the
random pick function. There is a really neat trick to speed up the final
accumulations, but I couldn't find the sse2 instructions to implement it.
Bitscanning is indeed awkward the way I implemented it, but I didn't see a
better way to do it as there is no compare instruction that yields a boolean
as the result. I would be interested to see what you come up with for
shifting, I believe that this is where a lot of time is spend.

René

On Tue, Aug 25, 2009 at 6:20 PM, Zach Wegner zweg...@gmail.com wrote:

 On an initial look, it seems that the shifting, popcounting, and
 bitscanning functions can be improved significantly. That's all I
 looked at closely, perhaps the other board routines could use some
 reworking too.

 I'll try my hand at optimizing it if I get the time, but the windowsy
 code makes it a bit difficult.

 Zach
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Don Dailey
A couple of notes.   Some of us on this computer-go forum are also Chess
programmers and many of write 64 bit chess programs.   I am one of them but
I know there are others.

My 64 bit chess program runs almost 2X faster if I compile it to run on a 64
bit OS - in other words I'm using real 64 bit values to represent the bit
boards.So this really is a big deal - you can expect a pretty good gain.

I also toyed with a bit board style go program and started working on one
long ago.   You still need several 64 bit words to represent the state of
each  color.I started to write some routines that did shift, popcount,
etc as fast as possible.  Basically you had to compile the program for each
board size, and it figured out how many bits were needed and simulated
(using struct) an integer data type that was large enough.   About half way
through I decided not to proceed - at the time I had some idea that I was
more interested in and I never picked it back up - assuming that it would be
too slow (but I didn't prove it.)

I think you could look at glaurung (the chess program) to find great
routines for the common bit operations.  I think the author of Glaurung does
it about as fast as it can be done and even has assembly code for finding
the right-most bit,  or something like that. But you could probably
learn something from looking at that program.

I'm not sure how useful this might be for go, but there are some pretty
clever tricks with minimal perfect hashing - that you might find some use
for.   Perhaps there is a way to make this work for eye-detection or
someting. There is some great explanations here:
http://chessprogramming.wikispaces.com/I think you want to look under
move generation perhaps. Perhaps this cannot be used for GO, but it's
fun to read anyway :-) The only issue of course is that you need
more than 64 bits for go - but the general principles might be made to work.


- Don



2009/8/25 René van de Veerdonk rene.vandeveerd...@gmail.com

 Antoine,
 I apologize for misrepresenting your feelings. What I wanted to convey is
 that your e-mail expressed that with the same amount of computation, other
 implementation strategies provide more information, so the information
 gained for the effort put in is disappointing. In other words, bitmap-go had
 better be much faster to make up for the lack of information gained ... and
 it wasn't. That's pretty much what you are saying as well, I believe. As are
 the others in this thread. I think we can all agree on this.

 Going into this project, I was well aware that I was going to end up with
 light playouts only, and that heavy playouts is the way to go except perhaps
 for some special purposes such as ladder readouts. I was also well aware of
 the scaling argument. But, I hadn't thought that this would be the first
 thing thrown at my first post, as there are many programmers that seem to be
 stuck at 9x9. Nonetheless, bitmap-go as a topic keeps resurfacing on this
 mailing list every once in a while and nobody ever put solid data and a
 reference implementation behind it. That is what I wanted to accomplish with
 my mockup.

 Confession #2: I have been working on my own MCTS implementation using the
 standard implementation methods that almost everybody else is using. But
 there is a never-ending laundry-list of things that I would like to include
 before I would reach reasonable strength (where reasonable is a moving
 target). In the mean time, there are many others that have demonstrated much
 more capable programmers than I ever will be. So, by providing this
 bitmap-go mockup at least I had the feeling that I was contributing
 something newsworthy to the conversation. This may have never happened if I
 would wait until my MCTS program is ready. I imagine that there are others
 on this list in a similar situation. Besides, this was an interesting
 side-project and perhaps someone else can benefit from it (go for it
 Brian!). And, yes, it was fun.

 Okay, enough mesmerizing, on to the main topic.

 David, Lukasz,

 I did modify my mockup to do 19x19 bitmap-go (attached). It is a hardcoded
 solution using arrays of three 128-bit variables. I did not even attempt to
 optimize this version, so this is not the best possible solution.
 Nonetheless, here is a comparison:

 Example output (9x9):
 =

 [game] = 30236.1, [moves] = 111.071
 [game] = 30249.7, [moves] = 111.068
 [game] = 30145.7, [moves] = 111.089
 [game] = 30237.7, [moves] = 111.122
 [game] = 30210.1, [moves] = 111.101

 [game] = 78.0488 kpps, [moves] = 111.023
 [game] = 78.0488 kpps, [moves] = 111.045
 [game] = 78.0488 kpps, [moves] = 111.046
 [game] = 79.0123 kpps, [moves] = 111.131
 [game] = 78.0488 kpps, [moves] = 111.082

 [legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44, [win] 0.4187
 [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40, [win] 0.4201
 [legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42, [win] 0.4276
 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Mark,
I would argue that 10x vs 5x is not a deal-breaker if bitmaps would support
your particular goals better. They do provide the advantage of automatically
providing you all the liberties of strings, for instance. Requests on how to
do that efficiently come by on this mailing list on a regular basis. But I
agree that it does not look very good in general.

As for merging being time consuming. The merge portion of the play function
looks like this:

  /* merge friendly strings */
  bool single_stone_string = true;
  do {
bitmap_t adj_strings = bm_adjacent  bm_my_points;
while (!adj_strings.is_empty ()) {
  unsigned int i = adj_strings.lowest_index ();
  do {
i = anchor[i];
  } while (i != anchor[i]);
  anchor[i] = move;
  single_stone_string = false;
  adj_strings.clear (string[i]);
  bm_string.set (string[i]);
  bm_liberty.set (liberty[i]);
}
bm_liberty.clear (bm_move);
atari.clear (bm_string);
  } while (false);

The difference between 9x9 and 19x19 is in the average length of the linked
list anchor[]. The merge itself should be the same work, other than the fact
that for 19x19 I now have to perform the work on three elements per bitmap
as compared to just one for 9x9. Moving away from a linked list may prove
beneficial for 19x19, but than you need to ensure that all the anchors for a
string are correctly set during merge. That means looping over all the
stones in the string, which I do not believe to be very efficient with
bitmaps as it requires cumbersome bit-scanning.

René

On Tue, Aug 25, 2009 at 6:09 PM, Mark Boon tesujisoftw...@gmail.com wrote:

 2009/8/25 René van de Veerdonk rene.vandeveerd...@gmail.com:
  Nonetheless, bitmap-go as a topic keeps resurfacing on this
  mailing list every once in a while and nobody ever put solid data and a
  reference implementation behind it. That is what I wanted to accomplish
 with
  my mockup.

 I think this is interesting information, so I think it's good you
 tried it. It basically means unless someone has a much smarter idea
 how to implement bit-sets, the performance is not enough to go through
 the trouble for most programmers.

  So, it looks that I was overly pessimistic in terms of performance drop
 per
  move, which is a factor of 2.4x (with little effort to optimize for
 19x19).
  But, with 4.1x more moves, this still resulted in a 10x speed penalty.
 With
  the provided reference numbers, Libego only drops 4.5-5.0x, indicating
 that
  there is almost no performance loss per move (1.2x). Is this difference
  roughly in line with others expectation?

 Yes, I'd expect that. The main speed-determining factor in traditional
 implementations is merging. That depends on the average length of the
 merge. Obviously this can be much longer on 19x19 as a worst case, but
 on average I wouldn't expect it to make much difference compared to
 other board-sizes.

 Mark
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

Debug build:
testing _BitScanForward (m, 0): 0 842032512
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 840277132
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Release build:
testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 4
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Sucks when they produce different results.


René van de Veerdonk wrote:

Micheal,

Thanks for trying. Could you try one or two more things?

(1) Replace the _BitScanForward in the new code to _BitScanReverse. This 
probably won't help.


(2) Add this to bitmapgo.cpp:

/* sandbox */
const bool sandbox = true;
void test_sandbox () {
  unsigned long m = 220;
  unsigned char test_it = _BitScanForward (m, 0);
  std::cout  testing _BitScanForward (m, 0): 
 (int) test_it m  \n;
  test_it = _BitScanForward (m, 1);
  std::cout  testing _BitScanForward (m, 1): 
 (int) test_it m  \n;
  test_it = _BitScanForward (m, 256);
  std::cout  testing _BitScanForward (m, 256): 
 (int) test_it m  \n;
  test_it = _BitScanReverse (m, 0);
  std::cout  testing _BitScanReverse (m, 0): 
 (int) test_it m  \n;
  test_it = _BitScanReverse (m, 1);
  std::cout  testing _BitScanReverse (m, 1): 
 (int) test_it m  \n;
  test_it = _BitScanReverse (m, 256);
  std::cout  testing _BitScanReverse (m, 256): 
 (int) test_it m  \n;
}

and add a line something like this in the main program:
  if (test  sandbox) test_sandbox (); /* new line */
  if (test  verify) test = test_bitmap_t::test (); /* already there */

That does not solve anything, but it may tell me if those functions are 
doing for you what they do for me.


My output is:
testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 3432104
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Thanks, I really appreciate your help !

René

On Tue, Aug 25, 2009 at 6:38 PM, Michael Williams 
michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com wrote:


That doesn't seem to have any effect on the results.


René van de Veerdonk wrote:

Hi Micheal,

Thanks for the test (Intel - AMD, Windows XP - Windows 7, x32
- x64, great, three birds with one stone!). So much for
portability. But, hooray for test-cases!

This may be related to the __lzcnt intrinsic, giving you a
completely different result, i.e., 32 - my result. Could you try
replacing this function in bitmap.cpp? I believe I tested this
alternative during development and it generates only slightly
worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
 const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
 unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
 if (!_BitScanForward (m, m)) return 127;
 /* offending instruction */
 //return 8*m + __lzcnt (board.m128i_u32[m/4]);
 /* new code follows */
 unsigned long n;
 _BitScanForward (n, board.m128i_u32[m/4]);
 return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many
things to try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams
michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com wrote:

   (AMD Phenom CPU)


   Michael Williams wrote:

   It came through fine the first time.  Gmail probably hid the
   duplicate text from you for convenience since it saw that
it was
   text that you already sent out. Or something.

   I can compile the original (9x9) bitmap Go files in Windows 7
   x64, but when I run it I get this:

   test-error [bitmap_t::lowest_index]: 0 31
   test-error [bitmap_t::lowest_index]: 1 30
   test-error [bitmap_t::lowest_index]: 2 29
   test-error [bitmap_t::lowest_index]: 3 28
   test-error [bitmap_t::lowest_index]: 4 27
   test-error [bitmap_t::lowest_index]: 5 26
   test-error [bitmap_t::lowest_index]: 6 25
   test-error [bitmap_t::lowest_index]: 7 24
   test-error [bitmap_t::lowest_index]: 8 23
   test-error [bitmap_t::lowest_index]: 9 22
   test-error [bitmap_t::lowest_index]: 10 21
   test-error [bitmap_t::lowest_index]: 11 20
   test-error [bitmap_t::lowest_index]: 12 19
   test-error [bitmap_t::lowest_index]: 13 18
   test-error [bitmap_t::lowest_index]: 14 17

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

No.  The code is now:

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (n, board.m128i_u32[m/4]);
  return 8*m + (32-n);
}


Debug:
testing _BitScanForward (m, 0): 0 1099227377
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 1103042705
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32

Release:
testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 4
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32


René van de Veerdonk wrote:

Micheal,

Your output is actually correct. The output is undefined when the second 
argument to _BitScanForward is zero and that is what I am explicitly 
testing for in the method bitmap_t::lowest_index, which is supposed to 
return an index to any set bit (despite the name). Your output shows 
that it returns a number that is off in a very predictable way (it seems 
that replacing 8*m + n with 8*m + (32-n) on the last line may do the 
trick for you, but it breaks my local copy). So, now I am positively 
baffled.


Help?

René

On Tue, Aug 25, 2009 at 7:25 PM, Michael Williams 
michaelwilliam...@gmail.com mailto:michaelwilliam...@gmail.com wrote:


Debug build:
testing _BitScanForward (m, 0): 0 842032512

testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 840277132

testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Release build:

testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 4

testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Sucks when they produce different results.



René van de Veerdonk wrote:

Micheal,

Thanks for trying. Could you try one or two more things?

(1) Replace the _BitScanForward in the new code to
_BitScanReverse. This probably won't help.

(2) Add this to bitmapgo.cpp:

/* sandbox */
const bool sandbox = true;
void test_sandbox () {
 unsigned long m = 220;
 unsigned char test_it = _BitScanForward (m, 0);
 std::cout  testing _BitScanForward (m, 0): 
(int) test_it m  \n;
 test_it = _BitScanForward (m, 1);
 std::cout  testing _BitScanForward (m, 1): 
(int) test_it m  \n;
 test_it = _BitScanForward (m, 256);
 std::cout  testing _BitScanForward (m, 256): 
(int) test_it m  \n;
 test_it = _BitScanReverse (m, 0);
 std::cout  testing _BitScanReverse (m, 0): 
(int) test_it m  \n;
 test_it = _BitScanReverse (m, 1);
 std::cout  testing _BitScanReverse (m, 1): 
(int) test_it m  \n;
 test_it = _BitScanReverse (m, 256);
 std::cout  testing _BitScanReverse (m, 256): 
(int) test_it m  \n;
}

and add a line something like this in the main program:
 if (test  sandbox) test_sandbox (); /* new line */
 if (test  verify) test = test_bitmap_t::test (); /* already
there */

That does not solve anything, but it may tell me if those
functions are doing for you what they do for me.

My output is:
testing _BitScanForward (m, 0): 0 16843009
testing _BitScanForward (m, 1): 1 0
testing _BitScanForward (m, 256): 1 8
testing _BitScanReverse (m, 0): 0 3432104
testing _BitScanReverse (m, 1): 1 0
testing _BitScanReverse (m, 256): 1 8

Thanks, I really appreciate your help !

René

On Tue, Aug 25, 2009 at 6:38 PM, Michael Williams
michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com
mailto:michaelwilliam...@gmail.com wrote:

   That doesn't seem to have any effect on the results.


   René van de Veerdonk wrote:

   Hi Micheal,

   Thanks for the test (Intel - AMD, Windows XP - Windows
7, x32
   - x64, great, three birds with one stone!). So much for
   portability. But, hooray for test-cases!

   This may be related to the __lzcnt intrinsic, giving you a
   completely different result, i.e., 32 - my result. Could
you try
   replacing this 

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread René van de Veerdonk
David,
Confession: I have not tested 19x19. As you have noted, and others before
you over the years, a 19x19 board does not fit in one but three 128-bit
registers and there would be a rather big penalty as a result, perhaps
(likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
disappointment about the speed advantage even on 9x9 in the e-mail to the
list that served as my basis. My hope, as Hideko Kato points out, is in
longer registers or perhaps being able to port this to a GPU. A 64-bit OS
with twice the number registers would also surely help out. Nevertheless, I
was surprised that I was able to get to almost the same level of speed as
Libego provides.

The goals for this mockup were very humble: (1) to provide a basic
implementation to see what it can do (and if I could do it), and (2) learn
how to work with assembly and/or intrinsic functions (I had never done that
before). It may not be too hard to try 19x19 and just see what happens. I
may attempt this as a follow-up.

René van de Veerdonk

2009/8/23 David Fotland fotl...@smart-games.com

  How much would you lose for 19x19 board?  A board representation is not
 very interesting unless it scales to 19 line boards.



 David



 *From:* computer-go-boun...@computer-go.org [mailto:
 computer-go-boun...@computer-go.org] *On Behalf Of *René van de Veerdonk
 *Sent:* Sunday, August 23, 2009 1:11 PM
 *To:* computer-go@computer-go.org
 *Subject:* [computer-go] Bitmap Go revisited and mockup implementation



 Hi all,



 After years of reading this very interesting list, I decided to make my
 first contribution this week after reading once again about bitmap go. There
 is no freely available implementation of a bitmap based engine that I know
 of, so here is a start. Note that I am not a professional programmer,
 self-taught, and this is a part-time hobby, so do not expect perfect form.



 The attachment is an attempt to create an efficient implementation of a
 bitmap based light playout engine, following mostly the recommendations as
 spelled out by Antoine de Maricourt in his November 2006 message to this
 list. The application is single threaded and achieves around 75-80 kpps on
 my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
 using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
 undoubtedly suffers from portability issues as a result. The 128-bit SSE
 instructions are called through intrinsic functions, hopefully at least
 these interfaces are the same for other compilers. The single goal of this
 mockup was to do some benchmarking, so there is no gtp-interface, but that
 would be rather easy to add for those skilled in the art. The standard rules
 are used with one exception, the Brown eye-rule is used. Multi-stone suicide
 is explicitly forbidden. No superko checks are performed during playouts,
 and there is no mercy-rule either.



 I would be particularly interested to know if a 64-bit OS will improve the
 performance significantly, as it does have a lot less register pressure.
 Moving to a 64-bit OS will also make some instructions available that allow
 for further simplifications/efficiency improvements in the code. As will
 enabling SSE3 or SSE4.1 instructions.



 One note, the random number generator used is from Agner Fog's public
 library and was not included in the attachment. You will have to provide a
 random number generator yourself or download the appropriate version from
 his well-known website (www.agner.org/optimize)



 I hope this is the start of a discussion about further improvements that
 can be made, but you may use this code as you wish with no restrictions or
 guarantees. See the readme.txt file included in the attachment for more
 details (also partially included below).



 Comments and feedback welcome,



 René van de Veerdonk



 Excerpts from the readme.txt file:



 Introduction:

 =



 Mockup of a high performance implementation of the game of Go using bitmaps

 as the principle datastructures. This implementation is limited to
 measuring

 the performance of the principle components by playing random playouts from

 an empty 9x9 board, using Chinese scoring, not allowing pass moves until

 no other moves are available. Suicide is not allowed, neither for single

 or multi-stone groups. No checks are implemented for superko, but simple

 ko's are disallowed explicitly.



 Some implementation details:

 



 Benchmarking is done in test_board.cpp, where the number of playouts (n)
 and

 repeats (j) for each test is set. Benchmarking is done starting from an
 empty

 9x9 board until two (three with ko) passes are played in sequence. The

 benchmark playout follow the same scheme as method
 board_t::play_random_game,

 but with timing instrumentation added.



 Clock counts (cc) are measured using the cc_time_of macro. Total run-time

 measurement relies on the windows library.



 For each move during a playout

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Michael Williams

If an engine (or an engine's playout policy) needs to check the legality of 
every move before making a selection, this could still be a benefit.

René van de Veerdonk wrote:

David,

Confession: I have not tested 19x19. As you have noted, and others 
before you over the years, a 19x19 board does not fit in one but three 
128-bit registers and there would be a rather big penalty as a result, 
perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine 
voiced his disappointment about the speed advantage even on 9x9 in the 
e-mail to the list that served as my basis. My hope, as Hideko Kato 
points out, is in longer registers or perhaps being able to port this to 
a GPU. A 64-bit OS with twice the number registers would also surely 
help out. Nevertheless, I was surprised that I was able to get to almost 
the same level of speed as Libego provides.


The goals for this mockup were very humble: (1) to provide a basic 
implementation to see what it can do (and if I could do it), and (2) 
learn how to work with assembly and/or intrinsic functions (I had never 
done that before). It may not be too hard to try 19x19 and just see what 
happens. I may attempt this as a follow-up.


René van de Veerdonk

2009/8/23 David Fotland fotl...@smart-games.com 
mailto:fotl...@smart-games.com


How much would you lose for 19x19 board?  A board representation is
not very interesting unless it scales to 19 line boards.

 


David

 


*From:* computer-go-boun...@computer-go.org
mailto:computer-go-boun...@computer-go.org
[mailto:computer-go-boun...@computer-go.org
mailto:computer-go-boun...@computer-go.org] *On Behalf Of *René
van de Veerdonk
*Sent:* Sunday, August 23, 2009 1:11 PM
*To:* computer-go@computer-go.org mailto:computer-go@computer-go.org
*Subject:* [computer-go] Bitmap Go revisited and mockup implementation

 


Hi all,

 


After years of reading this very interesting list, I decided to make
my first contribution this week after reading once again about
bitmap go. There is no freely available implementation of a bitmap
based engine that I know of, so here is a start. Note that I am not
a professional programmer, self-taught, and this is a part-time
hobby, so do not expect perfect form.

 


The attachment is an attempt to create an efficient implementation
of a bitmap based light playout engine, following mostly the
recommendations as spelled out by Antoine de Maricourt in his
November 2006 message to this list. The application is single
threaded and achieves around 75-80 kpps on my Centrino Duo 2.4 GHz
Intel labtop. The mockup was developed in C/C++ using the Microsoft
Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly
suffers from portability issues as a result. The 128-bit SSE
instructions are called through intrinsic functions, hopefully at
least these interfaces are the same for other compilers. The single
goal of this mockup was to do some benchmarking, so there is no
gtp-interface, but that would be rather easy to add for those
skilled in the art. The standard rules are used with one exception,
the Brown eye-rule is used. Multi-stone suicide is explicitly
forbidden. No superko checks are performed during playouts, and
there is no mercy-rule either.

 


I would be particularly interested to know if a 64-bit OS will
improve the performance significantly, as it does have a lot less
register pressure. Moving to a 64-bit OS will also make some
instructions available that allow for further
simplifications/efficiency improvements in the code. As will
enabling SSE3 or SSE4.1 instructions.

 


One note, the random number generator used is from Agner Fog's
public library and was not included in the attachment. You will have
to provide a random number generator yourself or download the
appropriate version from his well-known website
(www.agner.org/optimize http://www.agner.org/optimize)

 


I hope this is the start of a discussion about further improvements
that can be made, but you may use this code as you wish with no
restrictions or guarantees. See the readme.txt file included in the
attachment for more details (also partially included below).

 


Comments and feedback welcome,

 


René van de Veerdonk

 


Excerpts from the readme.txt file:

 


Introduction:

=

 


Mockup of a high performance implementation of the game of Go using
bitmaps

as the principle datastructures. This implementation is limited to
measuring

the performance of the principle components by playing random
playouts from

an empty 9x9 board, using Chinese scoring, not allowing pass moves until

no other moves are available. Suicide is not allowed, neither for single

or multi-stone groups. No checks

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Łukasz Lew
2009/8/24 René van de Veerdonk rene.vandeveerd...@gmail.com:
 David,
 Confession: I have not tested 19x19. As you have noted, and others before
 you over the years, a 19x19 board does not fit in one but three 128-bit
 registers and there would be a rather big penalty as a result, perhaps
 (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
 disappointment about the speed advantage even on 9x9 in the e-mail to the
 list that served as my basis. My hope, as Hideko Kato points out, is in
 longer registers or perhaps being able to port this to a GPU. A 64-bit OS
 with twice the number registers would also surely help out. Nevertheless, I
 was surprised that I was able to get to almost the same level of speed as
 Libego provides.

Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
and 22-23 kpps on 19x19

Can you test your bitmap implementation on 19x19?

Lukasz

 The goals for this mockup were very humble: (1) to provide a basic
 implementation to see what it can do (and if I could do it), and (2) learn
 how to work with assembly and/or intrinsic functions (I had never done that
 before). It may not be too hard to try 19x19 and just see what happens. I
 may attempt this as a follow-up.
 René van de Veerdonk
 2009/8/23 David Fotland fotl...@smart-games.com

 How much would you lose for 19x19 board?  A board representation is not
 very interesting unless it scales to 19 line boards.



 David



 From: computer-go-boun...@computer-go.org
 [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
 Veerdonk
 Sent: Sunday, August 23, 2009 1:11 PM
 To: computer-go@computer-go.org
 Subject: [computer-go] Bitmap Go revisited and mockup implementation



 Hi all,



 After years of reading this very interesting list, I decided to make my
 first contribution this week after reading once again about bitmap go. There
 is no freely available implementation of a bitmap based engine that I know
 of, so here is a start. Note that I am not a professional programmer,
 self-taught, and this is a part-time hobby, so do not expect perfect form.



 The attachment is an attempt to create an efficient implementation of a
 bitmap based light playout engine, following mostly the recommendations as
 spelled out by Antoine de Maricourt in his November 2006 message to this
 list. The application is single threaded and achieves around 75-80 kpps on
 my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
 using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
 undoubtedly suffers from portability issues as a result. The 128-bit SSE
 instructions are called through intrinsic functions, hopefully at least
 these interfaces are the same for other compilers. The single goal of this
 mockup was to do some benchmarking, so there is no gtp-interface, but that
 would be rather easy to add for those skilled in the art. The standard rules
 are used with one exception, the Brown eye-rule is used. Multi-stone suicide
 is explicitly forbidden. No superko checks are performed during playouts,
 and there is no mercy-rule either.



 I would be particularly interested to know if a 64-bit OS will improve the
 performance significantly, as it does have a lot less register pressure.
 Moving to a 64-bit OS will also make some instructions available that allow
 for further simplifications/efficiency improvements in the code. As will
 enabling SSE3 or SSE4.1 instructions.



 One note, the random number generator used is from Agner Fog's public
 library and was not included in the attachment. You will have to provide a
 random number generator yourself or download the appropriate version from
 his well-known website (www.agner.org/optimize)



 I hope this is the start of a discussion about further improvements that
 can be made, but you may use this code as you wish with no restrictions or
 guarantees. See the readme.txt file included in the attachment for more
 details (also partially included below).



 Comments and feedback welcome,



 René van de Veerdonk



 Excerpts from the readme.txt file:



 Introduction:

 =



 Mockup of a high performance implementation of the game of Go using
 bitmaps

 as the principle datastructures. This implementation is limited to
 measuring

 the performance of the principle components by playing random playouts
 from

 an empty 9x9 board, using Chinese scoring, not allowing pass moves until

 no other moves are available. Suicide is not allowed, neither for single

 or multi-stone groups. No checks are implemented for superko, but simple

 ko's are disallowed explicitly.



 Some implementation details:

 



 Benchmarking is done in test_board.cpp, where the number of playouts (n)
 and

 repeats (j) for each test is set. Benchmarking is done starting from an
 empty

 9x9 board until two (three with ko) passes are played in sequence. The

 benchmark playout follow the same scheme as method
 board_t

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread René van de Veerdonk
Lukasz,
I tested Libego version 0.125 on my labtop, compiled using Visual Studio
2008 under Windows XP with slight modifications:

#ifdef NDEBUG section commented out in testing.h
#include time.h added to fast_random.cpp

From the command-line with engine -b I observed 67-71 kpps on the same
labtop as I used to benchmark the 9x9 bitmap mockup achieving 75-80 kpps.

It seems Libego performs much better (going from 70 to 105 kpps is an
impressive 50% improvement) with other compilers and OS. For sure it was not
developed for MS Visual Studio, so my numbers are not an apples to apples
comparison. But I think it is fair to say that both implementations are at
least in the same ball-park when compiled as is under Visual Studio in
Windows XP. I do not have an explanation for why my results are 50% below
the numbers you quoted other than the difference in compiler (inlining?) and
OS (32 vs 64-bit?). It would be really interesting to have someone else do a
comparison. Given that I was natively working and optimizing in Visual
Studio, I am not expecting a 50% improvement for the bitmap mockup (but
hopefully it will not decrease 50% either).

As for testing on a 19x19 board, this is not as straightforward for me as
modifying a single parameter in the program. The basic data-type needs to be
changed from a single 128-bit variable to an array of three such elements.
This requires a redefinition of most basic operations, in particular the
shift operations will take some work as these go across array-elements. In
the generated assembly I could already see that there are not enough 128-bit
registers for 9x9 in some functions, and this will only get worse with
arrays. Dereferencing array elements potentially also add overhead. I am
going to give it a try, but do not expect results overnight, as this is a
low intensity hobby for me (note that Antoine's e-mail was in November
2006). Optimistically, a 3x speed reduction may be expected, but the
additional overhead and and switch-logic could easily push the drop to over
10x unless branching can be minimized.

If someone could look over the mockup and suggest ways to reduce the amount
of branching there I would really appreciate it. If someone would eventually
try to port something like this to a GPU, that would also benefit
tremendously from a reduction in branching. I find that I need to be very
conscious about branching during coding, as I have observed a tendency to
put ever more if, do and while statements in my code.

Also, does anybody know if gcc allows the same interface to the
SSE-instructions as I have adopted in bitmap.cpp?

Thanks,

René

On Mon, Aug 24, 2009 at 4:18 PM, Łukasz Lew lukasz@gmail.com wrote:

 2009/8/24 René van de Veerdonk rene.vandeveerd...@gmail.com:
  David,
  Confession: I have not tested 19x19. As you have noted, and others before
  you over the years, a 19x19 board does not fit in one but three 128-bit
  registers and there would be a rather big penalty as a result, perhaps
  (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
  disappointment about the speed advantage even on 9x9 in the e-mail to the
  list that served as my basis. My hope, as Hideko Kato points out, is in
  longer registers or perhaps being able to port this to a GPU. A 64-bit OS
  with twice the number registers would also surely help out. Nevertheless,
 I
  was surprised that I was able to get to almost the same level of speed as
  Libego provides.

 Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
 and 22-23 kpps on 19x19

 Can you test your bitmap implementation on 19x19?

 Lukasz

  The goals for this mockup were very humble: (1) to provide a basic
  implementation to see what it can do (and if I could do it), and (2)
 learn
  how to work with assembly and/or intrinsic functions (I had never done
 that
  before). It may not be too hard to try 19x19 and just see what happens. I
  may attempt this as a follow-up.
  René van de Veerdonk
  2009/8/23 David Fotland fotl...@smart-games.com
 
  How much would you lose for 19x19 board?  A board representation is not
  very interesting unless it scales to 19 line boards.
 
 
 
  David
 
 
 
  From: computer-go-boun...@computer-go.org
  [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
  Veerdonk
  Sent: Sunday, August 23, 2009 1:11 PM
  To: computer-go@computer-go.org
  Subject: [computer-go] Bitmap Go revisited and mockup implementation
 
 
 
  Hi all,
 
 
 
  After years of reading this very interesting list, I decided to make my
  first contribution this week after reading once again about bitmap go.
 There
  is no freely available implementation of a bitmap based engine that I
 know
  of, so here is a start. Note that I am not a professional programmer,
  self-taught, and this is a part-time hobby, so do not expect perfect
 form.
 
 
 
  The attachment is an attempt to create an efficient implementation of a
  bitmap based light playout engine, following mostly

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread Michael Williams
For many people on this list, Computer Go is a hobby and that means that it is important to do whatever you find interesting and motivating, event if it may not 
be the best or most promising direction in order to become the next world champion program.  There is room for pushing limits in all directions, IMO.



Brian Sheppard wrote:
Maven was fast enough, but it was clearly the slowest 
of all major Scrabble programs. Some programs were 5x 
faster than Maven. Maven was slow because it was a 
lot more than a move generator: it played the full 
game with competent positional scoring. Scoring 
requires much more time than move generation. So why 
bother optimizing move generation?


I am finding that the same lesson applies to Go. But 
more so because Scrabble is largely first-order 
whereas Go is recursive. Pebbles averages ~10K trials 
per turn. Aya is clearly stronger at 10K trials than 
Pebbles. Fuego is stronger than Aya. Valkyria is 
stronger than Fuego, and my impression is that Zen 
trumps us all. Zen could be better than Pebbles + 400 
rating points at 10K trials.


Now look at scaling. When Fuego runs a machine that 
is ~20x as fast as mine, then Fuego is comparable to 
Zen. Valkyria plays in 19x19 tournaments running on a 
Pentium M, and scores 50% against much more powerful 
computers. My conclusion is that the analytic skills 
of Zen and Valkyria are worth a factor of 20, and I 
expect the advantage will increase with computer 
speeds.


The crucial issue is to support intelligence. Check 
out David’s description of MFoG, and Remi’s papers 
about CrazyStone. I am sure those programs pay 
attention to speed, but the real issue is whether the 
engine can do the necessary analysis. Magnus’s posts 
to this list lament that Valkyria is so slow, but 
Magnus is clearly excited that his engine supports 
great analysis.


So de-emphasize speed. Can you build a great pattern 
matcher using bit boards? Can you solve life  death 
positions better? Can you read ladders faster or more 
accurately? Do bit boards make the program easier to 
program and debug?


Pebbles started with a bit board move generator, but 
I have been adding other knowledge entities, like 
liberty counts, string iterators, and eyespace 
detectors. I am convinced that emphasizing domain 
knowledge structures will pay off.  (That being said, 
I will borrow your intrinsic functions for my 9x9 
template specialization. ☺ Every little bit helps.)


Best,
Brian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-24 Thread David Fotland
Getting similar speed to Libego is very impressive.  It’s important to start 
with a fast basic engine, since it will only get much slower as you add more 
knowledge.  Currently I only get about 19k playouts per second on a 2 core 
Core2Duo 2.3 GHz, on 9 line empty board.  I started at about 55k playouts per 
second on one core (for a UCT search).

 

The original one only beat gnugo 17% of the time, and the current one beats 
gnugo 93%, both with 5000 playouts, so in this case at least much slower is 
much stronger (testing on 9x9, against gnugo 3.7.10,

gnugo --mode gtp --chinese-rules --capture-all-dead --level 10 
--positional-superko).

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Monday, August 24, 2009 9:23 AM
To: computer-go
Subject: Re: [computer-go] Bitmap Go revisited and mockup implementation

 

David,

 

Confession: I have not tested 19x19. As you have noted, and others before you 
over the years, a 19x19 board does not fit in one but three 128-bit registers 
and there would be a rather big penalty as a result, perhaps (likely?) wiping 
out all of the benefits of bitmaps. Antoine voiced his disappointment about the 
speed advantage even on 9x9 in the e-mail to the list that served as my basis. 
My hope, as Hideko Kato points out, is in longer registers or perhaps being 
able to port this to a GPU. A 64-bit OS with twice the number registers would 
also surely help out. Nevertheless, I was surprised that I was able to get to 
almost the same level of speed as Libego provides.

 

The goals for this mockup were very humble: (1) to provide a basic 
implementation to see what it can do (and if I could do it), and (2) learn how 
to work with assembly and/or intrinsic functions (I had never done that 
before). It may not be too hard to try 19x19 and just see what happens. I may 
attempt this as a follow-up.

 

René van de Veerdonk

 

2009/8/23 David Fotland fotl...@smart-games.com

How much would you lose for 19x19 board?  A board representation is not very 
interesting unless it scales to 19 line boards.

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Sunday, August 23, 2009 1:11 PM
To: computer-go@computer-go.org
Subject: [computer-go] Bitmap Go revisited and mockup implementation

 

Hi all,

 

After years of reading this very interesting list, I decided to make my first 
contribution this week after reading once again about bitmap go. There is no 
freely available implementation of a bitmap based engine that I know of, so 
here is a start. Note that I am not a professional programmer, self-taught, and 
this is a part-time hobby, so do not expect perfect form.

 

The attachment is an attempt to create an efficient implementation of a bitmap 
based light playout engine, following mostly the recommendations as spelled out 
by Antoine de Maricourt in his November 2006 message to this list. The 
application is single threaded and achieves around 75-80 kpps on my Centrino 
Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++ using the Microsoft 
Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly suffers from 
portability issues as a result. The 128-bit SSE instructions are called through 
intrinsic functions, hopefully at least these interfaces are the same for other 
compilers. The single goal of this mockup was to do some benchmarking, so there 
is no gtp-interface, but that would be rather easy to add for those skilled in 
the art. The standard rules are used with one exception, the Brown eye-rule is 
used. Multi-stone suicide is explicitly forbidden. No superko checks are 
performed during playouts, and there is no mercy-rule either.

 

I would be particularly interested to know if a 64-bit OS will improve the 
performance significantly, as it does have a lot less register pressure. Moving 
to a 64-bit OS will also make some instructions available that allow for 
further simplifications/efficiency improvements in the code. As will enabling 
SSE3 or SSE4.1 instructions.

 

One note, the random number generator used is from Agner Fog's public library 
and was not included in the attachment. You will have to provide a random 
number generator yourself or download the appropriate version from his 
well-known website (www.agner.org/optimize)

 

I hope this is the start of a discussion about further improvements that can be 
made, but you may use this code as you wish with no restrictions or guarantees. 
See the readme.txt file included in the attachment for more details (also 
partially included below).

 

Comments and feedback welcome,

 

René van de Veerdonk

 

Excerpts from the readme.txt file:

 

Introduction:

=

 

Mockup of a high performance implementation of the game of Go using bitmaps

as the principle datastructures. This implementation is limited to measuring

RE: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-23 Thread David Fotland
How much would you lose for 19x19 board?  A board representation is not very
interesting unless it scales to 19 line boards.

 

David

 

From: computer-go-boun...@computer-go.org
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
Veerdonk
Sent: Sunday, August 23, 2009 1:11 PM
To: computer-go@computer-go.org
Subject: [computer-go] Bitmap Go revisited and mockup implementation

 

Hi all,

 

After years of reading this very interesting list, I decided to make my
first contribution this week after reading once again about bitmap go. There
is no freely available implementation of a bitmap based engine that I know
of, so here is a start. Note that I am not a professional programmer,
self-taught, and this is a part-time hobby, so do not expect perfect form.

 

The attachment is an attempt to create an efficient implementation of a
bitmap based light playout engine, following mostly the recommendations as
spelled out by Antoine de Maricourt in his November 2006 message to this
list. The application is single threaded and achieves around 75-80 kpps on
my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
undoubtedly suffers from portability issues as a result. The 128-bit SSE
instructions are called through intrinsic functions, hopefully at least
these interfaces are the same for other compilers. The single goal of this
mockup was to do some benchmarking, so there is no gtp-interface, but that
would be rather easy to add for those skilled in the art. The standard rules
are used with one exception, the Brown eye-rule is used. Multi-stone suicide
is explicitly forbidden. No superko checks are performed during playouts,
and there is no mercy-rule either.

 

I would be particularly interested to know if a 64-bit OS will improve the
performance significantly, as it does have a lot less register pressure.
Moving to a 64-bit OS will also make some instructions available that allow
for further simplifications/efficiency improvements in the code. As will
enabling SSE3 or SSE4.1 instructions.

 

One note, the random number generator used is from Agner Fog's public
library and was not included in the attachment. You will have to provide a
random number generator yourself or download the appropriate version from
his well-known website (www.agner.org/optimize)

 

I hope this is the start of a discussion about further improvements that can
be made, but you may use this code as you wish with no restrictions or
guarantees. See the readme.txt file included in the attachment for more
details (also partially included below).

 

Comments and feedback welcome,

 

René van de Veerdonk

 

Excerpts from the readme.txt file:

 

Introduction:

=

 

Mockup of a high performance implementation of the game of Go using bitmaps

as the principle datastructures. This implementation is limited to measuring

the performance of the principle components by playing random playouts from

an empty 9x9 board, using Chinese scoring, not allowing pass moves until

no other moves are available. Suicide is not allowed, neither for single

or multi-stone groups. No checks are implemented for superko, but simple

ko's are disallowed explicitly.

 

Some implementation details:



 

Benchmarking is done in test_board.cpp, where the number of playouts (n) and

repeats (j) for each test is set. Benchmarking is done starting from an
empty

9x9 board until two (three with ko) passes are played in sequence. The

benchmark playout follow the same scheme as method
board_t::play_random_game,

but with timing instrumentation added.

 

Clock counts (cc) are measured using the cc_time_of macro. Total run-time

measurement relies on the windows library.

 

For each move during a playout, the program goes through a sequence of

(1) identify all legal moves (~50cc)

this follows Brown's standard definition; this definition differs from

most Monte-Carlo engines and has a different set of pros and cons;

bitmaps allow all moves to be processed in parallel

(2) pick a uniformly random move from the available choices (~75cc)

relies on a modulus operation; is not 100% uniform to save speed

(3) playing it on the board (~170cc)

lengthiest method; but rather straightforward

 

Playouts end after 256 moves or whenever two passes are played in succession

(three in case of a ko). The routines for each of these basic steps are

contained in the file board.cpp. The underlying datastructures are 128-bit

bitmaps, for which the sse2 instructions used to manipulate them are
contained

in the file bitmap.cpp.

 

Further improvements can be expected from using any 64-bit OS that supports

several more assembly instructions, and from using sse4.1 instructions.

 

Example output (on a Intel 2.40 GHz Centrino Duo labtop):

=

 

[game] = 30469.7