Re: [computer-go] CUDA and GPU Performance
Rene, you're absolutely right, it's completely fishy! But don't worry, your work is not in vain :) I noticed this morning, when I read your mail, that I had included the 9x9 results in my original mail instead of 19x19! Indeed, for 19x19 the results are even worse. Here's a complete rundown:

- 9x9 CPU: 47,000 playouts per core per second
- 9x9 GPU: 170,000 playouts per second
- 19x19 CPU: 9,800 playouts per core per second
- 19x19 GPU: 11,000 playouts per second

I did mention in another mail that the performance difference for 9x9 should be larger, I think. What I didn't realise was that I had reported the 9x9 numbers by mistake!

Additional statistics:

- Processor occupancy for 19x19 was 6% instead of 9%
- Branch divergence was less than half a percent (it was 2% for 9x9). This is perhaps because the larger board size causes more moves to fall onto empty intersections, or fewer moves to require merges/captures.

Christian

René van de Veerdonk wrote:

Christian,

Would you care to provide some more detail on your implementation for the playouts? Your results are very impressive. At 19x19 Go using bit-boards, your implementation is roughly 7x as fast as the bitboard implementation I presented just a few weeks back, and it also outperforms libEgo by about a factor of two.

René

On Wed, Sep 9, 2009 at 2:57 PM, Christian Nentwich <christ...@modeltwozero.com> wrote:

Mark, let me try to add some more context to answer your questions. When I say in my conclusion that it's "not worth it", I mean it's not worth using the GPU to run playout algorithms of the sort that are in use today. There may be many other algorithms that form part of Go engines where the GPU can provide an order-of-magnitude speedup, and still more where the GPU can run in parallel with the CPU to help. In my experiments, a CPU core got 47,000 playouts per second and the GPU 170,000.
But:

- My computer has two cores (so it gets 94,000 playouts with 2 threads)
- My computer's processor (Intel Core 2 Duo 6600) is 3 years old, and far from state of the art
- My graphics card (GeForce 285), on the other hand, is recently purchased and one of the top graphics cards

That means that my old CPU already gets more than half the speed of the GPU. An Intel Nehalem processor would surely beat it, let alone an 8-core system. Bearing in mind the severe drawbacks of the GPU - these are not general-purpose processors, and there is much you can't do on them - this limits their usefulness with this algorithm. Compare this speedup to truly highly parallel algorithms: random number generation, matrix multiplication, Monte Carlo simulation of options (which are highly parallel because there is no branching and little data); you see speedups of 10x to 100x over the CPU with those.

The 9% occupancy may be puzzling, but there is little that can be done about it. This, and the talk about threads and blocks, would take a while to explain, because GPUs don't work like general-purpose CPUs. They are SIMD processors, meaning that each processor can run many threads in parallel on different items of data, but only if *all threads are executing the same instruction*. There is only one instruction decoding stage per processor cycle. If any if-statements or loops diverge, threads will be serialised until they join again. The 9% occupancy is a function of the amount of data needed to perform the task and of the branch divergence (caused by the playouts being different). There is little that can be done about it other than use a completely different algorithm. If you google "CUDA block threads" you will find out more.

In short, the GPU runs like a grid cluster. In each block, 64 threads run in parallel, conceptually. On the actual hardware, in each processor 16 threads from one block will execute, followed by 16 from another (half-warps).
If any threads are blocked (memory reads cost ~400 cycles!) then threads from another block are scheduled instead. So the answer is: yes, there are 64 * 80 threads conceptually, but they're not always scheduled at the same time.

Comments on specific questions below.

"If parallelism is what you're looking for, why not have one thread per move candidate? Use that to collect AMAF statistics. 16Kb is not a lot to work with, so the statistics may have to be shared."

One thread per move candidate is feasible with the architecture I used, since every thread has its own board. I have not implemented AMAF, so I cannot comment on the statistics bit, but the output of your algorithm is typically not in the 16k shared memory anyway; you'd write that to global memory (1GB). Would uniform random playouts be good enough for this, though?
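The AMAF bookkeeping Mark's question implies can be sketched briefly. This is a hedged sketch in plain C standing in for device code: the struct and function names are made up, and on the GPU the increments would be atomicAdd()s into counters in global memory, shared by all playouts.

```c
#include <assert.h>

/* Hypothetical AMAF bookkeeping: one counter pair per intersection.
 * On the GPU these would live in global memory and be bumped with
 * atomicAdd(); a plain array stands in here. All names are
 * illustrative, not taken from Christian's implementation. */
#define POINTS 81 /* 9x9 board */

typedef struct { int plays; int wins; } AmafStat;

/* All-moves-as-first: after a playout, credit every point the player
 * under consideration occupied at any time during the playout. */
void amaf_update(AmafStat *stats, const int *moves, int n, int won) {
    for (int i = 0; i < n; i++) {
        stats[moves[i]].plays += 1; /* atomicAdd(&stats[m].plays, 1) on device */
        stats[moves[i]].wins += won;
    }
}

/* Move value used for selection: win rate over AMAF samples,
 * defaulting to 0.5 for unsampled points. */
double amaf_value(const AmafStat *s, int point) {
    return s[point].plays ? (double)s[point].wins / s[point].plays : 0.5;
}
```

Because many playout threads update the same shared counters, the device version pays the same atomic-serialization cost discussed elsewhere in this thread.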
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Petr, wow, I didn't expect to see so much experimentation being performed! It's great that you have taken the time to implement this approach, because it now shows people both alternatives for implementing a playout algorithm on the GPU.

I strongly suspect the low performance in the per-intersection case is down to two reasons - please let me know what you think:

- A big proportion of moves place one stone on an empty intersection. In this case 80 of your 81 threads are asleep doing nothing, so GPU time is wasted.
- In quite a few other cases, we get to the old problem with Go: the intersections are not independent. Thus when you process captures or merges, you possibly have to lock 80 of 81 threads.

You can't do much about the first. Did you find a solution for the second? I was thinking for a while about whether it would be possible to come up with a highly parallel algorithm for captures/merges that requires no lock. That's quite hard, so I concentrated on my own experiment instead.

I agree with you that your approach could perform very well in pattern matching scenarios. In fact, your work somewhat strengthens my conclusion that doing just straightforward playouts on the GPU is not the way to go.

Christian

Petr Baudis wrote:

I have written a quick CUDA implementation of the per-intersection GPGPU approach; Christian's e-mail finally made me polish it up to a somewhat presentable form. In this implementation, each GPU thread maintains a single intersection. The implementation uses a 9x9 board (10x11 internally); expanding to 19x19 would probably mean either moving some data to global memory, or splitting the playout of a single board across multiple blocks, or waiting for GPUs with larger shared memory. ;-)

The speed is unfortunately very disappointing; on a GTX260, I see 17,300 playouts per second, with 50 playouts running in parallel.
There are several buts:

- With some further obvious optimizations, I'm thinking it should be possible to get it to at least 22,000 playouts per second easily.
- Bit boards are an obvious thing to try, but I'm quite doubtful they will be an improvement.
- The playouts are uniformly random now, but they have been implemented in a way that makes heavier playouts easy; a precise liberty count is maintained, it should be easy to add pattern matching, and the move selection can deal with arbitrary probabilities for different intersections. (I wonder, what are the standard high-performance numbers for heavier playouts with pattern matching?)
- Christian is playing 2650 games in parallel; I'm playing only 50 games in parallel, so in the meantime the CPU can pick the next games to play from the tree.
- My code already accounts for transferring board images from the CPU to the GPU (a different one for each playout) and getting scores back.
- Christian's GPU seems quite a bit better than mine.
- Apparently I still suck at CUDA optimizations; this has been my first real small CUDA project, and it would be awesome if someone more experienced could look at the code and suggest obvious improvements.

Still, I'm pretty unhappy with the slowness; I wonder how Christian achieved such a high speed. One problem with my approach is that I have to use a lot of atomic intrinsics (some of them could be worked around) and __syncthreads() all the time to ensure that all threads have a consistent board image at each stage.

I think that with pattern matching, the per-intersection approach could shine more, since I can easily match patterns everywhere on the board at once. Still, I wonder if it will become at least on par with good CPU implementations.

On Wed, Sep 09, 2009 at 04:54:23PM +0100, Christian Nentwich wrote:

In other words: no tree search is involved, and this is the lightest possible playout.
The raw numbers are as follows:

- CPU search: 47,000 playouts per CPU core per second, on an Intel Core 2 Duo 6600
- GPU search: 170,000 playouts per second, on an NVidia GeForce 285 card

I still find this quite astonishing; since you consider this experiment a failure anyway, would you mind publishing the source code? :-)

--
Christian Nentwich
Director, Model Two Zero Ltd.
+44-(0)7747-061302
http://www.modeltwozero.com

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
This is very interesting. Here is a crazy idea; maybe it is the same as Mark's, but I want to take it to its extreme.

Since AMAF values are so helpful, perhaps one can let go of the idea of sequential play following the rules of Go, and basically play moves in parallel on all empty intersections. Compute the new state (do captures), repeat a fixed number of times, and evaluate.

Since I do not understand GPU programming at all, I cannot judge whether this would make things easier to implement. Also, a lot of weird things will happen when you do things in parallel. What happens when two adjacent blocks are captured at the same time? Are both removed? Are there tricks to remove the one that was most likely to be captured to begin with?

-Magnus

Quoting Christian Nentwich <christ...@modeltwozero.com>:

"Petr, wow, I didn't expect to see so much experimentation being performed! It's great that you have taken the time to implement this approach, because this now shows people both alternatives for implementing a playout algorithm on the GPU. [...]"
Re: [computer-go] CUDA and GPU Performance
Interesting stuff. I don't have the skills nor the time to make such experiments myself, but here is a simple idea: when using a bitmap representation of the board, it is quite possible to find all eye-like points with a constant number of bit-shifting operations. That should reduce the number of branches. It might even be possible to check liberties and remove captured stones with a constant number of loopings, at least for most reasonable board positions (the number would have to be unreasonably large for pathological positions with very long and narrow groups).

This still leaves a few places where the code will have to branch, most notably choosing a random legal move. And the ending of the simulation, of course. If we solve all these, it should be possible to run simple playouts in parallel, taking full advantage of the SIMD nature of the GPU. Of course, each playout would be slower than necessary, but maybe that would even out with enough of them done in parallel.

-H

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
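Heikki's constant-number-of-shifts eye detection is easy to sketch. A hedged toy version in C: an 8x8 board so a single uint64_t suffices (a real 9x9 or 19x19 board needs two or more words, but the shifting idea is identical), and "eye-like" simplified to "empty point all of whose on-board orthogonal neighbours are friendly stones".

```c
#include <stdint.h>

/* Toy 8x8 bitboard, bit i = point (i % 8, i / 8). File masks stop
 * east/west shifts from wrapping across rows. */
#define FILE_A 0x0101010101010101ULL
#define FILE_H 0x8080808080808080ULL
#define RANK_1 0x00000000000000FFULL
#define RANK_8 0xFF00000000000000ULL

static uint64_t north(uint64_t b) { return b << 8; }
static uint64_t south(uint64_t b) { return b >> 8; }
static uint64_t east(uint64_t b)  { return (b & ~FILE_H) << 1; }
static uint64_t west(uint64_t b)  { return (b & ~FILE_A) >> 1; }

/* Simplified "eye-like" points for `own`: empty points whose every
 * on-board orthogonal neighbour is an own stone. Off-board counts as
 * friendly, hence the ORed edge masks. Branch-free, as Heikki says:
 * a fixed number of shifts, ORs and ANDs finds all such points at once. */
uint64_t eye_like(uint64_t own, uint64_t other) {
    uint64_t empty = ~(own | other);
    uint64_t s_ok = north(own) | RANK_1; /* south neighbour own, or off-board */
    uint64_t n_ok = south(own) | RANK_8;
    uint64_t e_ok = west(own)  | FILE_H;
    uint64_t w_ok = east(own)  | FILE_A;
    return empty & s_ok & n_ok & e_ok & w_ok;
}
```

A full eye test would also look at diagonal neighbours to rule out false eyes, which costs only four more shift/mask steps in the same style.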
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Hi!

On Thu, Sep 10, 2009 at 12:26:06PM +0100, Christian Nentwich wrote:
"I strongly suspect the low performance in the per-intersection case is down to two reasons - please let me know what you think: A big proportion of moves place one stone on an empty intersection. In this case 80 of your 81 threads are asleep doing nothing. This means GPU time is wasted."

This is mostly true, though only through part of the loop body; I think this is the biggest reason my approach is slower than yours, though, and it will be much worse on 19x19. It would be interesting to have N threads running independently and getting contracted to various boards for massive tasks like capturing and merging groups. But that would be quite a complex task, and I'm not sure it is even practical to consider with the current state of the CUDA sandbox.

"In quite a few other cases, we get to the old problem with Go: that the intersections are not independent. Thus when you process captures or merges, you possibly have to lock 80 of 81 threads."

I do no locking; actually, being able to do large captures and merges very quickly was always my primary motivation for even entertaining this way of parallelism. However, I do a lot of atomicAdd()s, which have quite an overhead and in some cases basically just serialize many threads. Mainly:

(i) When counting the score. This is totally unoptimized and O(N) now; it should be trivial to rewrite as a parallel sum in O(log N). This is one of the easy optimizations I mentioned, in fact the most obvious one.

(ii) When recalculating liberties. All intersection threads adding an extra liberty to the same group will get serialized. Avoiding this seems difficult.

--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth
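The O(log N) parallel sum Petr mentions in (i) is the standard block-wide tree reduction. A hedged single-threaded sketch in C: the inner loop plays the role of all the threads active in one round, and the comment marks where __syncthreads() would sit on the device.

```c
/* Tree reduction: halve the number of active "threads" each round, so
 * after log2(N) rounds score[0] holds the total. N must be a power of
 * two; a real 81-point board would be padded with zeros up to 128. */
#define N 128

int block_sum(int score[N]) {
    for (int stride = N / 2; stride > 0; stride /= 2) {
        for (int i = 0; i < stride; i++)   /* one parallel round */
            score[i] += score[i + stride];
        /* __syncthreads() would go here on the GPU */
    }
    return score[0];
}
```

On the device this replaces 81 serialized atomicAdd()s with 7 synchronized rounds, which is exactly the optimization Petr calls the most obvious one.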
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
On Thu, Sep 10, 2009 at 01:54:49PM +0200, Magnus Persson wrote:
"Since AMAF values are so helpful, perhaps one can let go of the idea of sequential play following the rules of go, and basically play moves in parallel on all empty intersections. [...] What happens when two adjacent blocks are captured at the same time? Are both removed? Are there tricks to remove the one that was most likely to be captured to begin with?"

Well, after one iteration, _all_ blocks should be captured, since there will be no empty intersections, right? (Even if you don't play in eye-like spaces, you will have this situation until some of those actually form.) The easiest heuristic would seem to be to capture the smallest group, but I have a hard time imagining this working well, _and_ I don't see how to choose the smallest group efficiently (other than that, modifying my source to do this should be fairly easy).

--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Quoting Petr Baudis <pa...@ucw.cz>:
"Well, after one iteration, _all_ blocks should be captured since there will be no empty intersections, right? [...]"

--
Magnus Persson
Berlin, Germany
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Sorry for my empty reply to Petr; I misclicked.

When a block is identified, you could also sum up the original strength of the stones covered by the block and use that as a heuristic for what should be captured first.

Also, to avoid filling all liberties: one can do that with pattern matching when the original position is copied to the GPU. For example, if black has an empty triangle, black will rather pass than play that point; thus either white plays there or it remains empty. Basically it means we can impose heavy pruning on the original position, and then copy many copies of that position and evaluate them massively in parallel.

-Magnus

Quoting Petr Baudis <pa...@ucw.cz>:
"Well, after one iteration, _all_ blocks should be captured since there will be no empty intersections, right? [...] The easiest heuristic would seem to be to capture the smallest group, but I have hard time imagining this working well."

--
Magnus Persson
Berlin, Germany
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
On Thu, Sep 10, 2009 at 08:29:31AM -0400, Jason House wrote:
"I've thought of something similar in the past, but with a twist: pre-compute a subset of moves that could be safely played in parallel. Even if you can only play 285 moves in parallel on an empty 19x19, it could still be a huge speed boost."

Hmm, do you have some ideas how this pre-computation could be done in less than O(N)? I'm gonna think about this now during some travelling... ;-)

Petr "Pasky" Baudis
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Maybe some divide & conquer algorithm?

On 10.09.2009, at 14:43, Petr Baudis wrote:
"Hmm, do you have some ideas how this pre-computation could be done in less than O(N)? I'm gonna think about this now during some travelling... ;-)"
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
"Since AMAF values are so helpful, perhaps one can let go of the idea of sequential play following the rules of go, and basically play moves in parallel on all empty intersections. Compute new state (do captures) and repeat a fixed number of times and evaluate."

two thoughts:

i) how do you determine the color to play for each thread?

ii) if i) becomes unsynchronized with the rules of go, you're basically exploring random boards instead of boards that are related to the game that you're interested in. the board should rapidly converge to something that retains no information about the initial state of the board (assuming that the initial board has stones on it).

s.
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Somewhat... One could generate a random number (r) and combine it with the move mask (m) as follows:

black moves = m & r
white moves = m & ~r

This has the drawback that the number of black and white moves may not be equal. It can be modified to give an equal number of moves, for example by requiring that each subset of n intersections generates the same number of black and white moves, or by doing some kind of looping until the condition is met. For looping, you could keep some subset of the generated moves in order to guarantee convergence. For example, if there are 14 more white moves than black moves, keep all generated moves except for 14 white moves. Then you only have to pick 14 moves in the next loop iteration.

Sent from my iPhone

On Sep 10, 2009, at 8:43 AM, Petr Baudis <pa...@ucw.cz> wrote:
"Hmm, do you have some ideas how this pre-computation could be done in less than O(N)? I'm gonna think about this now during some travelling... ;-)"
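Jason's mask trick is a few lines of bit arithmetic. A minimal sketch in C, assuming a toy 64-point board so one word holds the mask (a 19x19 board needs six words); the struct and function names are made up, the balancing loop is omitted, and popcount uses the GCC/Clang builtin:

```c
#include <stdint.h>

/* Split a mask m of playable points into black and white moves with
 * one random word r: black gets the points where r has a 1 bit,
 * white gets the rest. One step, no branching. */
typedef struct { uint64_t black, white; } MoveSplit;

MoveSplit split_moves(uint64_t m, uint64_t r) {
    MoveSplit s = { m & r, m & ~r };
    return s;
}

/* The drawback Jason notes: the counts need not be equal. Popcount
 * gives the imbalance that his looping scheme would then correct by
 * re-drawing random bits for some of the surplus moves. */
int imbalance(MoveSplit s) {
    return __builtin_popcountll(s.black) - __builtin_popcountll(s.white);
}
```

By construction every playable point is assigned exactly one colour: black and white masks are disjoint and their union is m.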
Re: [computer-go] CUDA implementation of the per-intersection GPGPU approach
Quoting steve uurtamo <uurt...@gmail.com>:
"i) how do you determine color to play for each thread?"

Random. Only one random bit is necessary for each decision.

"ii) if i) becomes unsynchronized with the rules of go, you're basically exploring random boards instead of boards that are related to the game that you're interested in. the board should rapidly converge to something that retains no information about the initial state of the board."

Usual lightweight random games also create situations that never happen in normal play. But violating the rules will probably make it more biased than a usual light playout. Hard to tell before trying. My feeling is that it might work surprisingly well, but it might get extremely biased in some situations, which perhaps can be fixed by prefiltering the moves that can be played.

Magnus

--
Magnus Persson
Berlin, Germany
[computer-go] CGOS Engine does NOT use time control commands
I always see this message when booting up CGOS. However, my program supports time_left, and I also state this in list_commands. Time control would be much better if I got feedback from the server. How can I use it? Is there a special command I'm missing?

Thanks,
Isaac
Re: [computer-go] CGOS Engine does NOT use time control commands
I think it needs *time_settings*.

On Thu, Sep 10, 2009 at 12:37 PM, Isaac Deutsch <i...@gmx.ch> wrote:
"I always see this message when booting up CGOS. However, my program supports time_left and I also state this in list_commands. [...] Is there a special command I'm missing?"
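For anyone hitting the same message, a hedged sketch of what the two GTP time commands carry once the server starts sending them. The command names and argument order follow the GTP conventions (time_settings: main time, byo-yomi time, byo-yomi stones; time_left: colour, seconds, stones); the handler and struct names here are made up for illustration.

```c
#include <stdio.h>

/* Parsed GTP time state. time_settings arrives once at game start;
 * time_left arrives before each move with the remaining seconds and
 * stones for one colour. Names are illustrative, not from any engine. */
typedef struct {
    int main_time, byo_time, byo_stones; /* from time_settings */
    int left_time, left_stones;          /* from time_left */
    char color[8];
} TimeState;

/* Returns 1 if the line was one of the two time commands. As the
 * thread notes, the engine must also advertise both commands in its
 * list_commands reply, or the server will not send them. */
int handle_time_command(TimeState *t, const char *line) {
    if (sscanf(line, "time_settings %d %d %d",
               &t->main_time, &t->byo_time, &t->byo_stones) == 3)
        return 1;
    if (sscanf(line, "time_left %7s %d %d",
               t->color, &t->left_time, &t->left_stones) == 3)
        return 1;
    return 0;
}
```

A time-managed engine would then budget its playouts per move from left_time and the estimated number of remaining moves.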
Re: [computer-go] CGOS Engine does NOT use time control commands
Thanks, it works!

On 10.09.2009, at 18:49, Go Fast wrote:
"I think it needs time_settings."