Re: [PyOpenCL] Trouble understanding/applying ReductionKernel

Andreas Kloeckner Thu, 19 Jan 2012 12:27:56 -0800

On Thu, 19 Jan 2012 12:30:54 -0700, Steve Spicklemire <[email protected]> wrote:
> opencl/cuda Newbie here.. trying to use pyopencl/pycuda to learn my way 
> around (use python a lot!) I have examples of what I've been trying to do to 
> get familiar with the software. I'm trying to do an MC calculation of pi 
> using the ReductionKernel. Here's what I've found:
> 
> <http://spvi.com/files/pyopencl-monte-carlo>
> 
> <http://spvi.com/files/pyopencl-mc-profile>
> 
> <http://spvi.com/files/pycuda-monte-carlo>
> 
> <http://spvi.com/files/pycuda-mc-profile>
> 
> I'm running on a macbook pro with GeForce GT 330M graphics.
> 
> I must be missing something basic. Both of these approaches are very
> slow.


That is 10**8 samples in 15 s, or roughly 6.7M samples/s. What's your
reference value? Also note that clrandom has a 'luxury' value that can be
turned down to get random numbers faster. Further, it would be good to
know which part is slow. Python profiles are unfortunately unhelpful here,
as the GPU runs asynchronously and the host only blocks on the outbound
data transfer (that's clearly visible in the CL profile; the PyCUDA one
seems a bit more complicated).

Use cl.enqueue_marker with a profiling-enabled command queue to figure
out what is actually taking the time, the reduction or the RNG.
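
A rough sketch of that marker approach, in case it helps. This is not
taken from the linked scripts: the names marker_interval_s, profile_mc_pi
and count_inside are made up for illustration, clrandom.rand stands in for
whatever RNG call the original code uses, and it assumes an in-order queue
on an implementation that reports profiling timestamps for marker events.

```python
def marker_interval_s(before, after):
    """Device-side seconds between the completion of two marker events.

    OpenCL profiling timestamps are reported in nanoseconds.
    """
    return (after.profile.end - before.profile.end) / 1e9


def profile_mc_pi(n=10**7):
    """Return (pi_estimate, rng_seconds, reduction_seconds).

    Illustrative only; requires pyopencl and an OpenCL device.
    """
    # Imported here so the helper above works without an OpenCL install.
    import numpy as np
    import pyopencl as cl
    import pyopencl.clrandom as clrandom
    from pyopencl.reduction import ReductionKernel

    ctx = cl.create_some_context()
    # Profiling must be enabled on the queue, or event.profile will fail.
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

    # Hypothetical kernel: count points inside the unit quarter circle.
    count_inside = ReductionKernel(
        ctx, np.int32, neutral="0", reduce_expr="a+b",
        map_expr="(x[i]*x[i] + y[i]*y[i] < 1.0f) ? 1 : 0",
        arguments="__global const float *x, __global const float *y")

    m0 = cl.enqueue_marker(queue)            # before the RNG
    x = clrandom.rand(queue, n, np.float32)
    y = clrandom.rand(queue, n, np.float32)
    m1 = cl.enqueue_marker(queue)            # RNG done, reduction next
    inside = count_inside(x, y)
    m2 = cl.enqueue_marker(queue)            # after the reduction
    m2.wait()                                # block until the device is done

    pi_estimate = 4.0 * int(inside.get()) / n
    return pi_estimate, marker_interval_s(m0, m1), marker_interval_s(m1, m2)
```

Calling profile_mc_pi() then gives the estimate plus the device time spent
in the RNG and in the reduction, which should show which of the two
dominates.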

HTH,
Andreas
