On Thu, 19 Jan 2012 12:30:54 -0700, Steve Spicklemire <[email protected]> wrote:
> opencl/cuda Newbie here.. trying to use pyopencl/pycuda to learn my way
> around (use python a lot!) I have examples of what I've been trying to do to
> get familiar with the software. I'm trying to do an MC calculation of pi
> using the ReductionKernel. Here's what I've found:
>
> <http://spvi.com/files/pyopencl-monte-carlo>
>
> <http://spvi.com/files/pyopencl-mc-profile>
>
> <http://spvi.com/files/pycuda-monte-carlo>
>
> <http://spvi.com/files/pycuda-mc-profile>
>
> I'm running on a macbook pro with GeForce GT 330M graphics.
>
> I must be missing something basic. Both of these approaches are very
> slow.

I.e. 10**8 samples in 15s, that's roughly 6.7M samples/s. What's your reference value?

Also note that clrandom has a 'luxury' value that can be turned down to get random numbers faster.

Further, it would be good to know which part is slow. Python profiles are unfortunately unhelpful here, as the GPU runs asynchronously and the host only blocks on the outbound data transfer (that's clearly visible in the CL profile; PyCUDA seems a bit more complicated). Use cl.enqueue_marker with a profiling-enabled command queue to figure out what is actually taking the time, the reduction or the RNG.

HTH,
Andreas
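
A minimal sketch of the marker-based timing suggested above, not the code from the linked files. The helper names (elapsed_ms, time_stages, run_demo), the sample count, and the map expression are assumptions made for illustration. It also assumes an in-order queue and a driver that reports profiling timestamps for marker events, which not every OpenCL implementation does. The 'luxury' setting is left at its default because its handling differs between PyOpenCL versions.

```python
def elapsed_ms(start_ns, end_ns):
    """Convert a pair of OpenCL profiling timestamps (nanoseconds) to ms."""
    return (end_ns - start_ns) / 1e6


def time_stages(queue, stages):
    """Time each (name, enqueue_fn) stage by bracketing it with markers.

    Hypothetical helper: on an in-order queue a marker completes only after
    everything enqueued before it, so the gap between two consecutive
    markers approximates the device time of the stage between them.
    """
    import pyopencl as cl  # imported lazily so the helpers above load without it

    markers = [cl.enqueue_marker(queue)]
    for _name, enqueue_fn in stages:
        enqueue_fn()  # only enqueues work; does not block the host
        markers.append(cl.enqueue_marker(queue))
    markers[-1].wait()  # block once, after all work is queued

    return {
        name: elapsed_ms(markers[i].profile.end, markers[i + 1].profile.end)
        for i, (name, _fn) in enumerate(stages)
    }


def run_demo(n=10**6):
    """Estimate pi and report device time for the RNG and the reduction."""
    import numpy as np
    import pyopencl as cl
    import pyopencl.clrandom as clrandom
    from pyopencl.reduction import ReductionKernel

    ctx = cl.create_some_context()
    # Profiling has to be requested when the queue is created.
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

    # Count the points that fall inside the unit quarter circle.
    count_inside = ReductionKernel(
        ctx, np.int32, neutral="0", reduce_expr="a+b",
        map_expr="(x[i]*x[i] + y[i]*y[i] < 1.0f) ? 1 : 0",
        arguments="__global const float *x, __global const float *y")

    data = {}

    def rng():
        data["x"] = clrandom.rand(queue, n, np.float32)
        data["y"] = clrandom.rand(queue, n, np.float32)

    def reduction():
        data["hits"] = count_inside(data["x"], data["y"])

    timings = time_stages(queue, [("rng", rng), ("reduction", reduction)])
    pi_estimate = 4.0 * int(data["hits"].get()) / n
    return pi_estimate, timings
```

Calling run_demo() on a machine with a working OpenCL device returns the pi estimate and a dict of per-stage times in milliseconds. Comparing the "rng" and "reduction" entries shows which of the two dominates.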
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
