Dear Joe,

Please keep requests such as this on the mailing list. Thanks. I've
cc'd the list on my reply.

Joe Haywood <[email protected]> writes:
> I was hoping to pick your brain a little more.  After rewriting my
> original Python/Cuda/C++ program to Python/PyOpenCL I have done some
> speed comparisons.  I cannot get the PyOpenCL version to run as fast
> as the original.  I have an NVidia GT 430 for testing.  Running the
> original code, the program takes ~10 seconds to complete.  Running the
> opencl version takes ~24 seconds to complete (not including build
> time).  Both programs produce the same results, within the uncertainty
> I expect from a Monte Carlo code.
>
> The differences between the two are: the CUDA code uses the CURAND
> library for random numbers, whereas the OpenCL code uses ranlux from
> pyopencl-ranlux.cl.  The CUDA code is compiled as a callable library
> using nvcc with optimizations like -O3 -fast-math -mtune=native etc
> and called in Python using the weave library.  The OpenCL kernel is
> compiled using the -cl-mad-enable -cl-unsafe-math etc. compile
> options.  In the CUDA code I have rewritten some of the functions to
> use the faster math like
> "theta=__fmul_rn(__fsqrt_rn(__fmul_rn(fac,dl)),__fdividef(__fmul_rn(A,PI),180.0f));"

Ranluxcl supports a 'luxury' setting that influences the speed of the
generator. This knob trades off speed against the statistical quality of
the random numbers: lower luxury levels are faster but less thoroughly
decorrelated.
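If you are including the ranlux source in your own kernel, the level is typically chosen with a preprocessor define passed in the build options. A sketch of what that might look like; treat the macro name RANLUXCL_LUX as an assumption and check it against the copy of pyopencl-ranlux.cl you ship (the -cl-* flags are standard OpenCL build options):

```python
# Hypothetical build-option string. -cl-mad-enable and
# -cl-unsafe-math-optimizations are standard OpenCL compiler flags;
# RANLUXCL_LUX=2 (levels 0..4, lower = faster, less decorrelated) is
# an assumed macro name -- verify it against the ranlux source you use.
build_options = "-cl-mad-enable -cl-unsafe-math-optimizations -D RANLUXCL_LUX=2"

# This would then be passed along the lines of:
#   prg = cl.Program(ctx, kernel_src).build(options=build_options)
print(build_options)
```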

> I have tried moving enqueue_copy commands around, not reinitializing
> the ranlux generator, etc., but I cannot speed up the OpenCL version
> anymore. Is there something I am missing in PyOpenCL that would help
> with this?

Have you tried measuring (using OpenCL event-based profiling) what is
actually taking time?
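In PyOpenCL that means creating the queue with profiling enabled (properties=cl.command_queue_properties.PROFILING_ENABLE) and reading the counters on each returned event. A minimal sketch of the bookkeeping, using stand-in events so it runs without a device; the evt.profile.start / evt.profile.end attribute names are PyOpenCL's, everything else is illustrative:

```python
from types import SimpleNamespace

def elapsed_ms(evt):
    """Milliseconds between an event's profiling counters.

    With PyOpenCL, evt.profile.start and evt.profile.end are device
    timestamps in nanoseconds; the queue must have been created with
    properties=cl.command_queue_properties.PROFILING_ENABLE.
    """
    return (evt.profile.end - evt.profile.start) * 1e-6

def total_ms(events):
    """Total duration of a list of profiled events (kernels, copies, ...)."""
    return sum(elapsed_ms(e) for e in events)

def fake_event(start_ns, end_ns):
    """Stand-in for a profiled cl.Event so this sketch runs without a GPU."""
    return SimpleNamespace(profile=SimpleNamespace(start=start_ns, end=end_ns))

# Two stand-in launches: 2 ms and 4 ms of device time.
events = [fake_event(0, 2_000_000), fake_event(5_000_000, 9_000_000)]
print(total_ms(events))  # roughly 6.0
```

Summing kernel events separately from enqueue_copy events usually makes it obvious whether the time is going into compute or into host-device transfers.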

Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
