Dear Joe, please keep requests such as this to the mailing list. Thanks. I've cc'd them on my reply.
Joe Haywood <[email protected]> writes: > I was hoping to pick your brain a little more. After rewriting my > original Python/Cuda/C++ program to Python/PyOpenCL I have done some > speed comparisons. I cannot get the pyopencl version to run as fast > as the original. I have an NVidia GT 430 for testing. Running the > original code, the program takes ~10 seconds to complete. Running the > opencl version takes ~24 seconds to complete (not including build > time). Both programs produce the same results, within the uncertainty > I expect from a Monte Carlo code. > > The differences between the two are, the CUDA code uses the CURAND > libray for random numbers, whereas the OPENCL code uses ranlux from > pyopencl-ranlux.cl. The CUDA code is compiled as a callable library > using nvcc with optimizations like -O3 -fast-math -mtune=native etc > and called in Python using the weave library. The Opencl kernel is > compiled using the -cl-mad-enable -cl-unsafe-math etc. compile > options. In the CUDA code I have rewritten some of the functions to > use the faster math like > "theta=__fmul_rn(__fsqrt_rn(__fmul_rn(fac,dl)),__fdividef(__fmul_rn(A,PI),180.0f));" Ranluxcl supports a 'luxury' setting that influences the speed of the generator. This knob trades off speed against quality of random numbers. > I have tried moving enqueue_copy commands around, not reinitializing > the ranlux generator, etc but I cannot speed up the opencl version > anymore. Is there something I am missing in Pyopencl that would help > with this? Have you tried measuring (using OpenCL event-based profiling) what is actually taking time? Andreas
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
