By the way, if it is not too much to ask: if anybody has access to an ATI 59** series card and/or a GTX 295, could you please run the performance tests from the module (pyfft_test/test_performance.py) and post the results here? I suspect that the poor performance in the OpenCL case may be (partially) caused by nVidia's drivers.
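(For anyone curious what such a test measures: a benchmark of this kind usually times repeated FFT executions and converts the average into GFLOPS. The sketch below is only an illustration of that idea using numpy's FFT as a stand-in — the actual pyfft_test/test_performance.py script and its API are not reproduced here.)

```python
import time
import numpy as np

def time_fft(size=2**20, runs=10):
    """Time `runs` forward FFTs of a complex64 array and estimate GFLOPS.
    (5 * N * log2(N) is the conventional flop count for a radix-2 FFT.)"""
    data = (np.random.rand(size) + 1j * np.random.rand(size)).astype(np.complex64)
    np.fft.fft(data)  # warm-up run, excluded from timing
    start = time.time()
    for _ in range(runs):
        np.fft.fft(data)
    elapsed = (time.time() - start) / runs
    gflops = 5 * size * np.log2(size) / elapsed / 1e9
    return elapsed, gflops

elapsed, gflops = time_fft()
print(f"avg time: {elapsed * 1e3:.2f} ms, ~{gflops:.2f} GFLOPS")
```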
Thank you in advance.

On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello all,
>
> I fixed some bugs in my pycudafft module and added PyOpenCL support,
> so it is called just pyfft now (and that sort of resolves the question
> about including it in the PyCuda distribution).
>
> At the moment, the most annoying things (to me, at least) are:
> 1. The OpenCL performance tests show speeds up to 6 times slower
> than CUDA's. Unfortunately, I still can't find the reason. (The
> interesting thing is that PyOpenCL is still noticeably faster than
> Apple's original C program with the same FFT algorithm.)
> 2. I tried to support different ways of using plans, including
> pre-created contexts, streams/queues and asynchronous execution. This
> resulted in a rather messy interface. Any suggestions on making it
> clearer are welcome.
> 3. Currently, the only criterion for a kernel's block size is the
> maximum allowed by the number of registers used. The resulting
> occupancy of the CUDA kernels is 0.25 - 0.33 most of the time, but
> when I try to recompile kernels with different block sizes in order
> to find the maximum occupancy, it makes the kernels even slower.
>
> Best regards,
> Bogdan

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
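(A note on point 3 above, about block sizes limited by register usage: theoretical occupancy is just the fraction of a multiprocessor's thread slots that can be resident at once, given per-thread register pressure. The toy calculator below shows why the sweep can be flat or even get worse — the hardware limits used are assumptions for a compute-capability-1.3 part like the GTX 295, and real allocation granularity is coarser than this.)

```python
def occupancy(block_size, regs_per_thread,
              regs_per_sm=16384, max_threads=1024, max_blocks=8):
    """Toy occupancy estimate: resident threads / max threads per SM.
    Assumed CC 1.3 limits: 16384 registers, 1024 threads, 8 blocks per SM.
    Ignores shared-memory limits and register allocation granularity."""
    if regs_per_thread * block_size > regs_per_sm:
        return 0.0  # a single block already exceeds the register file
    blocks = min(regs_per_sm // (regs_per_thread * block_size),
                 max_threads // block_size,
                 max_blocks)
    return blocks * block_size / max_threads

# Sweep block sizes for a (hypothetical) kernel using 48 registers/thread:
for bs in (64, 128, 192, 256, 512):
    print(bs, occupancy(bs, 48))
```

With 48 registers per thread this lands in the 0.19-0.31 range mentioned in the message, and larger blocks do not help — which is consistent with occupancy alone not predicting the fastest configuration.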