By the way, if it is not too much to ask: if anybody has access to an ATI
59** series card and/or a GTX 295, could you please run the performance
tests from the module (pyfft_test/test_performance.py) and post the
results here? I suspect that the poor performance in the OpenCL case
may be (partially) caused by the nVidia drivers.
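For reference, running them should amount to something like the following (a sketch, assuming pyfft and its PyCuda/PyOpenCL dependencies are installed; the exact invocation may differ on your setup):

```shell
# Run the pyfft performance tests from the module checkout
# (script path as given above: pyfft_test/test_performance.py)
python pyfft_test/test_performance.py
```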

Thank you in advance.

On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello all,
>
> I fixed some bugs in my pycudafft module and added PyOpenCL support,
> so it is now called simply pyfft (which sort of resolves the question
> about including it in the PyCuda distribution).
>
> At the moment, the most annoying (to me, at least) things are:
> 1. OpenCL performance tests show speeds up to 6 times slower than
> Cuda. Unfortunately, I still can't find the reason.
> (Interestingly, PyOpenCL is still noticeably faster
> than Apple's original C program with the same FFT algorithm.)
> 2. I tried to support different ways of using plans, including
> pre-created contexts, streams/queues, and asynchronous execution. This
> resulted in a rather messy interface. Any suggestions for making it
> clearer are welcome.
> 3. Currently, the only criterion for a kernel's block size is the
> maximum allowed by the number of registers used. The resulting occupancy
> of the Cuda kernels is 0.25 - 0.33 most of the time, but when I try to
> recompile the kernels with different block sizes in order to maximize
> occupancy, it makes them even slower.
>
> Best regards,
> Bogdan
>

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net