Hello Imran,

(sorry, forgot to add maillist to CC)

Thank you for the prompt reply; results from the 5870 would be
interesting too. If you have PyOpenCL installed, just run
test_performance.py from the pyfft_test folder in the pyfft package. It
will print the results to stdout.
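For reference, the invocation would look something like this (a minimal sketch, assuming you unpacked the pyfft source tree so the pyfft_test folder is available locally; the script itself requires a working PyOpenCL or PyCUDA setup):

```shell
# Run the pyfft performance tests from an unpacked source tree;
# timings are printed to stdout.
cd pyfft_test
python test_performance.py
```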

Best regards,
Bogdan.

On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu> wrote:
> Hi Bogdan,
>
> I have access to a Radeon 5870, but it's installed in a slow host machine
> (a 2.8 GHz dual-core Pentium 4). If this is still useful, I could run a test
> for you if you send along a quick test case.
>
> Cheers,
>
> Imran
>
> Bogdan Opanchuk wrote:
>>
>> By the way, if it is not too much to ask: if anybody has access to an ATI
>> 59** series card and/or a GTX 295, could you please run the performance
>> tests from the module (pyfft_test/test_performance.py) and post the
>> results here? I suspect that the poor performance in the OpenCL case
>> may be (partially) caused by nVidia drivers.
>>
>> Thank you in advance.
>>
>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com>
>> wrote:
>>
>>>
>>> Hello all,
>>>
>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>> so it is now called just pyfft (which sort of resolves the question
>>> about including it in the PyCUDA distribution).
>>>
>>> At the moment, the most annoying (to me, at least) things are:
>>> 1. OpenCL performance tests show speeds up to 6 times slower than
>>> CUDA. Unfortunately, I still can't find the reason.
>>> (The interesting thing is that PyOpenCL is still noticeably faster
>>> than Apple's original C program with the same FFT algorithm.)
>>> 2. I tried to support different ways of using plans, including
>>> pre-created contexts, streams/queues, and asynchronous execution. This
>>> resulted in a rather messy interface. Any suggestions for making it
>>> clearer are welcome.
>>> 3. Currently, the only criterion for the kernels' block sizes is the
>>> maximum allowed by the number of registers used. The resulting occupancy
>>> of the CUDA kernels is 0.25-0.33 most of the time. But when I try to
>>> recompile the kernels with different block sizes in order to maximize
>>> occupancy, it makes them even slower.
>>>
>>> Best regards,
>>> Bogdan
>>>
>>>
>>
>> _______________________________________________
>> PyCUDA mailing list
>> pyc...@host304.hostmonster.com
>> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
>>
>
