Hello Imran,

(Sorry, I forgot to add the mailing list to CC.)
Thank you for the prompt reply; results from the 5870 would be interesting too. If you have PyOpenCL installed, just run test_performance.py from the pyfft_test folder located in the pyfft package. It will print the results to stdout.

Best regards,
Bogdan

On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu> wrote:
> Hi Bogdan,
>
> I have access to a Radeon 5870, but it's installed in a slow host machine
> (2.8GHz dual-core Pentium 4). If this is still useful, I could run a test
> for you if you can send along a quick test case.
>
> Cheers,
>
> Imran
>
> Bogdan Opanchuk wrote:
>>
>> By the way, if it is not too much to ask: if anybody has access to an ATI
>> 59** series card and/or a GTX 295, could you please run the performance
>> tests from the module (pyfft_test/test_performance.py) and post the
>> results here? I suspect that the poor performance in the OpenCL case
>> may be (partially) caused by the nVidia drivers.
>>
>> Thank you in advance.
>>
>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com>
>> wrote:
>>>
>>> Hello all,
>>>
>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>> so it is now called just pyfft (which more or less resolves the
>>> question of including it in the PyCUDA distribution).
>>>
>>> At the moment, the most annoying things (to me, at least) are:
>>> 1. OpenCL performance tests show speeds up to 6 times slower than
>>> CUDA. Unfortunately, I still can't find the reason. (Interestingly,
>>> PyOpenCL is still noticeably faster than Apple's original C program
>>> implementing the same FFT algorithm.)
>>> 2. I tried to support different ways of using plans, including
>>> pre-created contexts, streams/queues, and asynchronous execution.
>>> This resulted in a rather messy interface. Any suggestions for making
>>> it clearer are welcome.
>>> 3. Currently, the only criterion for choosing a kernel's block size
>>> is the maximum allowed by the number of registers used.
>>> The resulting occupancy of the CUDA kernels is 0.25-0.33 most of the
>>> time. But when I try to recompile the kernels with different block
>>> sizes in order to find the maximum occupancy, the kernels get even
>>> slower.
>>>
>>> Best regards,
>>> Bogdan
>>
>> _______________________________________________
>> PyCUDA mailing list
>> pyc...@host304.hostmonster.com
>> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
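[A note on point 3 above: the register-vs-occupancy trade-off Bogdan describes can be sketched with a simplified back-of-the-envelope model. This is not pyfft's actual logic; the device limits and register counts below are hypothetical GT200-era numbers, and the model ignores shared-memory limits and register allocation granularity.]

```python
def occupancy(regs_per_thread, threads_per_block,
              max_regs_per_sm=16384, max_warps_per_sm=32, warp_size=32):
    """Simplified occupancy model: assume only the register file limits
    how many blocks fit on one multiprocessor (SM)."""
    regs_per_block = regs_per_thread * threads_per_block
    blocks_per_sm = max_regs_per_sm // regs_per_block
    warps_per_sm = blocks_per_sm * (threads_per_block // warp_size)
    return min(warps_per_sm, max_warps_per_sm) / max_warps_per_sm

# A register-heavy kernel: 40 registers/thread at 256 threads/block
# fits only one block per SM -> 8 of 32 warps resident.
print(occupancy(40, 256))   # 0.25
# Halving the block size lets 3 blocks fit -> 12 of 32 warps.
print(occupancy(40, 128))   # 0.375
```

Under this model, shrinking the block size raises occupancy, yet the thread above reports that recompiling with different block sizes made the kernels slower in practice: occupancy alone does not predict FFT throughput, since memory access patterns and per-thread work also change with block size.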