Hello,

The project I am working on relies heavily on batched 3D FFTs. You all
know the situation with CUFFT and PyCuda, so I decided to put some
effort into it and ported Apple's OpenCL FFT implementation to PyCuda.
You can see the result at http://pypi.python.org/pypi/pycudafft . It is
currently in beta, but I will keep working on it in case somebody needs
it. It works with the experimental PyCuda branch, the one with complex
number support.

In addition, the package contains a CUFFT wrapper by Ying Wai (Daniel)
Fan (it appeared on this mailing list; I just added a plan class and
batch support), in case you prefer nVidia's implementation. I used it
only to test my code.

Main problems at the moment:
- For some problem sizes it is much slower than CUFFT (see the table on the PyPI page).
- The library needs heavy testing with different problem sizes and video cards.
For other plans, see TODO.txt in the package.
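
If you want to help with the testing item above, here is a minimal
sketch of a size sweep against a trusted reference. It is not part of
pycudafft: the names reference_dft, relative_error and check_sizes are
made up for this illustration, and fft_under_test is a placeholder for
whatever callable runs the transform you want to check (it is assumed
to take and return a list of complex numbers). The reference is a naive
O(n^2) DFT, so it is only practical for moderate sizes; numpy.fft would
be the usual choice for larger ones.

```python
import cmath


def reference_dft(x):
    """Naive O(n^2) forward DFT, used only as a trusted reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def relative_error(result, reference):
    """L2 norm of the difference divided by the L2 norm of the reference."""
    diff = sum(abs(a - b) ** 2 for a, b in zip(result, reference))
    norm = sum(abs(b) ** 2 for b in reference)
    return (diff / norm) ** 0.5


def check_sizes(fft_under_test, sizes, tolerance=1e-4):
    """Return the sizes for which fft_under_test disagrees with the reference.

    The tolerance is an assumption suited to single precision; adjust as needed.
    """
    failed = []
    for n in sizes:
        # Deterministic, non-zero test signal so that runs are reproducible.
        x = [complex((3 * j) % 7 - 3, (5 * j) % 11 - 5) for j in range(n)]
        if relative_error(fft_under_test(x), reference_dft(x)) > tolerance:
            failed.append(n)
    return failed
```

Calling check_sizes(my_fft, [8, 16, 32, 64]) returns the list of sizes
that failed, which is the kind of information that is useful in a bug
report.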

Known issue: see my message to this mailing list,
http://www.mail-archive.com/[email protected]/msg00952.html . Because of
this, the 1D 2048-element transform gives incorrect results.

To sum it all up: further development of this library depends mainly on
your feedback. Any comments, bug reports, or suggestions are appreciated.
For my personal purposes it already works fine :)
