Hello,

The project I am working on relies heavily on batched 3D FFTs. You all know the situation with CUFFT and PyCuda, so I decided to put some effort into it and ported Apple's OpenCL implementation of FFT to PyCuda. You can see the result at http://pypi.python.org/pypi/pycudafft . It is currently in beta, but I will keep working on it in case somebody needs it. It works with the experimental PyCuda branch, the one with complex number support.
In addition, the package contains the CUFFT wrapper by Ying Wai (Daniel) Fan, in case you prefer nVidia's implementation. The wrapper appeared on this mailing list earlier; I only added a plan class and batch support. I used it just to test my code.

Main problems at the moment:
- For some problem sizes it is much slower than CUFFT (see the table on the PyPI page).
- The library needs heavy testing on different problem sizes and video cards.

For other plans, see TODO.txt in the package.

Known issue: see my letter to this mailing list, http://www.mail-archive.com/[email protected]/msg00952.html . Because of this, the 1D 2048-element transform gives incorrect results.

To sum it all up: the development of this library depends mainly on your reaction. Any comments, bug reports, or suggestions are appreciated. For my personal purposes it already works fine )
