Hello Imran,

kernel.py requires patching too:

- from .kernel_helpers import *
+ from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo, getPadding, getSharedMemorySize
I hope this will be enough. Sorry for the inconvenience; I'm going to
commit it to the repository. I need to add a version check too, because
there will definitely be other bugs on Python 2.4, which is still used
by some Linux distros )

Best regards,
Bogdan

On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello Imran,
>
> I tested it only on 2.6, so that could be the case. Thanks for the bug
> report, though; this sort of compatibility is easy to add. Can you
> please put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" in place of that line?
>
> Best regards,
> Bogdan
>
> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <iha...@stanford.edu> wrote:
>> That didn't work - does it require something newer than Python 2.5?
>>
>> $ python test_performance.py
>> Running performance tests...
>> Traceback (most recent call last):
>>   File "test_performance.py", line 57, in <module>
>>     run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>>   File "test_performance.py", line 52, in run
>>     testPerformance(ctx, shape, buffer_size)
>>   File "test_performance.py", line 22, in testPerformance
>>     plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>>   File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in getPlan
>>     import pyfft.cl
>>   File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py", line 9, in <module>
>>     from .plan import FFTPlan
>>   File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py", line 3
>>     from .kernel import *
>> SyntaxError: 'import *' not allowed with 'from .'
>>
>> Bogdan Opanchuk wrote:
>>> Hello Imran,
>>>
>>> (sorry, forgot to add the mailing list to CC)
>>>
>>> Thank you for the prompt reply; results from a 5870 are interesting
>>> too. If you have pyopencl installed, just run test_performance.py from
>>> the pyfft_test folder, located in the pyfft package. It will print the
>>> results to stdout.
>>>
>>> Best regards,
>>> Bogdan.
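[A sketch of the version check Bogdan mentions above. Note that `from .module import *` is a compile-time SyntaxError before Python 2.6, so the explicit import list is still required on 2.5; a runtime check can only refuse to run on interpreters that are too old overall. The names `MIN_VERSION` and `check_python_version` are hypothetical, not part of pyfft.]

```python
import sys

# Hypothetical lower bound for pyfft; Python 2.4 is explicitly rejected.
MIN_VERSION = (2, 5)

def check_python_version(version_info=None):
    """Return True if the interpreter is at least MIN_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    # Compare (major, minor) tuples lexicographically.
    return tuple(version_info[:2]) >= MIN_VERSION

# Fail fast at import time rather than with an obscure error later.
if not check_python_version():
    raise ImportError("pyfft requires Python %d.%d or newer" % MIN_VERSION)
```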
>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu> wrote:
>>>> Hi Bogdan,
>>>>
>>>> I have access to a Radeon 5870, but it's installed in a slow host
>>>> machine (2.8GHz dual-core Pentium 4). If this is still useful, I could
>>>> run a test for you if you can send along a quick test case.
>>>>
>>>> Cheers,
>>>>
>>>> Imran
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>> By the way, if it is not too much to ask: if anybody has access to an
>>>>> ATI 59** series card and/or a GTX 295, could you please run the
>>>>> performance tests from the module (pyfft_test/test_performance.py)
>>>>> and post the results here? I suspect that the poor performance in the
>>>>> OpenCL case may be (partially) caused by the nVidia drivers.
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>>>>> so it is now called just pyfft (which sort of resolves the question
>>>>>> of including it in the PyCuda distribution).
>>>>>>
>>>>>> At the moment, the things that annoy me most are:
>>>>>> 1. OpenCL performance tests show speeds up to 6 times slower than
>>>>>> Cuda. Unfortunately, I still can't find the reason. (The interesting
>>>>>> thing is that PyOpenCL is still noticeably faster than Apple's
>>>>>> original C program with the same FFT algorithm.)
>>>>>> 2. I tried to support different ways of using plans, including
>>>>>> pre-created contexts, streams/queues and asynchronous execution.
>>>>>> This resulted in a rather messy interface. Any suggestions for
>>>>>> making it clearer are welcome.
>>>>>> 3. Currently, the only criterion for the kernels' block sizes is the
>>>>>> maximum allowed by the number of registers used. The resulting
>>>>>> occupancy of the Cuda kernels is 0.25 - 0.33 most of the time.
>>>>>> But when I try to recompile the kernels with different block sizes
>>>>>> in order to find the maximum occupancy, this makes the kernels even
>>>>>> slower.
>>>>>>
>>>>>> Best regards,
>>>>>> Bogdan
>>>>>
>>>>> _______________________________________________
>>>>> PyCUDA mailing list
>>>>> pyc...@host304.hostmonster.com
>>>>> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
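[The register-limited occupancy in point 3 above can be estimated with a back-of-the-envelope sketch like the following. The per-SM limits used as defaults (16384 registers, 32 warp slots) are assumptions for a GT200-class part, and the function name is hypothetical.]

```python
def register_limited_occupancy(regs_per_thread, block_size,
                               regs_per_sm=16384, max_warps_per_sm=32,
                               warp_size=32):
    """Fraction of an SM's warp slots usable when registers are the limit."""
    regs_per_block = regs_per_thread * block_size
    # How many whole blocks fit in the SM's register file.
    blocks_per_sm = regs_per_sm // regs_per_block
    # Ceiling division: warps needed to cover one block.
    warps_per_block = -(-block_size // warp_size)
    active_warps = min(blocks_per_sm * warps_per_block, max_warps_per_sm)
    return active_warps / max_warps_per_sm

# e.g. 32 registers per thread at block size 256:
# 2 blocks fit, 16 of 32 warp slots active -> occupancy 0.5
print(register_limited_occupancy(32, 256))
```

Under this model, shrinking the block size frees registers per block but can strand warp slots, which is consistent with the observation that recompiling for higher occupancy does not automatically make the kernels faster.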