Hello Imran,

kernel.py requires patching too:
- from .kernel_helpers import *
+ from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo, getPadding, getSharedMemorySize

I hope this will be enough. Sorry for the inconvenience; I'm going to
commit the fix to the repository. I also need to add a version check,
because there will definitely be other bugs on Python 2.4, which some
Linux distros still ship )
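For reference, such a guard could look something like this (a hypothetical sketch for the top of the package's __init__.py; the function name and message are my own, not the committed code):

```python
import sys

def check_python_version(version_info=sys.version_info):
    # 'from .module import *' is a SyntaxError before Python 2.6,
    # so fail fast with a clear ImportError instead of letting the
    # user hit a confusing SyntaxError deep inside the package.
    if version_info < (2, 6):
        raise ImportError("pyfft requires Python 2.6 or newer")

check_python_version()  # no-op on any supported interpreter
```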

Best regards,
Bogdan

On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello Imran,
>
> I only tested it on 2.6, so that may be the case. Thanks for the bug
> report though; this sort of compatibility is easy to add. Can you
> please put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" instead of that line?
>
> Best regards,
> Bogdan
>
> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <iha...@stanford.edu> wrote:
>> Didn't work - does it require newer than Python 2.5?
>>
>> $ python test_performance.py
>> Running performance tests...
>> Traceback (most recent call last):
>>  File "test_performance.py", line 57, in <module>
>>   run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>>  File "test_performance.py", line 52, in run
>>   testPerformance(ctx, shape, buffer_size)
>>  File "test_performance.py", line 22, in testPerformance
>>   plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>>  File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in getPlan
>>   import pyfft.cl
>>  File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py",
>> line 9, in <module>
>>   from .plan import FFTPlan
>>  File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py",
>> line 3
>>   from .kernel import *
>> SyntaxError: 'import *' not allowed with 'from .'
>>
>>
>> Bogdan Opanchuk wrote:
>>>
>>> Hello Imran,
>>>
>>> (sorry, forgot to add maillist to CC)
>>>
>>> Thank you for the prompt reply; results from a 5870 would be
>>> interesting too. If you have pyopencl installed, just run
>>> test_performance.py from the pyfft_test folder inside the pyfft
>>> package. It prints the results to stdout.
>>>
>>> Best regards,
>>> Bogdan.
>>>
>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu> wrote:
>>>
>>>>
>>>> Hi Bogdan,
>>>>
>>>> I have access to a Radeon 5870, but it's installed in a slow host machine
>>>> (2.8GHz dual core Pentium 4). If this is still useful, I could run a test
>>>> for you if you can send along a quick test case.
>>>>
>>>> Cheers,
>>>>
>>>> Imran
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>
>>>>>
>>>>> By the way, if it is not too much to ask: if anybody has access to ATI
>>>>> 59** series card and/or GTX 295 - could you please run performance
>>>>> tests from the module (pyfft_test/test_performance.py) and post the
>>>>> results here? I suspect that the poor performance in case of OpenCL
>>>>> can be (partially) caused by nVidia drivers.
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <manti...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>>>>> so it is called just pyfft now (and it sort of resolves the question
>>>>>> about including it to PyCuda distribution).
>>>>>>
>>>>>> At the moment, the most annoying (to me, at least) things are:
>>>>>> 1. OpenCL performance tests show speeds up to 6 times slower
>>>>>> than Cuda's. Unfortunately, I still can't find the reason.
>>>>>> (The interesting thing is that PyOpenCL is still noticeably faster
>>>>>> than the original Apple C program with the same FFT algorithm.)
>>>>>> 2. I tried to support different ways of using plans, including
>>>>>> pre-created contexts, streams/queues and asynchronous execution.
>>>>>> This resulted in a rather messy interface. Any suggestions for
>>>>>> making it clearer are welcome.
>>>>>> 3. Currently, the only criterion for a kernel's block size is the
>>>>>> maximum allowed by the number of registers used. The resulting
>>>>>> occupancy of the Cuda kernels is 0.25 - 0.33 most of the time,
>>>>>> but when I try to recompile the kernels with different block sizes
>>>>>> to maximize occupancy, it makes them even slower.
>>>>>>
>>>>>> Best regards,
>>>>>> Bogdan
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
