Dear Ahmed,

Thank you for the helpful reply.

However, the CUFFT limit is ~2^27 elements, and the benchmarks on the CUFFT 
site only go up to 2^25, while in my case I can only reach 2^24. So I am still 
missing another factor somewhere. Is this limited by my GPU's memory?
Also, in the same table, the entry for "Maximum width and height for a 2D 
texture reference bound to a CUDA array" is 65000 x 65000, which is far beyond 
what I can reach. My GPU has compute capability 2.x.
Thank you for the idea of performing two separate sequential 1D FFTs; I will 
look into it further. The thing is, my problem doesn't stop at 2D: my goal is 
to perform 3D FFTs, and I am not sure whether I can decompose those the same 
way.


For others on the list, here is the complete traceback of the error message:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 493, in runfile
    execfile(filename, namespace)
  File "/home/jayanth/Dropbox/fft/fft1d_AB.py", line 99, in <module>
    plan.execute(gpu_data)
  File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 271, in _executeInterleaved
    batch, data_in, data_out)
  File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 192, in _execute
    self._tempmemobj = self._context.allocate(buffer_size * 2)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory



Also, here is the simple program I was referring to, which calculates the FFT 
using pyfft:


from pyfft.cuda import Plan
import numpy
import pycuda.driver as cuda
from pycuda.tools import make_default_context
import pycuda.gpuarray as gpuarray

cuda.init()
context = make_default_context()
stream = cuda.Stream()

plan = Plan((4096, 4096), stream=stream)  # create the FFT plan
data = numpy.ones((4096, 4096), dtype=numpy.complex64)  # single-precision test data: all ones
gpu_data = gpuarray.to_gpu(data)  # transfer to the GPU
plan.execute(gpu_data)  # compute the FFT in place
result = gpu_data.get()  # copy the result back to the host



This is just a simple program that calculates a 2D FFT for a 4096 x 4096 
array. It works well for this size or smaller. But as soon as I increase it to 
larger sizes such as 8192 x 8192 or 8192 x 4096, it fails with the 
out-of-memory error above.


So I wanted to know the reason behind it and how to overcome it.
If you can execute the same code, kindly let me know whether you hit the same 
limits on your respective GPUs.
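For reference, here is a rough back-of-the-envelope memory estimate. The `buffer_size * 2` allocation in the traceback above suggests pyfft wants a temporary buffer of roughly twice the input size; that factor is an assumption read off the traceback, not a documented guarantee, but it already puts 8192 x 8192 out of reach on a 1 GB card:

```python
import numpy

def fft_memory_estimate(shape, dtype=numpy.complex64):
    """Rough GPU memory needed: the input array plus an assumed 2x temp buffer."""
    data_bytes = int(numpy.prod(shape)) * numpy.dtype(dtype).itemsize
    temp_bytes = 2 * data_bytes  # assumed from `buffer_size * 2` in the traceback
    return data_bytes + temp_bytes

print(fft_memory_estimate((4096, 4096)) // 2**20)  # 384 MiB -> fits in 1 GB
print(fft_memory_estimate((8192, 8192)) // 2**20)  # 1536 MiB -> exceeds 1 GB
```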

Thank you



Date: Thu, 5 Dec 2013 20:27:45 -0500
Subject: Re: [PyCUDA] cuMemAlloc failed: out of memory
From: [email protected]
To: [email protected]
CC: [email protected]

I ran into a similar issue: 
http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda


The long and short of it is that CUFFT seems to have a limit of approximately 
2^27 elements that it can operate on, in any combination of dimensions. In the 
StackOverflow post above, I was trying to make a plan for large batches of the 
same 1D FFTs and hit this limitation. You'll also notice that the benchmarks on 
the CUFFT site https://developer.nvidia.com/cuFFT go up to sizes of 2^25. 


I hypothesize that this is related to the 2^27 "Maximum width for a 1D texture 
reference bound to linear memory" limit that we see in Table 12 of the CUDA C 
Programming Guide 
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities.


So since 4096**2 is 2^24, increasing to 8192 by 8192 (2^26) gets close to the 
limit, even though you'd think 2D FFTs would not be governed by the same 
limits as 1D FFT batches.


You should be able to achieve 8192 by 8192 and larger 2D FFTs by performing 
two separate sequential 1D FFT passes, one horizontal and the other vertical. 
The runtimes should nominally be the same (they are for CPU FFTs), and the 
answer will be the same, up to machine precision.
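The equivalence is easy to check on the CPU with NumPy; this just demonstrates the math, whereas on the GPU the two passes would be batched 1D pyfft/CUFFT plans:

```python
import numpy as np

# A 2D DFT is separable: 1D FFTs along the rows followed by 1D FFTs
# along the columns give the same result as a direct 2D FFT.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

two_pass = np.fft.fft(np.fft.fft(a, axis=1), axis=0)  # horizontal, then vertical
direct = np.fft.fft2(a)

print(np.allclose(two_pass, direct))  # True
```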


On Thu, Dec 5, 2013 at 9:53 AM, Jayanth Channagiri <[email protected]> 
wrote:

Hello,

I have an NVIDIA 2000 GPU with 192 CUDA cores and 1 GB of GDDR5 memory.
I am trying to calculate FFTs on the GPU using pyfft, and I am able to compute 
the FFT only up to a maximum array size of 4096 x 4096.


But as soon as I increase the array size, it gives an error message saying:
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

Can anyone please tell me whether this error means that my GPU is not 
sufficient for this array size? Or is it my computer's memory? Or a 
programming error? What is the maximum array size you can achieve with your 
GPU?

Is there any information on how else I can calculate such huge arrays?

Thank you very much in advance for the help, and sorry if this is too 
preliminary a question.

Jayanth


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
