Dear Ahmed
Thank you for the helpful reply.
But the CUFFT limit is ~2^27, and the benchmarks on the CUFFT site also reach up
to 2^25, whereas in my case I can only reach up to 2^24. Somewhere I am missing
another factor. Is this limited by my GPU's memory?
And also, in the same table, you can see that the "Maximum width and height for
a 2D texture reference bound to a CUDA array" is 65000 x 65000, which is far
larger than my array sizes. My GPU has compute capability 2.x.
Thank you for the idea of performing two separate sequential 1D FFTs; I will
look into it. The thing is, my problem doesn't stop at 2D: my goal is to
perform a 3D FFT, and I am not sure whether I can calculate it that way.
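(To check the math, here is the 3D case sketched on the CPU with numpy — this is
just a stand-in for the batched 1D GPU plans, not pyfft code: a 3D FFT is three
batched 1D passes, one per axis, so if the 2D trick works, 3D should too.)

```python
import numpy as np

# Small random 3D volume as a CPU stand-in for the GPU array.
rng = np.random.default_rng(0)
vol = (rng.standard_normal((8, 8, 8))
       + 1j * rng.standard_normal((8, 8, 8))).astype(np.complex64)

# One batched 1D FFT pass per axis; each pass on its own stays far
# below any per-plan size limit.
step = np.fft.fft(vol, axis=0)
step = np.fft.fft(step, axis=1)
step = np.fft.fft(step, axis=2)

# The three passes together equal a single 3D FFT.
print(np.allclose(step, np.fft.fftn(vol), atol=1e-3))  # True
```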
For others on the list, here is the complete traceback of the error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 493, in runfile
    execfile(filename, namespace)
  File "/home/jayanth/Dropbox/fft/fft1d_AB.py", line 99, in <module>
    plan.execute(gpu_data)
  File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 271, in _executeInterleaved
    batch, data_in, data_out)
  File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 192, in _execute
    self._tempmemobj = self._context.allocate(buffer_size * 2)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
Also, here is the simple program I was using to calculate the FFT with pyfft:
from pyfft.cuda import Plan
import numpy
import pycuda.driver as cuda
from pycuda.tools import make_default_context
import pycuda.gpuarray as gpuarray

cuda.init()
context = make_default_context()
stream = cuda.Stream()

plan = Plan((4096, 4096), stream=stream)  # create the FFT plan
data = numpy.ones((4096, 4096), dtype=numpy.complex64)  # single-precision test data, all ones
gpu_data = gpuarray.to_gpu(data)  # transfer to the GPU
plan.execute(gpu_data)  # run the FFT in place
result = gpu_data.get()  # copy the result back to the host
This is just a simple program to calculate the 2D FFT of a 4096 x 4096 array.
It works well for this size or smaller. But as soon as I increase it to larger
sizes such as 8192 x 8192 or 8192 x 4096, it fails with the out-of-memory error
above. So I wanted to know the reason behind it and how to overcome it.
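A rough estimate of the device memory involved (my own back-of-the-envelope
numbers; the factor-of-two temporary buffer is an assumption based on the
allocate(buffer_size * 2) line in the traceback above) does seem consistent
with the failure on a 1 GB card:

```python
# Back-of-the-envelope device-memory estimate for an in-place 2D FFT.
# complex64 = 8 bytes per element; temp_factor=2 is an assumption taken
# from pyfft's "allocate(buffer_size * 2)" in the traceback.
def fft_mem_estimate_mb(nx, ny, bytes_per_elem=8, temp_factor=2):
    data = nx * ny * bytes_per_elem  # the array itself
    temp = temp_factor * data        # pyfft's temporary buffer
    return (data + temp) / 2**20     # total in MiB

print(round(fft_mem_estimate_mb(4096, 4096)))  # 384  -> fits in 1 GB
print(round(fft_mem_estimate_mb(8192, 8192)))  # 1536 -> exceeds 1 GB
```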
Could you execute the same code and kindly let me know whether you hit the same
limits on your respective GPUs?
Thank you
Date: Thu, 5 Dec 2013 20:27:45 -0500
Subject: Re: [PyCUDA] cuMemAlloc failed: out of memory
From: [email protected]
To: [email protected]
CC: [email protected]
I ran into a similar issue:
http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda
The long and short of it is that CUFFT seems to have a limit of approximately
2^27 elements that it can operate on, in any combination of dimensions. In the
StackOverflow post above, I was trying to make a plan for large batches of the
same 1D FFTs and hit this limitation. You'll also notice that the benchmarks on
the CUFFT site https://developer.nvidia.com/cuFFT go up to sizes of 2^25.
I hypothesize that this is related to the 2^27 "Maximum width for a 1D texture
reference bound to linear memory" limit that we see in Table 12 of the CUDA C
Programming Guide
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities.
So since 4096**2 is 2^24, increasing to 8192 by 8192 gets close to the limit,
even though you'd think 2D FFTs would not be governed by the same limits as 1D
FFT batches.
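A quick tally of the element counts (my own arithmetic, not from the CUFFT
docs; the 2x temporary buffer is an assumption based on the
allocate(buffer_size * 2) line in the traceback earlier in this thread):

```python
# Element counts versus the apparent ~2^27 CUFFT limit; the 2x temporary
# buffer is an assumption taken from pyfft's allocate(buffer_size * 2).
limit = 2**27
for n in (4096, 8192):
    elems = n * n  # complex64 elements in the 2D array
    with_temp = 2 * elems
    print(f"{n}x{n}: 2^{elems.bit_length() - 1} elements, "
          f"with 2x temp buffer 2^{with_temp.bit_length() - 1} "
          f"({'under' if with_temp < limit else 'at/over'} the 2^27 limit)")
```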
You should be able to achieve 8192 by 8192 and larger 2D FFTs by performing two
separate sequential 1D FFT passes, one horizontal and the other vertical. The
runtimes should nominally be the same (they are for CPU FFTs), and the answer
will be the same, up to machine precision.
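For example, here is the idea checked on the CPU with numpy (a stand-in for the
batched 1D GPU plans, not pyfft code): a row-wise pass followed by a
column-wise pass reproduces the full 2D FFT.

```python
import numpy as np

rng = np.random.default_rng(1)
a = (rng.standard_normal((64, 64))
     + 1j * rng.standard_normal((64, 64))).astype(np.complex64)

# Pass 1: batched 1D FFTs along the rows (horizontal).
rows = np.fft.fft(a, axis=1)
# Pass 2: batched 1D FFTs along the columns (vertical).
both = np.fft.fft(rows, axis=0)

# The two passes together equal a single 2D FFT.
print(np.allclose(both, np.fft.fft2(a), atol=1e-3))  # True
```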
On Thu, Dec 5, 2013 at 9:53 AM, Jayanth Channagiri <[email protected]>
wrote:
Hello
I have an NVIDIA 2000 GPU. It has 192 CUDA cores and 1 GB of GDDR5 memory.
I am trying to calculate FFTs on the GPU using pyfft.
I am able to calculate the FFT only up to an array of at most 4096 x 4096.
But as soon as I increase the array size, it gives an error message saying:
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
Can anyone please tell me whether this error means that my GPU is not
sufficient for this array size? Or is it my computer's memory? Or a programming
error?
What is the maximum array size you can achieve with your GPU?
Is there any information on how else I can calculate such huge arrays?
Thank you very much in advance for the help, and sorry if this is too basic a
question.
Jayanth
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda