Hello Bogdan,

Thank you very much for the interesting ideas. The fact that you can run an
8192 x 8192 transform on your C2050 clearly suggests that the limitation is
my Quadro 2000. I had a look at Reikna and it is indeed helpful.
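A quick way to confirm that it is the card's memory rather than a
programming error is to compare the free device memory reported by PyCUDA
with the size of the buffers a transform needs. A minimal sketch, assuming
pycuda.autoinit is acceptable for the check:

import pycuda.autoinit  # creates a context on the default device
import pycuda.driver as cuda

# Free and total device memory, in bytes.
free_bytes, total_bytes = cuda.mem_get_info()
print("free: %d MB, total: %d MB" % (free_bytes // 2**20, total_bytes // 2**20))

# A single 8192 x 4096 complex64 buffer already needs 256 MB, and
# PyFFT/Reikna allocate temporary buffers on top of that.
print("one 8192 x 4096 complex64 buffer: %d MB" % (8192 * 4096 * 8 // 2**20))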
And Ahmed, I realised that taking a 2D array and applying two separate
sequential 1D FFTs, one horizontal and the other vertical, does not yield
the same result; clearly a 1D FFT and a 2D FFT are different. They do the
same thing in http://wiki.tiker.net/PyCuda/Examples/2DFFT : it is not a 2D
FFT but a 1D FFT of each row, reshaped back to 2D, so the result is not a
2D FFT.

For my problem, I need to compute a 3D FFT of an array on the order of
1024 x 4096 x 4096 using parallel computing with PyCUDA. Is it necessary to
write a kernel in C as part of the program, or can I proceed the way I
showed in my previous mail? With my program I readily see a 10x speedup
compared to numpy's FFT, but my GPU is unable to handle such large data. It
would be really helpful if anyone could suggest documentation, blog posts,
videos etc. on this.

Thank you all. Have a good day,
Jayanth

> Date: Fri, 6 Dec 2013 16:47:17 +1100
> Subject: Re: [PyCUDA] cuMemAlloc failed: out of memory
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> Hi Jayanth,
>
> I can run a 8192x8192 transform on a Tesla C2050 without problems. I
> think you are limited by the available video memory, see my previous
> message in this thread --- a 8192x4096 buffer takes 250Mb, and you
> have to factor in the temporary buffers PyFFT creates.
>
> By the way, I would recommend you to switch from PyFFT to Reikna
> (http://reikna.publicfields.net). PyFFT is not supported anymore, and
> Reikna includes its code along with some additional features and
> optimizations (more robust block/grid size finder, temporary array
> management, launch optimizations and so on). Your code would look
> like:
>
> import numpy
> import reikna.cluda as cluda
> from reikna.fft import FFT
>
> api = cluda.cuda_api()
> thr = api.Thread.create()
>
> # Or, if you want to use an external stream,
> #
> # cuda.init()
> # context = make_default_context()
> # stream = cuda.Stream()
> # thr = api.Thread(stream)
>
> data = numpy.ones((4096, 4096), dtype=numpy.complex64)
> gpu_data = thr.to_device(data)  # converting to a GPU array
>
> fft = FFT(data).compile(thr)
> fft(gpu_data, gpu_data)
> result = gpu_data.get()
>
> print result
>
>
> On Fri, Dec 6, 2013 at 3:43 PM, Jayanth Channagiri
> <[email protected]> wrote:
> > Dear Ahmed
> >
> > Thank you for the resourceful reply.
> >
> > But the CUFFT limit is ~2^27, and even the benchmarks on the CUFFT site
> > only go up to 2^25. In my case, I am able to reach only 2^24, so I seem
> > to be missing another factor. Is this limited by my GPU's memory?
> > Also, in the same table, the "Maximum width and height for a 2D texture
> > reference bound to a CUDA array" is 65000 x 65000, which is far larger
> > than what I can reach. My GPU has compute capability 2.x.
> > Thank you for the idea of performing two separate sequential 1D FFTs; I
> > will look into it further. The thing is, my problem does not stop at 2D.
> > My goal is to perform a 3D FFT, and I am not sure I can calculate it
> > that way.
> >
> > For others on the list, here is the complete traceback of the error
> > message:
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 493, in runfile
> >     execfile(filename, namespace)
> >   File "/home/jayanth/Dropbox/fft/fft1d_AB.py", line 99, in <module>
> >     plan.execute(gpu_data)
> >   File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 271, in _executeInterleaved
> >     batch, data_in, data_out)
> >   File "/usr/local/lib/python2.7/dist-packages/pyfft-0.3.8-py2.7.egg/pyfft/plan.py", line 192, in _execute
> >     self._tempmemobj = self._context.allocate(buffer_size * 2)
> > pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
> >
> > Also, here is the simple program I was referring to, which calculates
> > the FFT using pyfft:
> >
> > from pyfft.cuda import Plan
> > import numpy
> > import pycuda.driver as cuda
> > from pycuda.tools import make_default_context
> > import pycuda.gpuarray as gpuarray
> >
> > cuda.init()
> > context = make_default_context()
> > stream = cuda.Stream()
> >
> > plan = Plan((4096, 4096), stream=stream)  # creating the plan
> > data = numpy.ones((4096, 4096), dtype=numpy.complex64)  # single-precision test data, all ones
> > gpu_data = gpuarray.to_gpu(data)  # converting to a GPU array
> > plan.execute(gpu_data)  # computing the FFT with pyfft
> > result = gpu_data.get()  # the result
> >
> > This is just a simple program to calculate the FFT of a 4096 x 4096 2D
> > array. It works well for this array or a smaller one, but as soon as I
> > increase the size to something like 8192 x 8192 or 8192 x 4096, it fails
> > with an out-of-memory error.
> > I would like to know the reason behind this and how to overcome it.
> > You could execute the same code and kindly let me know whether you hit
> > the same limits on your respective GPUs.
> >
> > Thank you
> >
> >
> > ________________________________
> > Date: Thu, 5 Dec 2013 20:27:45 -0500
> > Subject: Re: [PyCUDA] cuMemAlloc failed: out of memory
> > From: [email protected]
> > To: [email protected]
> > CC: [email protected]
> >
> > I ran into a similar issue:
> > http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda
> >
> > The long and short of it is that CUFFT seems to have a limit of
> > approximately 2^27 elements that it can operate on, in any combination
> > of dimensions. In the StackOverflow post above, I was trying to make a
> > plan for large batches of the same 1D FFTs and hit this limitation.
> > You'll also notice that the benchmarks on the CUFFT site
> > https://developer.nvidia.com/cuFFT go up to sizes of 2^25.
> >
> > I hypothesize that this is related to the 2^27 "Maximum width for a 1D
> > texture reference bound to linear memory" limit that we see in Table 12
> > of the CUDA C Programming Guide
> > http://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities.
> >
> > So since 4096**2 is 2^24, increasing to 8192 by 8192 gets very close to
> > the limit, even though you'd think 2D FFTs would not be governed by the
> > same limits as 1D FFT batches.
> >
> > You should be able to achieve 8192 by 8192 and larger 2D FFTs by
> > performing two separate sequential 1D FFTs, one horizontal and the other
> > vertical. The runtimes should nominally be the same (they are for CPU
> > FFTs), and the answer will be the same, up to machine precision.
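A quick CPU-side check of that last point is possible with plain numpy; the
test size below is an arbitrary small example:

import numpy

data = numpy.random.rand(256, 256).astype(numpy.complex64)

# 1D FFT of every row, then 1D FFT of every column of the result.
rows_then_cols = numpy.fft.fft(numpy.fft.fft(data, axis=1), axis=0)

# Direct 2D FFT for comparison.
direct = numpy.fft.fft2(data)

print(numpy.allclose(rows_then_cols, direct))  # True, up to rounding error

# The same separability holds in 3D: numpy.fft.fftn(volume) equals three
# successive 1D passes, one along each axis.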
> >
> > On Thu, Dec 5, 2013 at 9:53 AM, Jayanth Channagiri <[email protected]>
> > wrote:
> >
> > Hello
> >
> > I have an NVIDIA Quadro 2000 GPU. It has 192 CUDA cores and 1 GB of
> > GDDR5 memory.
> >
> > I am trying to calculate FFTs on the GPU using pyfft.
> > I am able to calculate the FFT only up to a maximum array size of
> > 4096 x 4096.
> >
> > As soon as I increase the array size, it gives an error message saying:
> > pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
> >
> > Can anyone please tell me whether this error means that my GPU is not
> > sufficient for this array? Or is it my computer's memory? Or a
> > programming error? What is the maximum array size you can achieve on a
> > GPU? Is there any information on how else I can calculate such huge
> > arrays?
> >
> > Thank you very much in advance for the help, and sorry if this is too
> > basic a question.
> >
> > Jayanth
> >
> >
> > _______________________________________________
> > PyCUDA mailing list
> > [email protected]
> > http://lists.tiker.net/listinfo/pycuda
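On the question of arrays that do not fit on a 1 GB card: because the 2D
(and 3D) FFT separates into 1D passes along each axis, one option is to keep
the full array in host memory and send it to the GPU in row blocks, one axis
at a time. The sketch below reuses the Reikna API from Bogdan's example; the
sizes, the block size and the helper name are illustrative assumptions, not
something taken from this thread:

import numpy
import reikna.cluda as cluda
from reikna.fft import FFT

api = cluda.cuda_api()
thr = api.Thread.create()

# Hypothetical sizes: a 16384 x 16384 complex64 array (2 GB) that cannot
# live on a 1 GB card in one piece, processed 2048 rows at a time
# (one 2048 x 16384 complex64 block is 256 MB on the device).
nx, ny = 16384, 16384
block = 2048  # must divide nx evenly in this simple sketch
data = numpy.ones((nx, ny), dtype=numpy.complex64)

def fft_rows_blockwise(a):
    # 1D FFT along the last axis, computed a block of rows at a time.
    plan = FFT(a[:block], axes=(1,)).compile(thr)
    out = numpy.empty_like(a)
    for i in range(0, a.shape[0], block):
        chunk = thr.to_device(a[i:i + block])
        plan(chunk, chunk)          # in place, as in Bogdan's example
        out[i:i + block] = chunk.get()
    return out

# Pass 1: FFT along the rows.  Pass 2: transpose on the host, FFT along the
# (former) columns, then transpose back.  The result equals the full 2D FFT,
# at the price of extra host <-> device transfers.
step1 = fft_rows_blockwise(data)
result = fft_rows_blockwise(numpy.ascontiguousarray(step1.T)).T

The same idea extends to 3D with a third pass along the remaining axis,
although an array of 1024 x 4096 x 4096 complex64 values is about 137 GB, so
at that scale the blocks would have to be staged from disk rather than from
host RAM.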
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
