Hello all!

I have a very similar problem with the new gpuarray backend; it shows the 
following undesired behaviour (the flag settings for each case are sketched 
below):

(a) with preallocation turned ON (any value of gpuarray.preallocate at or 
above zero) it crashes with a cuMemAlloc error (OutOfMemory) on a problem of 
my size (smaller problems work)
(b) with preallocation turned ON and a small problem being fitted, 
interrupting the kernel and restarting results in a cuMemAlloc error 
(OutOfMemory)
(c) with preallocation turned OFF (gpuarray.preallocate=-1) it does not even 
start fitting, failing with a cuMemAlloc error (invalid argument!!! NOT 
OutOfMemory!!!!)
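
For concreteness, here is roughly how I set the flags in each case (a minimal 
sketch; the exact preallocate value I use for (a)/(b) varies):

import os
# cases (a)/(b): preallocation ON -- any value >= 0; 0.9 = 90% of GPU memory
os.environ['THEANO_FLAGS'] = 'device=cuda,gpuarray.preallocate=0.9'
# case (c): preallocation OFF, i.e. the allocation cache is disabled
# os.environ['THEANO_FLAGS'] = 'device=cuda,gpuarray.preallocate=-1'
import theano  # flags must be set before this import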

GpuArrayException: ('The following error happened while compiling the 
node', forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{1000}, 
GpuSubtensor{int64:int64:int64}.0, GpuElemwise{Composite{(i0 - 
sqr(i1))}}[]<gpuarray>.0, GpuElemwise{tanh,no_inplace}.0, 
InplaceGpuDimShuffle{0,2,1}.0, GpuAlloc<None>{memset_0=True}.0, 
GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, 
GpuSubtensor{int64:int64:int64}.0, GpuAlloc<None>{memset_0=True}.0, 
GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, 
TensorConstant{1000}, GpuSubtensor{::, int64:int64:}.0, 
InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, :int64:}.0, GpuSubtensor{::, 
int64::}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, int64:int64:}.0, 
InplaceGpuDimShuffle{1,0}.0, InplaceGpuDimShuffle{1,0}.0, 
GpuAlloc<None>{memset_0=True}.0), '\n', 'cuMemAlloc: 
CUDA_ERROR_INVALID_VALUE: invalid argument')

Needless to say, everything works fine on the old backend, just ~20% slower 
(on problems that actually start fitting on both backends). I use the 
versions currently supplied with Anaconda (theano 0.9, libgpuarray 0.6.9, 
pygpu 0.6.9).

On Tuesday, July 11, 2017 at 3:23:44 AM UTC+2, Pascal Lamblin wrote:
>
> On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>>
>> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior for 
>> the new backend as for the old.
>>
>> Do I understand correctly that leaving preallocate at its default (new 
>> backend) will not result in higher memory consumption, but merely doesn't 
>> free memory once it has been allocated, so that what I see in nvidia-smi 
>> is the maximum memory consumption up to that point?
>>
>
> Not really, it can actually result in higher memory consumption due to the 
> way new memory blocks are allocated. For instance, in the worst case, if a 
> 1 MB tensor gets allocated and deallocated, and then a 2 MB tensor is 
> requested, a new 2 MB block will be added to the pool; however, it will not 
> be mergeable with the first one, so once it is freed, a subsequent 3 MB 
> tensor cannot be "split" across the two existing blocks. Due to that 
> fragmentation effect, allocating and deallocating 1 MB, then 2 MB, 3 MB, 
> etc., will end up using 1 + 2 + 3 + ... MB in total on the GPU. 
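>
> To make the arithmetic concrete, here is a toy sketch of that worst case 
> (an illustration only, not the actual allocator code):
>
> # toy model of a caching pool whose freed blocks never merge
> pool = []   # sizes (MB) of cached free blocks
> total = 0   # MB requested from the driver so far
> for size in [1, 2, 3, 4]:
>     if not any(block >= size for block in pool):
>         total += size      # no cached block is big enough: ask the driver
>         pool.append(size)  # when the tensor dies, the block is cached
> print(total)  # 10 MB held on the GPU for a peak live demand of only 4 MB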
>  
>
>> A related question: when I run with profile=True,profile_memory=True, 
>> shouldn't the max GPU memory stat in the profiling correspond to what I 
>> see in nvidia-smi when I run with preallocate at its default?
>>
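>> (For reference, a sketch of how I collect those profiling numbers; the 
>> flags have to be set before theano is imported:)
>>
>> import os
>> os.environ['THEANO_FLAGS'] = 'device=cuda,profile=True,profile_memory=True'
>> import theano  # the profile, including the max GPU memory stat, is printed at exit
>>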
>
> Again, not really, due to that fragmentation effect.
>  
>
>> Currently, I see ~400MB GPU memory usage in profiling, and that's what I 
>> see with preallocate=-1 too (although I can't guarantee there aren't 
>> higher spikes that I don't see with nvidia-smi). When I leave preallocate 
>> at the default, I see ~2GB of GPU memory usage (but the profiling still 
>> reports only 400MB).
>>
>
> Preallocating 400 or 500 MB may avoid fragmentation and bring the total 
> consumption peak closer to what is actually allocated to arrays.
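>
> Concretely, something like this (a sketch; a preallocate value greater 
> than 1 is interpreted as megabytes):
>
> import os
> # preallocate a 500 MB pool up front to avoid fragmentation
> os.environ['THEANO_FLAGS'] = 'device=cuda,gpuarray.preallocate=500'
> import theano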
>  
>
>>
>> Thanks
>> Fabian
>>
>> On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:
>>>
>>> The equivalent of the old back-end's default memory behaviour is 
>>> gpuarray.preallocate=-1.
>>>
>>> The new back-end by default caches all calls to cudaMalloc() to speed up 
>>> computation. This flag disables that cache, which matches the old 
>>> back-end's default.
>>>
>>> On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]> 
>>> wrote:
>>>
>>>> When I did use preallocation I used lib.cnmem=1 for theano 0.8.2 and 
>>>> gpuarray.preallocate=1 for theano 0.9.0 and 0.10.dev.
>>>> For most experiments (including those in the log files) I did not use 
>>>> preallocation, because the only way I could see the difference in memory 
>>>> usage was through nvidia-smi, which only shows the static pre-allocation 
>>>> when it is used.
>>>> I believe the problem does not disappear with pre-allocation, since I 
>>>> see my training crash for much smaller models with the new backend even 
>>>> then. However, I cannot measure the effect of switching backends on GPU 
>>>> memory when I use preallocation.
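>>>>
>>>> Concretely, the two setups looked roughly like this (a sketch; flags 
>>>> set before importing theano):
>>>>
>>>> import os
>>>> # theano 0.8.2 (old backend): CNMeM pool covering the full GPU
>>>> os.environ['THEANO_FLAGS'] = 'device=gpu,lib.cnmem=1'
>>>> # theano 0.9.0 / 0.10.dev (new backend) equivalent:
>>>> # os.environ['THEANO_FLAGS'] = 'device=cuda,gpuarray.preallocate=1'
>>>> import theano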
>>>>
>>>>
>>>> On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:
>>>>
>>>>> Do you use the Theano flag: gpuarray.preallocate=1? When you tried the 
>>>>> preallocation, how did you use it?
>>>>>
>>>>> It is mostly equivalent to lib.cnmem, but our default is different: by 
>>>>> default it gives more speed-up, but can sometimes cause memory 
>>>>> fragmentation. The flag above fixes the new fragmentation that can 
>>>>> happen by default.
>>>>>
>>>>> On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> 
>>>>> wrote:
>>>>>
>>>>>> One addition:
>>>>>> The theano 0.9.0 setup used libgpuarray v0.6.2.
>>>>>> The theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to 
>>>>>> v0.6.7 and tested again, but I still get ~2GB memory usage.
>>>>>>
>>>>>>
>>>>>> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I recently tried to switch my CNN implementation to the new Theano 
>>>>>>> GPU backend. To do so, I switched from "device=gpu" to "device=cuda", 
>>>>>>> with Theano 0.9 and libgpuarray installed. My Theano code then works 
>>>>>>> with the new backend without any further changes.
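>>>>>>>
>>>>>>> (Sketch of the only change needed, set before theano is imported:)
>>>>>>>
>>>>>>> import os
>>>>>>> os.environ['THEANO_FLAGS'] = 'device=cuda'  # previously 'device=gpu'
>>>>>>> import theano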
>>>>>>>
>>>>>>> However, when I do this, I see my GPU memory consumption increase 
>>>>>>> drastically. Theano's memory profiling shows the same memory 
>>>>>>> consumption for both GPU backends, but when I use nvidia-smi to 
>>>>>>> monitor memory usage while the job is running, the old backend hovers 
>>>>>>> somewhere around 400MB, while the new backend uses 2GB for the same 
>>>>>>> model size and data. When I try to train larger models, the new GPU 
>>>>>>> backend fails with memory errors at much smaller model sizes than the 
>>>>>>> old backend. This is also true when I activate memory pre-allocation.
>>>>>>>
>>>>>>> I tried to remove parts of my model or exclude certain theano 
>>>>>>> optimizations (e.g. exclude conv_dnn to force theano to use a different 
>>>>>>> convolution algorithm) but nothing I changed in the model structure had 
>>>>>>> an 
>>>>>>> impact on the discrepancy I see in memory usage.
>>>>>>>
>>>>>>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old 
>>>>>>> backend I see very similar behavior for both the 0.8.2 and 0.9.0 
>>>>>>> releases. 
>>>>>>> For the new backend I tested the 0.9.0 release as well as a recent 
>>>>>>> github 
>>>>>>> checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both 
>>>>>>> showed 
>>>>>>> the same memory increase.
>>>>>>>
>>>>>>> I attached log files including my model's computational graph and 
>>>>>>> information on libraries, environment variables, etc. Please let me 
>>>>>>> know if I can supply any additional information to make it easier to 
>>>>>>> look into this. I tried to prepare a simple sample script to 
>>>>>>> reproduce the behavior, but was so far unable to do so.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Fabian
>>>>>>>
