On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>
> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior
> for the new backend as for the old.
>
> Do I understand correctly that leaving preallocate at its default (new
> backend) will not result in higher memory consumption, but merely won't
> free memory once it has been allocated, so that what I see in nvidia-smi
> is the maximum memory consumption up to that point?
>
Not really; it can actually result in higher memory consumption, due to
the way new memory blocks are allocated. For instance, in the worst case,
if a 1 MB tensor gets allocated and deallocated, and then a 2 MB tensor is
allocated, a new 2 MB block is added to the pool. That block is not
mergeable with the first one, so even after both are freed, a 3 MB tensor
cannot be "split" across the two blocks. Due to that fragmentation effect,
allocating and deallocating 1 MB, then 2 MB, then 3 MB, etc., can end up
using 1 + 2 + 3 + ... MB in total on the GPU.

> A related question: When I run with profile=True,profile_memory=True -
> shouldn't the max GPU memory stat in the profiling correspond to what I
> see in nvidia-smi when I run with preallocate on default?

Again, not really, due to that same fragmentation effect.

> Currently, I see ~400MB GPU memory usage in profiling and that's what I
> see with preallocate=-1 too (although I can't guarantee there aren't
> higher spikes that I don't see with nvidia-smi). When I leave
> preallocate at default, I see GPU memory usage ~2GB (but the profiling
> still reports only 400MB).

Preallocating 400 or 500 MB may avoid fragmentation and bring the total
consumption peak closer to what is actually allocated to arrays.

> Thanks
> Fabian
>
> On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:
>>
>> The equivalent of the old back-end's memory setting is
>> gpuarray.preallocate=-1.
>>
>> By default, the new back-end caches all calls to cudaMalloc() to speed
>> up computation. This flag disables that cache, which is the same
>> default as the old back-end.
>>
>> On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]>
>> wrote:
>>
>>> When I did use preallocation, I used lib.cnmem=1 for Theano 0.8.2 and
>>> gpuarray.preallocate=1 for Theano 0.9.0 and 0.10.dev.
>>> For most experiments (including those in the log files) I did not use
>>> preallocation, because the only way I could see the difference in
>>> memory usage was through nvidia-smi, which only shows the static
>>> pre-allocation when it is used.
>>> I believe the problem does not disappear with pre-allocation, since I
>>> see my training crash for much smaller models with the new backend
>>> even then. However, I cannot measure the effect of switching backends
>>> on GPU memory when I use preallocation.
>>>
>>> On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:
>>>>
>>>> Do you use the Theano flag gpuarray.preallocate=1? When you tried
>>>> preallocation, how did you use it?
>>>>
>>>> It is mostly equivalent to lib.cnmem, but our default is different:
>>>> by default it gives more speed-up, but can sometimes cause memory
>>>> fragmentation. The flag above fixes the fragmentation that can happen
>>>> by default.
>>>>
>>>> On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]>
>>>> wrote:
>>>>
>>>>> One addition:
>>>>> The Theano 0.9.0 setup used libgpuarray v0.6.2.
>>>>> The Theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated
>>>>> to v0.6.7 and tested again, but I still get ~2GB memory usage.
>>>>>
>>>>> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently tried to switch my CNN implementation to the new Theano
>>>>>> GPU backend. To do so, I switched from "device=gpu" to
>>>>>> "device=cuda", with Theano 0.9 and libgpuarray installed. My Theano
>>>>>> code then works with the new backend without any further changes.
>>>>>>
>>>>>> However, when I do this, I see my GPU memory consumption increase
>>>>>> drastically.
>>>>>> When I use Theano memory profiling, both GPU backends show the
>>>>>> same memory consumption, but when I use nvidia-smi to monitor
>>>>>> memory usage while the job is running, the old backend hovers
>>>>>> somewhere around 400MB, while the new backend uses 2GB for the same
>>>>>> model size and data. When I try to train larger models, the new GPU
>>>>>> backend fails with memory errors for much smaller models than the
>>>>>> old backend. This is also true when I activate memory
>>>>>> pre-allocation.
>>>>>>
>>>>>> I tried to remove parts of my model or exclude certain Theano
>>>>>> optimizations (e.g. exclude conv_dnn to force Theano to use a
>>>>>> different convolution algorithm), but nothing I changed in the
>>>>>> model structure had an impact on the discrepancy I see in memory
>>>>>> usage.
>>>>>>
>>>>>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old
>>>>>> backend I see very similar behavior for both the 0.8.2 and 0.9.0
>>>>>> releases. For the new backend I tested the 0.9.0 release as well as
>>>>>> a recent GitHub checkout (commit
>>>>>> c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same
>>>>>> memory increase.
>>>>>>
>>>>>> I attached log files including my model's computational graph and
>>>>>> information on libraries, environment variables, etc. Please let me
>>>>>> know if I can supply any additional information to make it easier
>>>>>> to look into this. I tried to prepare a simple sample script to
>>>>>> reproduce the behavior, but was so far unable to do so.
>>>>>>
>>>>>> Thanks
>>>>>> Fabian
>>>>>
>>>>> --
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "theano-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
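The fragmentation effect described in the reply at the top of this thread
can be illustrated with a toy model of a caching allocator that reuses
freed blocks but never merges them. This is a deliberately simplified
sketch for intuition only, not libgpuarray's actual pool implementation;
all names are illustrative:

```python
# Toy model of a caching allocator: freed blocks are kept in a pool for
# reuse, but neighboring free blocks are never merged into larger ones.
# Illustrative only -- not libgpuarray's real allocator.

class ToyPool:
    def __init__(self):
        self.free_blocks = []   # sizes (in MB) of cached, freed blocks
        self.total_mb = 0       # total memory ever requested from the GPU

    def alloc(self, size_mb):
        # Reuse a cached block only if a single block is large enough;
        # two smaller free blocks cannot be combined ("split" across).
        for i, blk in enumerate(self.free_blocks):
            if blk >= size_mb:
                return self.free_blocks.pop(i)
        self.total_mb += size_mb    # pool must grow with a new block
        return size_mb

    def free(self, blk):
        self.free_blocks.append(blk)

pool = ToyPool()
for size in (1, 2, 3, 4):
    blk = pool.alloc(size)   # each request is 1 MB larger than the last
    pool.free(blk)

# At most 4 MB is ever live at once, yet the pool now holds
# 1 + 2 + 3 + 4 = 10 MB of GPU memory.
print(pool.total_mb)  # prints 10
```

This matches the 1 + 2 + 3 + ... MB worst case described above: each
allocation is slightly larger than every cached block, so the pool keeps
growing even though the live working set stays small.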
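For reference, the flags discussed in this thread are combined in the
THEANO_FLAGS environment variable, which must be set before Theano is
imported. A minimal sketch (the flag names are the ones quoted above; the
surrounding script is illustrative, and the actual `import theano` is
commented out so the snippet stands alone):

```python
# Sketch: configuring the back-end and allocator behavior via
# THEANO_FLAGS. Flags must be in the environment before `import theano`.
import os

flags = ",".join([
    "device=cuda",              # new libgpuarray back-end ("device=gpu" is the old one)
    "gpuarray.preallocate=-1",  # disable the cudaMalloc cache, matching the old back-end
    "profile=True",             # enable the Theano profiler
    "profile_memory=True",      # include per-op memory stats in the profile
])
os.environ["THEANO_FLAGS"] = flags

# import theano   # would now pick up the flags set above

print(os.environ["THEANO_FLAGS"])
```

Setting gpuarray.preallocate to a positive fraction (e.g. 1) instead
preallocates that share of GPU memory up front, which avoids the
fragmentation discussed above but hides per-array usage from nvidia-smi.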
