What is the name of the flag you used? The name changed with the new back-end.

Make sure to use the GitHub version, not a tagged version.

Frédéric
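The flag was renamed along with the back-end: the old back-end used lib.cnmem, while the new gpuarray back-end uses gpuarray.preallocate. A minimal sketch of the two spellings, assuming the flags are passed through the THEANO_FLAGS environment variable and that the training script is a hypothetical train.py:

    # old back-end (device=gpu), e.g. Theano 0.8.2
    THEANO_FLAGS="device=gpu,lib.cnmem=1" python train.py

    # new gpuarray back-end (device=cuda), Theano 0.9 and later
    THEANO_FLAGS="device=cuda,gpuarray.preallocate=1" python train.py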
On Wed, Aug 30, 2017 at 11:20 AM Anton Murashov <[email protected]> wrote:

Actually, initially I tried theano-0.10-dev-0b1 or something like that, which appears to be the most recent dev version, and later re-installed theano-0.9, which is part of the Anaconda package.

As for the preallocate flag, I tried the following options:

(a) 1 and 0 (big problems crash with OutOfMemory; some problems work initially but crash with OutOfMemory if the fit is restarted after a kernel interrupt).

(b) -1 (model.fit crashes on a problem of any size, even ones that work initially in (a), with an "invalid argument" error in cuMemAlloc) --> this one appears to be an outright bug.

Should I open a GitHub ticket?

On 30 Aug 2017 5:59 pm, "Frédéric Bastien" <[email protected]> wrote:

Update to the Theano dev version. There are many updates that could help you.

If that doesn't fix your problem, open an issue on GitHub.

For preallocation, which flag do you use?

On Tue, Aug 29, 2017 at 8:30 PM Anton Murashov <[email protected]> wrote:

Hello all!

I have a very similar problem with the new gpuarray backend; it has the following undesired behaviour:

(a) with preallocation turned ON (any value above and including zero), it crashes with a cuMemAlloc error (OutOfMemory) on a problem of my size (smaller problems work).
(b) with preallocation turned ON, if a small problem is being fitted, interrupting the kernel and restarting results in a cuMemAlloc error (OutOfMemory).
(c) with preallocation turned OFF (preallocation=-1), it does not even start fitting and fails with a cuMemAlloc error (invalid argument, NOT OutOfMemory).

GpuArrayException: ('The following error happened while compiling the node', forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{1000}, GpuSubtensor{int64:int64:int64}.0, GpuElemwise{Composite{(i0 - sqr(i1))}}[]<gpuarray>.0, GpuElemwise{tanh,no_inplace}.0, InplaceGpuDimShuffle{0,2,1}.0, GpuAlloc<None>{memset_0=True}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, TensorConstant{1000}, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, :int64:}.0, GpuSubtensor{::, int64::}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, InplaceGpuDimShuffle{1,0}.0, GpuAlloc<None>{memset_0=True}.0), '\n', 'cuMemAlloc: CUDA_ERROR_INVALID_VALUE: invalid argument')

Needless to say, on the old backend everything works fine, just 20% slower (for the problems which actually start fitting on both backends). I use the versions currently supplied with Anaconda (theano-0.9, libgpuarray 0.6.9, pygpu 0.6.9).
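For reference, the preallocate values being compared above have different meanings (as I understand the Theano documentation): -1 disables the allocation cache entirely, 0 keeps the cache but preallocates nothing, a value between 0 and 1 preallocates that fraction of GPU memory, and a larger value is interpreted as megabytes. A minimal .theanorc sketch showing the setting that is described later in the thread as equivalent to the old back-end's default:

    [global]
    device = cuda

    [gpuarray]
    # -1 disables the allocation cache (closest to the old back-end's behaviour)
    preallocate = -1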
On Tuesday, July 11, 2017 at 3:23:44 AM UTC+2, Pascal Lamblin wrote:

> On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>
> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior for the new backend as for the old.
>
> Do I understand correctly that leaving preallocate at its default behavior (new backend) will not result in higher memory consumption, but merely doesn't free memory once allocated, so that what I see in nvidia-smi is the maximum memory consumption up to that point?

Not really, it can actually result in higher memory consumption due to the way new memory blocks are allocated. For instance, in the worst case, if a tensor of 1 MB gets allocated and deallocated, and then a 2 MB tensor is requested, a new 2 MB block will be added to the pool; however, it will not be mergeable with the first one, and if it gets freed, a 3 MB tensor cannot be "split" between the first two blocks. Due to that fragmentation effect, allocating and deallocating 1 MB, then 2 MB, 3 MB, etc., will end up using 1 + 2 + 3 + ... MB in total on the GPU.

> A related question: when I run with profile=True,profile_memory=True, shouldn't the max GPU memory stat in the profiling correspond to what I see in nvidia-smi when I run with preallocate at its default?

Again, not really, due to that fragmentation effect.

> Currently, I see ~400MB GPU memory usage in profiling, and that's what I see with preallocate=-1 too (although I can't guarantee there aren't higher spikes that I don't see with nvidia-smi). When I leave preallocate at the default, I see GPU memory usage of ~2GB (but the profiling still reports only 400MB).

Preallocating 400 or 500 MB may avoid fragmentation and bring the total consumption peak closer to what is actually allocated to arrays.
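To make the worst-case arithmetic above concrete, here is a small, purely illustrative Python sketch of a pool that caches freed blocks but never merges or splits them; it is not libgpuarray's actual allocator, just a toy model of the behaviour Pascal describes:

    # Toy model: freed blocks are cached and reused only when a cached block is
    # large enough for the new request; blocks are never merged or split.
    def simulate_pool(request_sizes_mb):
        free_blocks = []        # sizes (MB) of cached, currently unused blocks
        total_reserved = 0      # total GPU memory the pool has claimed so far
        for size in request_sizes_mb:
            candidates = [b for b in free_blocks if b >= size]
            if candidates:
                block = min(candidates)     # reuse the smallest block that fits
                free_blocks.remove(block)
            else:
                block = size
                total_reserved += block     # a brand-new block is allocated
            # The tensor is freed right away; its block goes back to the cache
            # instead of being returned to the driver.
            free_blocks.append(block)
        return total_reserved

    if __name__ == "__main__":
        sizes = list(range(1, 11))          # 1 MB, 2 MB, ..., 10 MB
        print("largest live tensor:", max(sizes), "MB")                    # 10 MB
        print("memory reserved by the pool:", simulate_pool(sizes), "MB")  # 55 MB

With strictly growing tensor sizes, only one tensor is ever alive at a time, yet the pool ends up holding 1 + 2 + ... + 10 = 55 MB, which is the fragmentation effect described above.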
Thanks,
Fabian

On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:

The equivalent of the old back-end's memory setting is gpuarray.preallocate=-1.

By default, the new back-end will cache all calls to cudaMalloc() to speed up computation. This flag disables that cache, which gives the same default behaviour as the old back-end.

On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]> wrote:

When I did use preallocation, I used lib.cnmem=1 for theano 0.8.2 and gpuarray.preallocate=1 for theano 0.9.0 and 0.10.dev. For most experiments (including those in the log files) I did not use preallocation, because the only way I could see the difference in memory usage was through nvidia-smi, which only shows the static pre-allocation when it is used. I believe the problem does not disappear with pre-allocation, since I see my training crash for much smaller models with the new backend even then. However, I cannot measure the effect of switching backends on GPU memory when I use preallocation.

On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:

Do you use the Theano flag gpuarray.preallocate=1? When you tried preallocation, how did you use it?

It is mostly equivalent to lib.cnmem, but our default is different: by default it gives more speed-up, but can sometimes cause memory fragmentation. The flag above fixes the new fragmentation that can happen by default.

On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> wrote:

One addition: the theano 0.9.0 setup used libgpuarray v0.6.2. The theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to v0.6.7 and tested again, but I still get ~2GB memory usage.

On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:

Hi,

I recently tried to switch my CNN implementation to the new theano GPU backend. To do so, I switched from "device=gpu" to "device=cuda", with Theano 0.9 and libgpuarray installed. My theano code then works with the new backend without any further changes.

However, when I do this, I see my GPU memory consumption increase drastically. When I use theano memory profiling, both GPU backends show the same memory consumption, but when I use nvidia-smi to monitor memory usage while the job is running, the old backend hovers somewhere around 400MB, while the new backend uses 2GB for the same model size and data. When I try to train larger models, the new GPU backend fails with memory errors for much smaller models than the old backend. This is also true when I activate memory pre-allocation.

I tried to remove parts of my model or exclude certain theano optimizations (e.g. excluding conv_dnn to force theano to use a different convolution algorithm), but nothing I changed in the model structure had an impact on the discrepancy I see in memory usage.

I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old backend I see very similar behavior for both the 0.8.2 and 0.9.0 releases. For the new backend I tested the 0.9.0 release as well as a recent github checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same memory increase.

I attached log files including my model's computational graph and information on libraries, environment variables, etc. Please let me know if I can supply any additional information to make it easier to look into this. I tried to prepare a simple sample script to reproduce the behavior, but was so far unable to do so.

Thanks
Fabian
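Two flag combinations mentioned in this message can be reproduced from the command line; a minimal sketch (script name hypothetical, and optimizer_excluding=conv_dnn is my understanding of the usual spelling for skipping the cuDNN convolution optimization):

    # force Theano to pick a non-cuDNN convolution implementation
    THEANO_FLAGS="device=cuda,optimizer_excluding=conv_dnn" python train.py

    # per-op memory profiling, as used earlier in the thread
    THEANO_FLAGS="device=cuda,profile=True,profile_memory=True" python train.py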
