1. For example, in .theanorc I set preallocate=-1 in the [gpuarray] section. In my Jupyter notebook I confirm this by printing theano.config.gpuarray.preallocate, which shows it is -1; on import of Theano the warning that GPU memory caching is off is also duly printed.
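Concretely, the relevant bit of the config and the check in the notebook look roughly like this (a minimal sketch of my own setup, nothing more):

    # ~/.theanorc
    [gpuarray]
    preallocate = -1

    # in the notebook:
    import theano                               # Theano prints its warning that GPU memory caching is off
    print(theano.config.gpuarray.preallocate)   # shows -1, so the flag was picked up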
2. I got the Theano dev version by following this manual: http://deeplearning.net/software/theano_versions/dev/install_ubuntu.html

git clone git://github.com/Theano/Theano.git
cd Theano
pip install -e .

Later I confirm the Theano version in Jupyter by printing theano.__version__, which duly shows 0.10-dev-1b0 or something similar.

3. I tried getting a newer version of libgpuarray (0.7, also a development one), but there is a check in Theano: it wants "1" as the major version of the API, while in 0.7 it is already "2". So the latest version of libgpuarray you can make work (and which, presumably, contains the bug we are discussing here) is 0.6.9, which is already supplied with Anaconda as part of pygpu-0.6.9. I also tried compiling my own libgpuarray:

git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
git checkout tags/v0.6.5 -b v0.6.9

and then building it and making sure the relevant libs find their way into the Python instance - with no difference in the result: still OutOfMemory in case of preallocate >= 0 and "invalid argument" in case of preallocate < 0 when model.fit gets called.
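To make sure the rebuilt library is really the one my Python picks up, I also run a quick sanity check along these lines (assuming pygpu exposes __version__ the way the Anaconda build does):

    import pygpu
    import theano

    print(pygpu.__file__)      # which install is actually on the path
    print(pygpu.__version__)   # expecting 0.6.9 here, not 0.7.x
    print(theano.__version__)  # 0.10-dev... for the git checkout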
On Wed, Aug 30, 2017 at 5:52 PM, Frédéric Bastien <[email protected]> wrote:

> What is the name of the flag you used? The name changed with the new back-end.
>
> Make sure to use the github version, not a tagged version.
>
> Frédéric
>
> On Wed, Aug 30, 2017 at 11:20 AM Anton Murashov <[email protected]> wrote:
>
>> Actually, initially I tried theano-0.10-dev-0b1 or something like this, which appears to be the most recent dev version; I later re-installed theano-0.9, which is part of the Anaconda package.
>>
>> As for the preallocate flag, I tried the following options:
>>
>> (a) 1 and 0 (big problems crash with OutOfMemory; some problems work initially but crash with OutOfMemory if fit is restarted after a kernel interrupt).
>>
>> (b) -1 (model.fit crashes on a problem of any size, even ones which work in (a) initially, with an "invalid argument" error in cuMemAlloc) --> this one appears to be an outright bug.
>>
>> Should I open a github ticket?
>>
>> On 30 Aug 2017 5:59 pm, "Frédéric Bastien" <[email protected]> wrote:
>>
>>> Update to the Theano dev version. There are many updates that could help you.
>>>
>>> If that doesn't fix your problem, open an issue on github.
>>>
>>> For preallocation, which flag do you use?
>>>
>>> On Tue, Aug 29, 2017 at 8:30 PM Anton Murashov <[email protected]> wrote:
>>>
>>>> Hello all!
>>>>
>>>> I have a very similar problem with the new gpuarray backend; it has the following undesired behaviour:
>>>>
>>>> (a) with preallocation turned ON (any value above and including zero) it crashes with a cuMemAlloc error (OutOfMemory) on a problem of my size (smaller problems work);
>>>>
>>>> (b) with preallocation turned ON and a small problem being fitted, interrupting the kernel and restarting the fit results in a cuMemAlloc error (OutOfMemory);
>>>>
>>>> (c) with preallocation turned OFF (preallocation=-1) it does not even start fitting, failing with a cuMemAlloc error (invalid argument, NOT OutOfMemory!):
>>>>
>>>> GpuArrayException: ('The following error happened while compiling the node', forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{1000}, GpuSubtensor{int64:int64:int64}.0, GpuElemwise{Composite{(i0 - sqr(i1))}}[]<gpuarray>.0, GpuElemwise{tanh,no_inplace}.0, InplaceGpuDimShuffle{0,2,1}.0, GpuAlloc<None>{memset_0=True}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, TensorConstant{1000}, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, :int64:}.0, GpuSubtensor{::, int64::}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, InplaceGpuDimShuffle{1,0}.0, GpuAlloc<None>{memset_0=True}.0), '\n', 'cuMemAlloc: CUDA_ERROR_INVALID_VALUE: invalid argument')
>>>>
>>>> Needless to say, on the old backend everything works fine, just 20% slower (on problems which actually start fitting on both backends). I use the versions currently supplied with Anaconda (theano-0.9, libgpuarray 0.6.9, pygpu 0.6.9).
>>>>
>>>> On Tuesday, July 11, 2017 at 3:23:44 AM UTC+2, Pascal Lamblin wrote:
>>>>>
>>>>> On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>>>>>>
>>>>>> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior for the new backend as for the old.
>>>>>>
>>>>>> Do I understand correctly that leaving preallocate at its default (new backend) will not result in higher memory consumption, but merely doesn't free memory once allocated, so what I see in nvidia-smi is the maximum memory consumption up to that point?
>>>>>
>>>>> Not really, it can actually result in higher memory consumption due to the way new memory blocks are allocated. For instance, in the worst case, if a tensor of 1 MB gets allocated and deallocated, then a 2 MB tensor, a new 2 MB block will be added to the pool; however, it will not be mergeable with the first one, and if it gets freed, a 3 MB tensor cannot be "split" between the first blocks. Due to that fragmentation effect, allocating / deallocating 1 MB, then 2 MB, 3 MB, etc., will end up using 1 + 2 + 3 + ... MB total on the GPU.
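In other words, the worst case described above adds up like this (a toy illustration of the bookkeeping only, not of what the allocator literally does):

    # each request (1 MB, 2 MB, 3 MB, ...) is bigger than any block freed before it,
    # and freed blocks are never merged, so the pool grows by a fresh block every time
    requests_mb = range(1, 11)     # allocate and free 1 MB, 2 MB, ..., 10 MB in turn
    pool_mb = sum(requests_mb)     # 1 + 2 + ... + 10
    print(pool_mb)                 # 55 MB held on the GPU, although at most 10 MB is ever live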
>>>>>> A related question: when I run with profile=True,profile_memory=True, shouldn't the max GPU memory stat in the profiling correspond to what I see in nvidia-smi when I run with preallocate at its default?
>>>>>
>>>>> Again, not really, due to that fragmentation effect.
>>>>>
>>>>>> Currently, I see ~400MB GPU memory usage in the profiling, and that's what I see with preallocate=-1 too (although I can't guarantee there aren't higher spikes that I don't see with nvidia-smi). When I leave preallocate at its default, I see ~2GB GPU memory usage (but the profiling still reports only 400MB).
>>>>>
>>>>> Preallocating 400 or 500 MB may avoid fragmentation and bring the total consumption peak closer to what is actually allocated to arrays.
>>>>>
>>>>>> Thanks
>>>>>> Fabian
>>>>>>
>>>>>> On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:
>>>>>>>
>>>>>>> The equivalent to the old back-end setting for memory is gpuarray.preallocate=-1.
>>>>>>>
>>>>>>> The new back-end by default will cache all calls to cudaMalloc() to speed up computation. This flag will disable that cache. This is the same default as the old back-end.
>>>>>>>
>>>>>>> On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]> wrote:
>>>>>>>
>>>>>>>> When I did use preallocation, I used lib.cnmem=1 for theano 0.8.2 and gpuarray.preallocate=1 for theano 0.9.0 and 0.10.dev.
>>>>>>>>
>>>>>>>> For most experiments (including those in the log files) I did not use preallocation, because the only way I could see the difference in memory usage was through nvidia-smi, which only shows the static pre-allocation when it is used.
>>>>>>>>
>>>>>>>> I believe the problem does not disappear with pre-allocation, since I see my training crash for much smaller models with the new backend even then. However, I cannot measure the effect of switching backends on GPU memory when I use preallocation.
>>>>>>>>
>>>>>>>> On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:
>>>>>>>>>
>>>>>>>>> Do you use the Theano flag gpuarray.preallocate=1? When you tried the preallocation, how did you use it?
>>>>>>>>>
>>>>>>>>> It is mostly equivalent to lib.cnmem, but our default is different and by default gives more speed-up; sometimes it can cause memory fragmentation. The flag above fixes the new fragmentation that can happen by default.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> One addition:
>>>>>>>>>> The theano 0.9.0 setup used libgpuarray v0.6.2.
>>>>>>>>>> The theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to v0.6.7 and tested again, but I still get ~2GB memory usage.
>>>>>>>>>>
>>>>>>>>>> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I recently tried to switch my CNN implementation to the new Theano GPU backend. To do so, I switched from "device=gpu" to "device=cuda" with Theano 0.9 and libgpuarray installed. My Theano code then works with the new backend without any further changes.
>>>>>>>>>>>
>>>>>>>>>>> However, when I do this, I see my GPU memory consumption increase drastically. When I use Theano memory profiling, both GPU backends show the same memory consumption, but when I use nvidia-smi to monitor memory usage while the job is running, the old backend hovers somewhere around 400MB while the new backend uses 2GB for the same model size and data. When I try to train larger models, the new GPU backend fails with memory errors for much smaller models than the old backend. This is also true when I activate memory pre-allocation.
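The comparison boils down to changing only the device flag between otherwise identical runs; one way to pin it per run (a sketch only, the surrounding training script is whatever model you already have) is to set THEANO_FLAGS before Theano is imported:

    import os

    # pick one backend per run; nothing else changes between the two runs
    os.environ['THEANO_FLAGS'] = 'device=cuda'    # new gpuarray backend (~2GB resident in nvidia-smi here)
    # os.environ['THEANO_FLAGS'] = 'device=gpu'   # old backend (~400MB for the same model and data)

    import theano
    # ... build and fit the model as usual, watching memory with nvidia-smi while it runs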
>>>>>>>>>>> I tried to remove parts of my model or exclude certain Theano optimizations (e.g. exclude conv_dnn to force Theano to use a different convolution algorithm), but nothing I changed in the model structure had an impact on the discrepancy I see in memory usage.
>>>>>>>>>>>
>>>>>>>>>>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old backend I see very similar behavior for both the 0.8.2 and 0.9.0 releases. For the new backend I tested the 0.9.0 release as well as a recent github checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same memory increase.
>>>>>>>>>>>
>>>>>>>>>>> I attached log files including my model's computational graph and information on libraries, environment variables, etc. Please let me know if I can supply any additional information to make it easier to look into this. I tried to prepare a simple sample script to reproduce the behavior, but was so far unable to do so.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Fabian

--
_______________________________
Anton Murashov
Quantstellation.Centaurus desk
+44 748 1916031
