What is the name of the flag you used? The name changed with the new back-end.

Make sure to use the GitHub version, not a tagged version.

Frédéric
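The flag was renamed along with the back-end: the old back-end used lib.cnmem, while the new gpuarray back-end uses gpuarray.preallocate. A minimal sketch of the two spellings, assuming the flags are passed through the THEANO_FLAGS environment variable and that the training script is a hypothetical train.py:

    # old back-end (device=gpu), e.g. Theano 0.8.2
    THEANO_FLAGS="device=gpu,lib.cnmem=1" python train.py

    # new gpuarray back-end (device=cuda), Theano 0.9 and later
    THEANO_FLAGS="device=cuda,gpuarray.preallocate=1" python train.py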
On Wed, Aug 30, 2017 at 11:20 AM Anton Murashov <[email protected]> wrote:

Actually, initially I tried theano-0.10-dev-0b1 or something like that, which appears to be the most recent dev version, and later re-installed theano-0.9, which is part of the Anaconda package.

As for the preallocate flag, I tried the following options:

(a) 1 and 0 (big problems crash with OutOfMemory; some problems work initially but crash with OutOfMemory if the fit is restarted after a kernel interrupt).

(b) -1 (model.fit crashes on a problem of any size, even ones that work initially in (a), with an "invalid argument" error in cuMemAlloc) --> this one appears to be an outright bug.

Should I open a GitHub ticket?

On 30 Aug 2017 5:59 pm, "Frédéric Bastien" <[email protected]> wrote:

Update to the Theano dev version. There are many updates that could help you.

If that doesn't fix your problem, open an issue on GitHub.

For preallocation, which flag do you use?

On Tue, Aug 29, 2017 at 8:30 PM Anton Murashov <[email protected]> wrote:

Hello all!

I have a very similar problem with the new gpuarray backend; it has the following undesired behaviour:

(a) with preallocation turned ON (any value above and including zero), it crashes with a cuMemAlloc error (OutOfMemory) on a problem of my size (smaller problems work).
(b) with preallocation turned ON, if a small problem is being fitted, interrupting the kernel and restarting results in a cuMemAlloc error (OutOfMemory).
(c) with preallocation turned OFF (preallocation=-1), it does not even start fitting and fails with a cuMemAlloc error (invalid argument, NOT OutOfMemory).

GpuArrayException: ('The following error happened while compiling the node', forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{1000}, GpuSubtensor{int64:int64:int64}.0, GpuElemwise{Composite{(i0 - sqr(i1))}}[]<gpuarray>.0, GpuElemwise{tanh,no_inplace}.0, InplaceGpuDimShuffle{0,2,1}.0, GpuAlloc<None>{memset_0=True}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuSubtensor{int64:int64:int64}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, GpuAlloc<None>{memset_0=True}.0, TensorConstant{1000}, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, :int64:}.0, GpuSubtensor{::, int64::}.0, InplaceGpuDimShuffle{1,0}.0, GpuSubtensor{::, int64:int64:}.0, InplaceGpuDimShuffle{1,0}.0, InplaceGpuDimShuffle{1,0}.0, GpuAlloc<None>{memset_0=True}.0), '\n', 'cuMemAlloc: CUDA_ERROR_INVALID_VALUE: invalid argument')

Needless to say, on the old backend everything works fine, just 20% slower (for the problems which actually start fitting on both backends). I use the versions currently supplied with Anaconda (theano-0.9, libgpuarray 0.6.9, pygpu 0.6.9).
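For reference, the preallocate values being compared above have different meanings (as I understand the Theano documentation): -1 disables the allocation cache entirely, 0 keeps the cache but preallocates nothing, a value between 0 and 1 preallocates that fraction of GPU memory, and a larger value is interpreted as megabytes. A minimal .theanorc sketch showing the setting that is described later in the thread as equivalent to the old back-end's default:

    [global]
    device = cuda

    [gpuarray]
    # -1 disables the allocation cache (closest to the old back-end's behaviour)
    preallocate = -1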
On Tuesday, July 11, 2017 at 3:23:44 AM UTC+2, Pascal Lamblin wrote:

> On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>
> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior for the new backend as for the old.
>
> Do I understand correctly that leaving preallocate at its default behavior (new backend) will not result in higher memory consumption, but merely doesn't free memory once allocated, so that what I see in nvidia-smi is the maximum memory consumption up to that point?

Not really, it can actually result in higher memory consumption due to the way new memory blocks are allocated. For instance, in the worst case, if a tensor of 1 MB gets allocated and deallocated, and then a 2 MB tensor is requested, a new 2 MB block will be added to the pool; however, it will not be mergeable with the first one, and if it gets freed, a 3 MB tensor cannot be "split" between the first two blocks. Due to that fragmentation effect, allocating and deallocating 1 MB, then 2 MB, 3 MB, etc., will end up using 1 + 2 + 3 + ... MB in total on the GPU.

> A related question: when I run with profile=True,profile_memory=True, shouldn't the max GPU memory stat in the profiling correspond to what I see in nvidia-smi when I run with preallocate at its default?

Again, not really, due to that fragmentation effect.

> Currently, I see ~400MB GPU memory usage in profiling, and that's what I see with preallocate=-1 too (although I can't guarantee there aren't higher spikes that I don't see with nvidia-smi). When I leave preallocate at the default, I see GPU memory usage of ~2GB (but the profiling still reports only 400MB).

Preallocating 400 or 500 MB may avoid fragmentation and bring the total consumption peak closer to what is actually allocated to arrays.
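To make the worst-case arithmetic above concrete, here is a small, purely illustrative Python sketch of a pool that caches freed blocks but never merges or splits them; it is not libgpuarray's actual allocator, just a toy model of the behaviour Pascal describes:

    # Toy model: freed blocks are cached and reused only when a cached block is
    # large enough for the new request; blocks are never merged or split.
    def simulate_pool(request_sizes_mb):
        free_blocks = []        # sizes (MB) of cached, currently unused blocks
        total_reserved = 0      # total GPU memory the pool has claimed so far
        for size in request_sizes_mb:
            candidates = [b for b in free_blocks if b >= size]
            if candidates:
                block = min(candidates)     # reuse the smallest block that fits
                free_blocks.remove(block)
            else:
                block = size
                total_reserved += block     # a brand-new block is allocated
            # The tensor is freed right away; its block goes back to the cache
            # instead of being returned to the driver.
            free_blocks.append(block)
        return total_reserved

    if __name__ == "__main__":
        sizes = list(range(1, 11))          # 1 MB, 2 MB, ..., 10 MB
        print("largest live tensor:", max(sizes), "MB")                    # 10 MB
        print("memory reserved by the pool:", simulate_pool(sizes), "MB")  # 55 MB

With strictly growing tensor sizes, only one tensor is ever alive at a time, yet the pool ends up holding 1 + 2 + ... + 10 = 55 MB, which is the fragmentation effect described above.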
Thanks,
Fabian

On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:

The equivalent of the old back-end's memory setting is gpuarray.preallocate=-1.

By default, the new back-end will cache all calls to cudaMalloc() to speed up computation. This flag disables that cache, which gives the same default behaviour as the old back-end.

On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]> wrote:

When I did use preallocation, I used lib.cnmem=1 for theano 0.8.2 and gpuarray.preallocate=1 for theano 0.9.0 and 0.10.dev. For most experiments (including those in the log files) I did not use preallocation, because the only way I could see the difference in memory usage was through nvidia-smi, which only shows the static pre-allocation when it is used. I believe the problem does not disappear with pre-allocation, since I see my training crash for much smaller models with the new backend even then. However, I cannot measure the effect of switching backends on GPU memory when I use preallocation.

On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:

Do you use the Theano flag gpuarray.preallocate=1? When you tried preallocation, how did you use it?

It is mostly equivalent to lib.cnmem, but our default is different: by default it gives more speed-up, but can sometimes cause memory fragmentation. The flag above fixes the new fragmentation that can happen by default.

On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> wrote:

One addition: the theano 0.9.0 setup used libgpuarray v0.6.2. The theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to v0.6.7 and tested again, but I still get ~2GB memory usage.

On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:

Hi,

I recently tried to switch my CNN implementation to the new theano GPU backend. To do so, I switched from "device=gpu" to "device=cuda", with Theano 0.9 and libgpuarray installed. My theano code then works with the new backend without any further changes.

However, when I do this, I see my GPU memory consumption increase drastically. When I use theano memory profiling, both GPU backends show the same memory consumption, but when I use nvidia-smi to monitor memory usage while the job is running, the old backend hovers somewhere around 400MB, while the new backend uses 2GB for the same model size and data. When I try to train larger models, the new GPU backend fails with memory errors for much smaller models than the old backend. This is also true when I activate memory pre-allocation.

I tried to remove parts of my model or exclude certain theano optimizations (e.g. excluding conv_dnn to force theano to use a different convolution algorithm), but nothing I changed in the model structure had an impact on the discrepancy I see in memory usage.

I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old backend I see very similar behavior for both the 0.8.2 and 0.9.0 releases. For the new backend I tested the 0.9.0 release as well as a recent github checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same memory increase.

I attached log files including my model's computational graph and information on libraries, environment variables, etc. Please let me know if I can supply any additional information to make it easier to look into this. I tried to prepare a simple sample script to reproduce the behavior, but was so far unable to do so.

Thanks
Fabian
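Two flag combinations mentioned in this message can be reproduced from the command line; a minimal sketch (script name hypothetical, and optimizer_excluding=conv_dnn is my understanding of the usual spelling for skipping the cuDNN convolution optimization):

    # force Theano to pick a non-cuDNN convolution implementation
    THEANO_FLAGS="device=cuda,optimizer_excluding=conv_dnn" python train.py

    # per-op memory profiling, as used earlier in the thread
    THEANO_FLAGS="device=cuda,profile=True,profile_memory=True" python train.py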
