On Monday, July 10, 2017 at 2:42:39 AM UTC-4, Fabian Stemmer wrote:
>
> Thanks, by setting gpuarray.preallocate=-1 I now get similar behavior
> for the new backend as for the old.
>
> Do I understand correctly that leaving preallocate at its default (new
> backend) will not result in higher memory consumption, but merely won't
> free memory once it has been allocated, so that what I see in nvidia-smi
> is the maximum memory consumption up to that point?
>
Not really; it can actually result in higher memory consumption, due to
the way new memory blocks are allocated. For instance, in the worst case,
if a 1 MB tensor gets allocated and deallocated, and then a 2 MB tensor is
allocated, a new 2 MB block is added to the pool. That block is not
mergeable with the first one, so even after both are freed, a 3 MB tensor
cannot be "split" across the two blocks. Due to that fragmentation effect,
allocating and deallocating 1 MB, then 2 MB, then 3 MB, etc., can end up
using 1 + 2 + 3 + ... MB in total on the GPU.

> A related question: When I run with profile=True,profile_memory=True -
> shouldn't the max GPU memory stat in the profiling correspond to what I
> see in nvidia-smi when I run with preallocate on default?

Again, not really, due to that same fragmentation effect.

> Currently, I see ~400MB GPU memory usage in profiling and that's what I
> see with preallocate=-1 too (although I can't guarantee there aren't
> higher spikes that I don't see with nvidia-smi). When I leave
> preallocate at default, I see GPU memory usage ~2GB (but the profiling
> still reports only 400MB).

Preallocating 400 or 500 MB may avoid fragmentation and bring the total
consumption peak closer to what is actually allocated to arrays.

> Thanks
> Fabian
>
> On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:
>>
>> The equivalent of the old back-end's memory setting is
>> gpuarray.preallocate=-1.
>>
>> By default, the new back-end caches all calls to cudaMalloc() to speed
>> up computation. This flag disables that cache, which is the same
>> default as the old back-end.
>>
>> On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]>
>> wrote:
>>
>>> When I did use preallocation, I used lib.cnmem=1 for Theano 0.8.2 and
>>> gpuarray.preallocate=1 for Theano 0.9.0 and 0.10.dev.
>>> For most experiments (including those in the log files) I did not use
>>> preallocation, because the only way I could see the difference in
>>> memory usage was through nvidia-smi, which only shows the static
>>> pre-allocation when it is used.
>>> I believe the problem does not disappear with pre-allocation, since I
>>> see my training crash for much smaller models with the new backend
>>> even then. However, I cannot measure the effect of switching backends
>>> on GPU memory when I use preallocation.
>>>
>>> On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:
>>>>
>>>> Do you use the Theano flag gpuarray.preallocate=1? When you tried
>>>> preallocation, how did you use it?
>>>>
>>>> It is mostly equivalent to lib.cnmem, but our default is different:
>>>> by default it gives more speed-up, but can sometimes cause memory
>>>> fragmentation. The flag above fixes the fragmentation that can happen
>>>> by default.
>>>>
>>>> On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]>
>>>> wrote:
>>>>
>>>>> One addition:
>>>>> The Theano 0.9.0 setup used libgpuarray v0.6.2.
>>>>> The Theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated
>>>>> to v0.6.7 and tested again, but I still get ~2GB memory usage.
>>>>>
>>>>> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently tried to switch my CNN implementation to the new Theano
>>>>>> GPU backend. To do so, I switched from "device=gpu" to
>>>>>> "device=cuda", with Theano 0.9 and libgpuarray installed. My Theano
>>>>>> code then works with the new backend without any further changes.
>>>>>>
>>>>>> However, when I do this, I see my GPU memory consumption increase
>>>>>> drastically.
>>>>>> When I use Theano memory profiling, both GPU backends show the
>>>>>> same memory consumption, but when I use nvidia-smi to monitor
>>>>>> memory usage while the job is running, the old backend hovers
>>>>>> somewhere around 400MB, while the new backend uses 2GB for the same
>>>>>> model size and data. When I try to train larger models, the new GPU
>>>>>> backend fails with memory errors for much smaller models than the
>>>>>> old backend. This is also true when I activate memory
>>>>>> pre-allocation.
>>>>>>
>>>>>> I tried to remove parts of my model or exclude certain Theano
>>>>>> optimizations (e.g. exclude conv_dnn to force Theano to use a
>>>>>> different convolution algorithm), but nothing I changed in the
>>>>>> model structure had an impact on the discrepancy I see in memory
>>>>>> usage.
>>>>>>
>>>>>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old
>>>>>> backend I see very similar behavior for both the 0.8.2 and 0.9.0
>>>>>> releases. For the new backend I tested the 0.9.0 release as well as
>>>>>> a recent GitHub checkout (commit
>>>>>> c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same
>>>>>> memory increase.
>>>>>>
>>>>>> I attached log files including my model's computational graph and
>>>>>> information on libraries, environment variables, etc. Please let me
>>>>>> know if I can supply any additional information to make it easier
>>>>>> to look into this. I tried to prepare a simple sample script to
>>>>>> reproduce the behavior, but was so far unable to do so.
>>>>>>
>>>>>> Thanks
>>>>>> Fabian
>>>>>
>>>>> --
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "theano-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
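The fragmentation effect described in the reply at the top of this thread
can be illustrated with a toy model of a caching allocator that reuses
freed blocks but never merges them. This is a deliberately simplified
sketch for intuition only, not libgpuarray's actual pool implementation;
all names are illustrative:

```python
# Toy model of a caching allocator: freed blocks are kept in a pool for
# reuse, but neighboring free blocks are never merged into larger ones.
# Illustrative only -- not libgpuarray's real allocator.

class ToyPool:
    def __init__(self):
        self.free_blocks = []   # sizes (in MB) of cached, freed blocks
        self.total_mb = 0       # total memory ever requested from the GPU

    def alloc(self, size_mb):
        # Reuse a cached block only if a single block is large enough;
        # two smaller free blocks cannot be combined ("split" across).
        for i, blk in enumerate(self.free_blocks):
            if blk >= size_mb:
                return self.free_blocks.pop(i)
        self.total_mb += size_mb    # pool must grow with a new block
        return size_mb

    def free(self, blk):
        self.free_blocks.append(blk)

pool = ToyPool()
for size in (1, 2, 3, 4):
    blk = pool.alloc(size)   # each request is 1 MB larger than the last
    pool.free(blk)

# At most 4 MB is ever live at once, yet the pool now holds
# 1 + 2 + 3 + 4 = 10 MB of GPU memory.
print(pool.total_mb)  # prints 10
```

This matches the 1 + 2 + 3 + ... MB worst case described above: each
allocation is slightly larger than every cached block, so the pool keeps
growing even though the live working set stays small.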
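For reference, the flags discussed in this thread are combined in the
THEANO_FLAGS environment variable, which must be set before Theano is
imported. A minimal sketch (the flag names are the ones quoted above; the
surrounding script is illustrative, and the actual `import theano` is
commented out so the snippet stands alone):

```python
# Sketch: configuring the back-end and allocator behavior via
# THEANO_FLAGS. Flags must be in the environment before `import theano`.
import os

flags = ",".join([
    "device=cuda",              # new libgpuarray back-end ("device=gpu" is the old one)
    "gpuarray.preallocate=-1",  # disable the cudaMalloc cache, matching the old back-end
    "profile=True",             # enable the Theano profiler
    "profile_memory=True",      # include per-op memory stats in the profile
])
os.environ["THEANO_FLAGS"] = flags

# import theano   # would now pick up the flags set above

print(os.environ["THEANO_FLAGS"])
```

Setting gpuarray.preallocate to a positive fraction (e.g. 1) instead
preallocates that share of GPU memory up front, which avoids the
fragmentation discussed above but hides per-array usage from nvidia-smi.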
