Thanks! By setting gpuarray.preallocate=-1 I now get similar behavior from the new backend as from the old.
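For reference, here is a minimal sketch of how I set these flags from inside the script (the exact flag combination is just an illustration; THEANO_FLAGS has to be in the environment before theano is imported, otherwise it is ignored):

    import os

    # Flags must be set before the first "import theano".
    os.environ["THEANO_FLAGS"] = (
        "device=cuda,"                       # new gpuarray backend
        "gpuarray.preallocate=-1,"           # disable the allocation cache
        "profile=True,profile_memory=True"   # print the memory profile at exit
    )

    import theano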
Do I understand correctly that leaving preallocate at its default (new backend) will not result in higher memory consumption, but merely doesn't free memory once it has been allocated, so that what I see in nvidia-smi is the maximum memory consumption up to that point?

A related question: when I run with profile=True,profile_memory=True, shouldn't the max GPU memory stat in the profiling correspond to what I see in nvidia-smi when I run with preallocate at its default? Currently I see ~400MB GPU memory usage in the profiling, and that is also what I see with preallocate=-1 (although I can't guarantee there aren't higher spikes that nvidia-smi doesn't show me). When I leave preallocate at its default, I see ~2GB GPU memory usage (but the profiling still reports only 400MB).

Thanks
Fabian

On Thursday, June 22, 2017 at 3:45:07 PM UTC+2, nouiz wrote:
>
> The equivalent of the old back-end's memory setting is gpuarray.preallocate=-1.
>
> By default, the new back-end caches all calls to cudaMalloc() to speed up computation. This flag disables that cache; it is the same default as the old back-end.
>
> On Thu, Jun 22, 2017 at 9:41 AM Fabian Stemmer <[email protected]> wrote:
>
>> When I did use preallocation, I used lib.cnmem=1 for theano 0.8.2 and gpuarray.preallocate=1 for theano 0.9.0 and 0.10.dev.
>> For most experiments (including those in the log files) I did not use preallocation, because the only way I could see the difference in memory usage was through nvidia-smi, which only shows the static pre-allocation when it is used.
>> I believe the problem does not disappear with pre-allocation, since even then my training crashes for much smaller models with the new backend. However, I cannot measure the effect of switching backends on GPU memory when I use preallocation.
>>
>> On Thursday, June 22, 2017 at 3:23:15 PM UTC+2, nouiz wrote:
>>
>>> Do you use the Theano flag gpuarray.preallocate=1? When you tried preallocation, how did you use it?
>>>
>>> It is mostly equivalent to lib.cnmem, but our default is different: by default it gives more speed-up, but it can sometimes cause memory fragmentation. The flag above fixes the fragmentation that can happen by default.
>>>
>>> On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> wrote:
>>>
>>>> One addition:
>>>> The theano 0.9.0 setup used libgpuarray v0.6.2.
>>>> The theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to v0.6.7 and tested again, but I still get ~2GB memory usage.
>>>>
>>>> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I recently tried to switch my CNN implementation to the new theano GPU backend. To do so, I switched from "device=gpu" to "device=cuda" with theano 0.9 and libgpuarray installed. My theano code then works with the new backend without any further changes.
>>>>>
>>>>> However, when I do this, my GPU memory consumption increases drastically. Theano memory profiling shows the same memory consumption for both GPU backends, but when I use nvidia-smi to monitor memory usage while the job is running, the old backend hovers somewhere around 400MB, while the new backend uses 2GB for the same model size and data. When I try to train larger models, the new GPU backend fails with memory errors for much smaller models than the old backend. This is also true when I activate memory pre-allocation.
>>>>>
>>>>> I tried to remove parts of my model and to exclude certain theano optimizations (e.g. excluding conv_dnn to force theano to use a different convolution algorithm), but nothing I changed in the model structure had an impact on the discrepancy I see in memory usage.
>>>>>
>>>>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old backend I see very similar behavior with both the 0.8.2 and 0.9.0 releases. For the new backend I tested the 0.9.0 release as well as a recent github checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same memory increase.
>>>>>
>>>>> I attached log files including my model's computational graph and information on libraries, environment variables, etc. Please let me know if I can supply any additional information to make it easier to look into this. I tried to prepare a simple sample script to reproduce the behavior, but have so far been unable to do so.
>>>>>
>>>>> Thanks
>>>>> Fabian
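P.S. To collect the flag discussion above in one place, here is a sketch of the settings in ~/.theanorc form as I understand them (the values are illustrative, not a recommendation):

    # Old backend (device=gpu), theano 0.8.x style:
    #
    #   [global]
    #   device = gpu
    #
    #   [lib]
    #   cnmem = 1    # preallocate GPU memory up front

    # New backend (device=cuda):

    [global]
    device = cuda

    [gpuarray]
    # preallocate = 1   roughly corresponds to lib.cnmem = 1 (preallocate up front)
    # preallocate = -1  disables the cudaMalloc() cache (the old back-end's default)
    preallocate = -1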
