Do you use the Theano flag gpuarray.preallocate=1? When you tried preallocation, how did you enable it?
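For reference, a minimal sketch of how that flag can be set from the shell. This assumes Theano 0.9+ with the libgpuarray (device=cuda) backend; "train.py" is a placeholder for your own training script:

```shell
# Hedged sketch: enable up-front GPU memory preallocation on the new
# (gpuarray) backend. preallocate=1 reserves the full GPU memory pool at
# startup, avoiding the fragmentation the default allocator can cause.
export THEANO_FLAGS="device=cuda,gpuarray.preallocate=1"
echo "$THEANO_FLAGS"
# then launch training as usual, e.g.: python train.py  (placeholder name)
```

The same setting can also go in the `[gpuarray]` section of `.theanorc` instead of the environment variable.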
It is mostly equivalent to lib.cnmem, but our default is different and usually gives more speed-up, though it can sometimes cause memory fragmentation. The flag above fixes the new fragmentation that can happen by default.

On Thu, Jun 22, 2017 at 5:33 AM Fabian Stemmer <[email protected]> wrote:

> One addition:
> The Theano 0.9.0 setup used libgpuarray v0.6.2.
> The Theano 0.10.dev setup used libgpuarray v0.6.5 - I just updated to v0.6.7 and tested again, but I still get ~2GB memory usage.
>
> On Thursday, June 22, 2017 at 8:38:26 AM UTC+2, Fabian Stemmer wrote:
>>
>> Hi,
>>
>> I recently tried to switch my CNN implementation to the new Theano GPU backend. To do so, I switched from "device=gpu" to "device=cuda" with Theano 0.9 and libgpuarray installed. My Theano code then works with the new backend without any further changes.
>>
>> However, when I do this, my GPU memory consumption increases drastically. Theano memory profiling reports the same memory consumption for both GPU backends, but when I use nvidia-smi to monitor memory usage while the job is running, the old backend hovers somewhere around 400MB, while the new backend uses 2GB for the same model size and data. When I try to train larger models, the new GPU backend fails with memory errors for much smaller models than the old backend. This is also true when I activate memory preallocation.
>>
>> I tried removing parts of my model and excluding certain Theano optimizations (e.g. excluding conv_dnn to force Theano to use a different convolution algorithm), but nothing I changed in the model structure had any impact on the discrepancy I see in memory usage.
>>
>> I use CUDA 8.0 and cuDNN 5105 for these experiments. For the old backend I see very similar behavior for both the 0.8.2 and 0.9.0 releases.
>> For the new backend I tested the 0.9.0 release as well as a recent GitHub checkout (commit c5cd87fa7895dc44c7acd54cb85e6d232b33bd3a) - both showed the same memory increase.
>>
>> I attached log files including my model's computational graph and information on libraries, environment variables, etc. Please let me know if I can supply any additional information to make it easier to look into this. I tried to prepare a simple sample script to reproduce the behavior, but was so far unable to do so.
>>
>> Thanks
>> Fabian

--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
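The optimization-exclusion experiment described in the quoted message can be expressed as a Theano flag as well. A hedged sketch (assumes Theano 0.9+ with the gpuarray backend; how much this changes nvidia-smi readings will depend on the model):

```shell
# Sketch: tell Theano's optimizer to skip the cuDNN convolution
# substitution (conv_dnn), so a different convolution implementation is
# selected - useful for ruling cuDNN out as the source of extra memory use.
export THEANO_FLAGS="device=cuda,optimizer_excluding=conv_dnn"
echo "$THEANO_FLAGS"
```

While the job runs, per-process GPU memory can be watched from another terminal with nvidia-smi, as the poster did, to compare the two backends under identical flags.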
