Hi,

My guess is that:
- Without cnmem, allocation and deallocation of intermediate results force synchronization of the GPU more often, so the overall time is slower.
- With cnmem and borrow=False, there is no synchronization at all, and what is measured is just the time to launch the GPU kernels, not the time to actually execute them.
- With cnmem and borrow=True, there seems to be one synchronization forced after each function call; I'm not sure why.

On Sun, Oct 09, 2016, Chris Hanning wrote:
> Testing the following code from:
>
> http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowfunction
>
> copy: https://paste.pound-python.org/show/vGCQlEMIoOPWZuUPo2DJ/
>
> I found that running it on an iMac, i5, GeForce GT 640M gave significant
> gains when enabling lib.cnmem.
>
> With cnmem disabled:
>
> $ THEANO_FLAGS='device=gpu0,lib.cnmem=0' python borrow_test.py
>
> Looping 1000 times took 0.49251699447631836 seconds without borrow and
> 0.34339094161987305 seconds with borrow
>
> With cnmem enabled:
>
> $ THEANO_FLAGS='device=gpu0,lib.cnmem=0.3' python borrow_test.py
>
> Looping 1000 times took 0.019893884658813477 seconds without borrow and
> 0.3345789909362793 seconds with borrow
>
> On this system, any value for lib.cnmem over 0.4 would crash the program due to
> memory constraints.
> There was no significant difference in performance between 0.1 and 0.4.

--
Pascal

---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
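The timing pitfall described above — stopping the clock after the kernels are *launched* rather than after they *finish* — can be sketched without a GPU. The snippet below is an analogy only, not Theano or CUDA code: a single-worker thread pool stands in for the asynchronous GPU stream, and `fake_kernel` is a hypothetical stand-in for a GPU kernel.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel():
    # Hypothetical stand-in for a GPU kernel: ~1 ms of asynchronous "work".
    time.sleep(0.001)

# The single worker plays the role of the asynchronous GPU stream.
executor = ThreadPoolExecutor(max_workers=1)

# Naive timing: stop the clock right after the launches.
# This only measures how long it takes to *enqueue* the work.
t0 = time.time()
futures = [executor.submit(fake_kernel) for _ in range(100)]
launch_time = time.time() - t0

# Correct timing: synchronize (wait for all work) before stopping the clock.
for fut in futures:
    fut.result()
total_time = time.time() - t0
executor.shutdown()

print(f"launch only: {launch_time:.4f}s, synchronized: {total_time:.4f}s")
```

The "launch only" figure is tiny compared to the synchronized one, which is consistent with the suspiciously fast 0.0199 s result above: without a forced synchronization, the benchmark returns before the GPU has actually done the work.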
