My guess is that:
- without cnmem, allocation and deallocation of intermediate results
force synchronization of the GPU more often, so the overall time is
longer.
- with cnmem and borrow=False, there is no synchronization at all, and
what is measured is just the time to launch the GPU kernels, not the
time to actually execute them.
- with cnmem and borrow=True, there seems to be one synchronization
forced after each function call; I'm not sure why.
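To make the point about launch time vs. execution time concrete, here is a toy, pure-Python sketch (this is not Theano or CUDA API; the `AsyncDevice` class is invented purely for illustration). It models a GPU as a background work queue: "launching" a kernel returns immediately, so timing a loop of launches without an explicit synchronization only measures enqueue overhead, not the actual work.

```python
import queue
import threading
import time

class AsyncDevice:
    """Toy model of an asynchronous device queue (illustration only)."""

    def __init__(self):
        self.q = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Worker thread drains the queue, like a GPU executing its stream.
        while True:
            job = self.q.get()
            job()
            self.q.task_done()

    def launch(self, kernel):
        # Returns immediately, like an asynchronous kernel launch.
        self.q.put(kernel)

    def synchronize(self):
        # Blocks until all queued work has actually finished.
        self.q.join()

def kernel():
    time.sleep(0.001)  # pretend this is 1 ms of device work

dev = AsyncDevice()

# Timing launches only: fast, but the work has not run yet.
t0 = time.time()
for _ in range(100):
    dev.launch(kernel)
launch_time = time.time() - t0

# Timing with explicit synchronization: reflects real execution time.
dev.synchronize()  # drain any leftover work first
t0 = time.time()
for _ in range(100):
    dev.launch(kernel)
dev.synchronize()
sync_time = time.time() - t0

print("launch only: %.6f s, with sync: %.6f s" % (launch_time, sync_time))
```

The "launch only" number is orders of magnitude smaller than the synchronized one, which is the same shape as the cnmem/borrow=False result above: without a forced synchronization (e.g. copying the result back to host memory), the timer stops before the device has done the work.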
On Sun, Oct 09, 2016, Chris Hanning wrote:
> Testing the following code:
> https://paste.pound-python.org/show/vGCQlEMIoOPWZuUPo2DJ/
> I found that running it on an iMac, i5, GeForce GT 640M gave significant
> gains when enabling lib.cnmem.
> With cnmem disabled:
> $ THEANO_FLAGS='device=gpu0,lib.cnmem=0' python borrow_test.py
> Looping 1000 times took 0.49251699447631836 seconds without borrow and
> 0.34339094161987305 seconds with borrow
> With cnmem enabled:
> $ THEANO_FLAGS='device=gpu0,lib.cnmem=0.3' python borrow_test.py
> Looping 1000 times took 0.019893884658813477 seconds without borrow and
> 0.3345789909362793 seconds with borrow
> On this system, any value for cnmem over 0.4 would crash the program due to
> memory constraints.
> There is no significant difference in performance between 0.1 and 0.4.
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to theano-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.