I'm experiencing bizarre behaviour when enabling cuDNN. I can compile code
both with and without cuDNN enabled, but having it disabled seems to yield
a faster compile time. My setup is as follows:

- Windows 10
- Visual Studio Community 2013
- CUDA 7.5
- cuDNN 5.0

I'm testing with the code found at
http://deeplearning.net/software/theano/tutorial/using_gpu.html (reproduced
after the config below), and my .theanorc file looks like this (the
"<- ..." annotations are comments for this post, not part of the file):

[global]
floatX = float32
device = gpu
allow_gc = False
#optimizer_including = cudnn <- This can be enabled and disabled

[lib]
cnmem = 0.8

[dnn]
#enabled = True <- This can be enabled and disabled

[dnn.conv]
algo_fwd = time_once
algo_bwd_data = time_once
algo_bwd_filter = time_once

[blas]
ldflags=-LC:\openblas -lopenblas

[nvcc]
flags = -LC:\Users\lukas\Anaconda3\libs
compiler_bindir = "C:\Program Files (x86)\Microsoft Visual Studio 
12.0\VC\bin" <- This can be enabled and disabled

With the above options configured I get:

Using gpu device 0: GeForce 940M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN not available)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.908416 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
[Finished in 5.6s]

which is the expected outcome and a fast compile time. When I set
[global]
...
optimizer_including = cudnn

[dnn]
enabled = True

I get the following error message:

Using gpu device 0: GeForce 940M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN Can not compile with cuDNN. We got this error:
b'')
ERROR (theano.gof.opt): SeqOptimizer apply <theano.sandbox.cuda.dnn.NoCuDNNRaise object at 0x000002137DDDB278>
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Users\lukas\Anaconda3\lib\site-packages\theano\gof\opt.py", line 230, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "C:\Users\lukas\Anaconda3\lib\site-packages\theano\gof\opt.py", line 89, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "C:\Users\lukas\Anaconda3\lib\site-packages\theano\sandbox\cuda\dnn.py", line 2564, in apply
    if not dnn_available():
  File "C:\Users\lukas\Anaconda3\lib\site-packages\theano\sandbox\cuda\__init__.py", line 340, in dnn_available
    dnn_available.msg)
RuntimeError: You enabled cuDNN, but we aren't able to use it: Can not compile with cuDNN. We got this error:
b''
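
Incidentally, the empty b'' hides the actual compiler error. A minimal
sketch to surface it, calling the dnn_available() helper that appears in
the traceback above directly (dnn.version() is an assumption on my part,
based on the "cuDNN 5005" line further below):

import theano.sandbox.cuda.dnn as dnn

if dnn.dnn_available():
    # Reports the cuDNN version Theano managed to compile against
    print("cuDNN is usable:", dnn.version())
else:
    # dnn_available.msg stores the reason, as used in the RuntimeError above
    print("cuDNN is NOT usable:", dnn.dnn_available.msg)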

This is resolved by commenting out compiler_bindir:

[nvcc]
...
# compiler_bindir = "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"

When running the same code now, I get:
DEBUG: nvcc STDOUT mod.cu

   Creating library C:/Users/lukas/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-SP0-Intel64_Family_6_Model_78_Stepping_3_GenuineIntel-3.5.2-64/tmpoavpotbu/m91973e5c136ea49268a916ff971b7377.lib and object C:/Users/lukas/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-SP0-Intel64_Family_6_Model_78_Stepping_3_GenuineIntel-3.5.2-64/tmpoavpotbu/m91973e5c136ea49268a916ff971b7377.exp

Using gpu device 0: GeForce 940M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5005)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.910419 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
[Finished in 17.7s]

I figure I can ignore the "DEBUG: nvcc STDOUT mod.cu..." message, so this
seems to resolve the problem, but the compile time for this simple code has
nearly tripled (5.6 s vs 17.7 s). Is there a way to keep compiler_bindir
enabled, since that seems to yield a faster compile time and does not
produce the "DEBUG: nvcc STDOUT mod.cu" message?
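
For reference, the [Finished in ...] figures above are whole-script times;
a minimal sketch that times only the theano.function() compilation step
(same shared-variable setup and vector length as the tutorial script):

import time
import numpy
from theano import function, shared, config
import theano.tensor as T

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(10 * 30 * 768), config.floatX))

t0 = time.time()
f = function([], T.exp(x))  # nvcc is invoked here on a cold cache
print("Compiling took %f seconds" % (time.time() - t0))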

