Hi,

Unfortunately, it looks like a runtime issue in cuDNN itself rather than 
something in the Theano wrapper, but I could be wrong.
A recent PR introduced more algorithms that you can specify for 
dnn.conv.algo_bwd_filter. In particular, 
dnn.conv.algo_bwd_filter=fft_tiling should be deterministic as well.
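For example (a sketch based on the flags from your setup, with only the 
algorithm swapped; your_script.py stands in for your training entry point):

    THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,dnn.conv.algo_bwd_filter=fft_tiling,device=cuda python your_script.py

Keep in mind that FFT-based algorithms can have their own shape 
constraints, so it may still refuse an input that large.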

Does it work with an input and kernel smaller than 541211 on that 
dimension?
Does it work using corrMM instead of cuDNN?
The sketch below is one way to test both.
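
Something like this (a hypothetical standalone script; it assumes the 
failing op is the gradient of a conv2d with the shapes and border mode 
from your traceback, and n_rows is the dimension to shrink):

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d

    n_rows = 541211  # shrink this to find where it starts failing
    x = T.tensor4('x')
    # filter shape (50, 1, 3, 10), as in "Inputs shapes" in the traceback
    w = theano.shared(np.random.randn(50, 1, 3, 10).astype('float32'))
    # border_mode=(1, 0) matches the GpuDnnConvDesc in the traceback
    out = conv2d(x, w, border_mode=(1, 0))
    grad_w = theano.grad(out.sum(), w)  # this triggers GpuDnnConvGradW
    f = theano.function([x], grad_w)
    f(np.random.randn(1, 1, n_rows, 10).astype('float32'))

To test corrMM, run the same script with cuDNN excluded from the 
optimizer, e.g. with THEANO_FLAGS=optimizer_excluding=conv_dnn (or 
dnn.enabled=False on 0.9 and later); Theano should then fall back to the 
GpuCorrMM ops.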

On Wednesday, June 7, 2017 at 11:19:31 AM UTC-4, Fabian Stemmer wrote:
>
> Hi,
>
> I'm using theano.tensor.nnet.conv2d in my model and I want to set 
> dnn.conv.algo_bwd_filter=deterministic to make this run deterministically 
> on GPUs. I work on three different GPU architectures (K10, M40, P6000) and 
> setting the mentioned flag works well on the K10, but fails with error 
> message CUDNN_STATUS_EXECUTION_FAILED on the other two. I have tried 
> several combinations of theano, nvidia driver and cuDNN versions, but none 
> fix the issue. 
>
> Below are details about the respective GPU configurations I tried and the 
> full error message. Any help you can give me is greatly appreciated.
>
> Thanks
> Fabian
>
>
> *Shared setup (all GPUs):*
> Theano 0.8.2 / 0.9.0 / 0.10.0.dev1 (commit 
> 6b59449186b04225484b98951192c5867e0719ca, the latest at the time of this 
> writing)
> cuda 8.0
> cuDNN 5105
> THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,*dnn.conv.algo_bwd_filter=deterministic*,device=cuda
> (device=gpu instead of device=cuda for Theano 0.8.2)
>
> *GPU and Nvidia driver:*
> Tesla K10 (Driver 361.93.03)
> Tesla M40 (Driver 375.26)
> Quadro P6000 (Driver 375.26)
>
> Alternative driver versions (all tested on Tesla M40):
>
>    1. 361.93.03 - Current Production Driver on K10/K20/K80 servers - No 
>    difference. Application fails on the M40 node
>    2. 375.26 - Current Production driver on M40/P100/P6000 servers - App 
>    fails
>    3. 375.51 - Most recent driver with CUDA Repo equivalent - App fails
>    4. 375.66 - Most recent official driver for Quadro/Tesla cards - App 
>    fails
>
> I also tried upgrading to cuDNN 6.0 and still got the same error.
>
>
> *Full error message (on Quadro P6000, using Theano 0.10.0.dev1):*
>
> Using cuDNN version 5105 on context None
> Mapped name None to device cuda: Quadro P6000 (0000:04:00.0)
> Traceback (most recent call last):
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/bin/n3lu_train", line 9, in <module>
>     load_entry_point('n3lu', 'console_scripts', 'n3lu_train')()
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 507, in main
>     valid_error, test_error = exp.run()
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 475, in run
>     return self.run_one(self.train_corpus, self.valid_corpus)
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 384, in run_one
>     learner.run()
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/learning.py", line 448, in run
>     train_outputs = self.train(*batch)
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
>     storage_map=getattr(self.fn, 'storage_map', None))
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
>     reraise(exc_type, exc_value, exc_trace)
>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
>     self.fn() if output_subset is None else\
> *RuntimeError: error doing operation: CUDNN_STATUS_EXECUTION_FAILED*
> Apply node that caused the error: GpuDnnConvGradW{algo='deterministic', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode=(1, 0), subsample=(1, 1), conv_mode='cross', precision='float32'}.0, Constant{1.0}, Constant{0.0})
> Toposort index: 234
> Inputs types: [GpuArrayType<None>(float32, (True, True, False, False)), GpuArrayType<None>(float32, (True, False, False, False)), GpuArrayType<None>(float32, (False, True, False, False)), <theano.gof.type.CDataType object at 0x7ff56926a090>, Scalar(float32), Scalar(float32)]
> Inputs shapes: [(1, 1, 541211, 10), (1, 50, 541211, 1), (50, 1, 3, 10), 'No shapes', (), ()]
> Inputs strides: [(21648440, 21648440, 40, 4), (108242200, 2164844, 4, 4), (120, 120, 40, 4), 'No strides', (), ()]
> Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7ff55d00fe10>, 1.0, 0.0]
> Outputs clients: [[GpuIncSubtensor{Inc;::, ::, ::, int64:int64:}(GpuAlloc<None>{memset_0=True}.0, GpuDnnConvGradW{algo='deterministic', inplace=True}.0, Constant{0}, Constant{10})]]
>
>
