Thanks for the update.
I managed to reproduce the issue with cuDNN v6 as well, with a simple 
script (below).
- with 'deterministic' it fails with CUDNN_STATUS_EXECUTION_FAILED
- with 'fft_tiling' it fails with CUDNN_STATUS_NOT_SUPPORTED
- with 'fft', surprisingly, it works. 'fft' is supposed to be deterministic 
as well, so you could also use that one.
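
In case it helps, here is a minimal sketch of switching the algorithm from Python rather than on the command line. build_theano_flags is a hypothetical helper (not part of Theano), and device/floatX are just the values from your setup. The only assumption about Theano itself is that THEANO_FLAGS is read when theano is first imported:

```python
import os


def build_theano_flags(flags):
    """Format a dict of Theano config values as a THEANO_FLAGS string.

    Hypothetical helper written for this message, not part of Theano.
    """
    return ",".join("%s=%s" % (key, value) for key, value in flags.items())


flags = build_theano_flags({
    "device": "cuda",
    "floatX": "float32",
    # 'fft' ran without errors in the test above; 'deterministic' and
    # 'fft_tiling' did not.
    "dnn.conv.algo_bwd_filter": "fft",
})

# Set the variable before the first `import theano` so the config sees it.
os.environ["THEANO_FLAGS"] = flags
print(flags)
```

Swapping the value of dnn.conv.algo_bwd_filter in that dict is enough to try the other algorithms.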

Thanks for the report, we'll forward that to Nvidia.

```
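import numpy as np
import theano
import theano.tensor.nnet.abstract_conv  # make sure the submodule is loaded

# Run with e.g. THEANO_FLAGS=device=cuda,dnn.conv.algo_bwd_filter=deterministic
# Shapes are taken from the failing GpuDnnConvGradW node in the original report.
x = theano.shared(np.ones((1, 1, 541211, 10), 'f'))  # input
y = theano.shared(np.ones((1, 50, 541211, 1), 'f'))  # gradient w.r.t. the output
z = theano.tensor.nnet.abstract_conv.conv2d_grad_wrt_weights(
    x, y,
    filter_shape=(50, 1, 3, 10),
    border_mode=(1, 0),
    filter_flip=False)
f = theano.function([], z)
f()  # fails here with 'deterministic' and 'fft_tiling'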
```
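
The shapes in the script are consistent with each other, which can be checked without a GPU. This is a small sketch in plain Python; conv_output_size is a hypothetical helper written for this message, and it assumes the usual correlation arithmetic with symmetric zero padding, out = (in + 2*pad - kernel) // stride + 1:

```python
def conv_output_size(in_size, kernel_size, pad, stride=1):
    """Output length along one axis of a correlation with symmetric zero padding."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


# Values from the script: input (1, 1, 541211, 10),
# filter_shape (50, 1, 3, 10), border_mode=(1, 0).
rows = conv_output_size(541211, 3, pad=1)
cols = conv_output_size(10, 10, pad=0)

# Batch size 1 and 50 output channels give the expected shape of y.
print((1, 50, rows, cols))
```

The result matches the shape of y, (1, 50, 541211, 1), so the failure is not caused by mismatched shapes.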

On Tuesday, June 20, 2017 at 5:28:34 AM UTC-4, Fabian Stemmer wrote:
>
> I tried using cudnn v6, but still got the same error.
>
> I also added 'fft_tiling' to SUPPORTED_DNN_CONV_ALGO_RUNTIME in 
> configdefaults.py, to be able to test it, but still got the cuDNN error (see 
> below).
>
> I then added 'optimizer_excluding=conv_dnn' to my THEANO_FLAGS, which gave 
> me GpuCorrMM nodes in the computational graph. This runs without errors.
>
> GpuCorrMM gives me deterministic results, so I can use it as an 
> alternative to the deterministic cuDNN algorithm.
>
> Thanks for your help.
>
> On Tuesday, June 20, 2017 at 12:15:55 AM UTC+2, nouiz wrote:
>>
>> Try cuDNN v6. The GPUs that have the problem are more recent. Maybe that 
>> case was not implemented in v5.
>>
>> On Mon, June 19, 2017 at 16:02, Pascal Lamblin <[email protected]> 
>> wrote:
>>
>>>
>>>
>>> On Monday, June 19, 2017 at 3:39:17 PM UTC-4, Pascal Lamblin wrote:
>>>>
>>>> Hi,
>>>>
>>>> Unfortunately, it looks like a runtime issue in cuDNN rather than 
>>>> something in the Theano wrapper, but I could be wrong.
>>>> A recent PR introduced more algorithms that you can specify for 
>>>> dnn.conv.algo_bwd_filter. In particular, 
>>>> dnn.conv.algo_bwd_filter=fft_tiling should be deterministic as well.
>>>>
>>>
>>> Actually, I just realized the value gets rejected by the configuration, 
>>> but if we bypass it in theano/configdefaults.py it should work. This should 
>>> be fixed soon.
>>>  
>>>
>>>>
>>>> Does it work with an input and kernel that are smaller than 541211 on 
>>>> that dimension?
>>>> Does it work using corrMM instead of cuDNN?
>>>>
>>>> On Wednesday, June 7, 2017 at 11:19:31 AM UTC-4, Fabian Stemmer wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using theano.tensor.nnet.conv2d in my model and I want to set 
>>>>> dnn.conv.algo_bwd_filter=deterministic to make this run deterministically 
>>>>> on GPUs. I work on three different GPU architectures (K10, M40, P6000), 
>>>>> and setting the mentioned flag works well on the K10 but fails with the 
>>>>> error message CUDNN_STATUS_EXECUTION_FAILED on the other two. I have tried 
>>>>> several combinations of Theano, Nvidia driver and cuDNN versions, but 
>>>>> none fixes the issue.
>>>>>
>>>>> Below are details about the respective GPU configurations I tried and 
>>>>> the full error message. Any help you can give me is greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Fabian
>>>>>
>>>>>
>>>>> *Shared setup (all GPUs):*
>>>>> Theano 0.8.2 / 0.9.0 / 0.10.0.dev1 (commit 
>>>>> 6b59449186b04225484b98951192c5867e0719ca, which was the latest at the 
>>>>> time of this writing)
>>>>> cuda 8.0
>>>>> cuDNN 5105
>>>>> THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,
>>>>> *dnn.conv.algo_bwd_filter=deterministic*,device=cuda //device=gpu for 
>>>>> theano 0.8.2
>>>>>
>>>>> *GPU and Nvidia driver:*
>>>>> Tesla K10 Architecture (Driver 361.93.03)
>>>>> Tesla M40 Architecture (Driver: 375.26)
>>>>> Quadro P6000 (Driver 375.26)
>>>>>
>>>>> Alternative driver versions (all tested on Tesla M40):
>>>>>
>>>>>    1. 361.93.03 - Current Production Driver on K10/K20/K80 servers - 
>>>>>    No difference. Application fails on the M40 node
>>>>>    2. 375.26 - Current Production driver on M40/P100/P6000 servers - 
>>>>>    App fails
>>>>>    3. 375.51 - Most recent driver with CUDA Repo equivalent - App 
>>>>>    fails
>>>>>    4. 375.66 - Most recent official driver for Quadro/Tesla cards - 
>>>>>    App fails
>>>>>
>>>>> I also tried upgrading to cuDNN 6.0 and still got the same error.
>>>>>
>>>>>
>>>>> *Full error message (on Quadro P6000, using theano 0.10.0.dev1):*
>>>>>
>>>>> Using cuDNN version 5105 on context None
>>>>> Mapped name None to device cuda: Quadro P6000 (0000:04:00.0)
>>>>> Traceback (most recent call last):
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/bin/n3lu_train", line 9, in <module>
>>>>>     load_entry_point('n3lu', 'console_scripts', 'n3lu_train')()
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 507, in main
>>>>>     valid_error, test_error = exp.run()
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 475, in run
>>>>>     return self.run_one(self.train_corpus, self.valid_corpus)
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 384, in run_one
>>>>>     learner.run()
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/learning.py", line 448, in run
>>>>>     train_outputs = self.train(*batch)
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
>>>>>     storage_map=getattr(self.fn, 'storage_map', None))
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
>>>>>     reraise(exc_type, exc_value, exc_trace)
>>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
>>>>>     self.fn() if output_subset is None else\
>>>>> *RuntimeError: error doing operation: CUDNN_STATUS_EXECUTION_FAILED*
>>>>> Apply node that caused the error: 
>>>>> GpuDnnConvGradW{algo='deterministic', inplace=True}(GpuContiguous.0, 
>>>>> GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, 
>>>>> GpuDnnConvDesc{border_mode=(1, 0), subsample=(1, 1), conv_mode='cross', 
>>>>> precision='float32'}.0, Constant{1.0}, Constant{0.0})
>>>>> Toposort index: 234
>>>>> Inputs types: [GpuArrayType<None>(float32, (True, True, False, 
>>>>> False)), GpuArrayType<None>(float32, (True, False, False, False)), 
>>>>> GpuArrayType<None>(float32, (False, True, False, False)), 
>>>>> <theano.gof.type.CDataType object at 0x7ff56926a090>, Scalar(float32), 
>>>>> Scalar(float32)]
>>>>> Inputs shapes: [(1, 1, 541211, 10), (1, 50, 541211, 1), (50, 1, 3, 
>>>>> 10), 'No shapes', (), ()]
>>>>> Inputs strides: [(21648440, 21648440, 40, 4), (108242200, 2164844, 4, 
>>>>> 4), (120, 120, 40, 4), 'No strides', (), ()]
>>>>> Inputs values: ['not shown', 'not shown', 'not shown', <capsule object 
>>>>> NULL at 0x7ff55d00fe10>, 1.0, 0.0]
>>>>> Outputs clients: [[GpuIncSubtensor{Inc;::, ::, ::, 
>>>>> int64:int64:}(GpuAlloc<None>{memset_0=True}.0, 
>>>>> GpuDnnConvGradW{algo='deterministic', inplace=True}.0, Constant{0}, 
>>>>> Constant{10})]]
>>>>>
>>>>> -- 
>>>
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "theano-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
