I tried using cuDNN v6, but still got the same error. I also added 'fft_tiling' to SUPPORTED_DNN_CONV_ALGO_RUNTIME in configdefaults.py, to be able to test it, but still got the cuDNN error (see below).
I then added 'optimizer_excluding=conv_dnn' to my THEANO_FLAGS, which gave me GpuCorrMM nodes in the computational graph. This runs without errors. GpuCorrMM gives me deterministic results, so I can use it as an alternative to the deterministic cuDNN algorithm.

Thanks for your help.

On Tuesday, June 20, 2017 at 12:15:55 AM UTC+2, nouiz wrote:
>
> Try cuDNN v6. The GPUs that have the problem are more recent. Maybe that
> case was not implemented in v5.
>
> On Mon, June 19, 2017 at 16:02, Pascal Lamblin <[email protected]> wrote:
>
>>
>> On Monday, June 19, 2017 at 3:39:17 PM UTC-4, Pascal Lamblin wrote:
>>>
>>> Hi,
>>>
>>> Unfortunately, it looks like a runtime issue in cuDNN rather than
>>> something in the Theano wrapper, but I could be wrong.
>>> A recent PR introduced more algorithms that you can specify for
>>> dnn.conv.algo_bwd_filter. In particular,
>>> dnn.conv.algo_bwd_filter=fft_tiling should be deterministic as well.
>>>
>>
>> Actually, I just realized the value gets rejected by the configuration,
>> but if we bypass it in theano/configdefaults.py it should work. This should
>> be fixed soon.
>>
>>>
>>> Does it work with an input and kernel that are smaller than 541211 on
>>> that dimension?
>>> Does it work using corrMM instead of cuDNN?
>>>
>>> On Wednesday, June 7, 2017 at 11:19:31 AM UTC-4, Fabian Stemmer wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm using theano.tensor.nnet.conv2d in my model and I want to set
>>>> dnn.conv.algo_bwd_filter=deterministic to make this run deterministically
>>>> on GPUs. I work on three different GPU architectures (K10, M40, P6000) and
>>>> setting the mentioned flag works well on the K10, but fails with error
>>>> message CUDNN_STATUS_EXECUTION_FAILED on the other two. I have tried
>>>> several combinations of theano, nvidia driver and cuDNN versions, but none
>>>> fix the issue.
>>>>
>>>> Below are details about the respective GPU configurations I tried and
>>>> the full error message.
>>>> Any help you can give me is greatly appreciated.
>>>>
>>>> Thanks
>>>> Fabian
>>>>
>>>>
>>>> *Shared setup (all GPUs):*
>>>> Theano 0.8.2 / 0.9.0 / 0.10.0.dev1 (commit
>>>> 6b59449186b04225484b98951192c5867e0719ca, which was the latest at the time
>>>> of this writing)
>>>> cuda 8.0
>>>> cuDNN 5105
>>>> THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,
>>>> *dnn.conv.algo_bwd_filter=deterministic*,device=cuda //device=gpu for
>>>> theano 0.8.2
>>>>
>>>> *GPU and Nvidia driver:*
>>>> Tesla K10 Architecture (Driver 361.93.03)
>>>> Tesla M40 Architecture (Driver: 375.26)
>>>> Quadro P6000 (Driver 375.26)
>>>>
>>>> Alternative driver versions (all tested on Tesla M40):
>>>>
>>>> 1. 361.93.03 - Current Production Driver on K10/K20/K80 servers -
>>>>    No difference. Application fails on the M40 node
>>>> 2. 375.26 - Current Production driver on M40/P100/P6000 servers -
>>>>    App fails
>>>> 3. 375.51 - Most recent driver with CUDA Repo equivalent - App fails
>>>> 4. 375.66 - Most recent official driver for Quadro/Tesla cards -
>>>>    App fails
>>>>
>>>> I also tried upgrading to cuDNN 6.0 and still got the same error.
>>>>
>>>>
>>>> *Full error message (on Quadro P6000, using theano 0.10.0.dev1):*
>>>>
>>>> Using cuDNN version 5105 on context None
>>>> Mapped name None to device cuda: Quadro P6000 (0000:04:00.0)
>>>> Traceback (most recent call last):
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/bin/n3lu_train", line 9, in <module>
>>>>     load_entry_point('n3lu', 'console_scripts', 'n3lu_train')()
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 507, in main
>>>>     valid_error, test_error = exp.run()
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 475, in run
>>>>     return self.run_one(self.train_corpus, self.valid_corpus)
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 384, in run_one
>>>>     learner.run()
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/learning.py", line 448, in run
>>>>     train_outputs = self.train(*batch)
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
>>>>     storage_map=getattr(self.fn, 'storage_map', None))
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
>>>>     reraise(exc_type, exc_value, exc_trace)
>>>>   File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
>>>>     self.fn() if output_subset is None else\
>>>> *RuntimeError: error doing operation: CUDNN_STATUS_EXECUTION_FAILED*
>>>> Apply node that caused the error: GpuDnnConvGradW{algo='deterministic',
>>>> inplace=True}(GpuContiguous.0, GpuContiguous.0,
>>>> GpuAllocEmpty{dtype='float32', context_name=None}.0,
>>>> GpuDnnConvDesc{border_mode=(1, 0), subsample=(1, 1), conv_mode='cross',
>>>> precision='float32'}.0, Constant{1.0}, Constant{0.0})
>>>> Toposort index: 234
>>>> Inputs types: [GpuArrayType<None>(float32, (True, True, False, False)),
>>>> GpuArrayType<None>(float32, (True, False, False, False)),
>>>> GpuArrayType<None>(float32, (False, True, False, False)),
>>>> <theano.gof.type.CDataType object at 0x7ff56926a090>, Scalar(float32),
>>>> Scalar(float32)]
>>>> Inputs shapes: [(1, 1, 541211, 10), (1, 50, 541211, 1), (50, 1, 3, 10),
>>>> 'No shapes', (), ()]
>>>> Inputs strides: [(21648440, 21648440, 40, 4), (108242200, 2164844, 4,
>>>> 4), (120, 120, 40, 4), 'No strides', (), ()]
>>>> Inputs values: ['not shown', 'not shown', 'not shown', <capsule object
>>>> NULL at 0x7ff55d00fe10>, 1.0, 0.0]
>>>> Outputs clients: [[GpuIncSubtensor{Inc;::, ::, ::,
>>>> int64:int64:}(GpuAlloc<None>{memset_0=True}.0,
>>>> GpuDnnConvGradW{algo='deterministic', inplace=True}.0, Constant{0},
>>>> Constant{10})]]
>>>>
>>>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "theano-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
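
P.S. For anyone who hits the same error, here is a minimal sketch of the GpuCorrMM workaround described at the top of this message. It only builds the THEANO_FLAGS string and exports it; it does not import Theano itself. The helper name merge_theano_flags is my own (hypothetical, not part of Theano), and the base flag string is an abridged version of the shared setup quoted above. The one flag that matters is optimizer_excluding=conv_dnn.

```python
import os


def merge_theano_flags(existing, overrides):
    """Return a THEANO_FLAGS string with `overrides` applied on top of `existing`.

    Hypothetical helper for illustration; it is not part of Theano.
    """
    flags = {}
    for item in existing.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty or malformed entries
            flags[key] = value
    flags.update(overrides)
    return ",".join("{}={}".format(k, v) for k, v in flags.items())


# Abridged version of the flags from the shared setup (assumption: adjust to yours).
base = "mode=FAST_RUN,floatX=float32,device=cuda"

# Exclude the cuDNN convolution optimization so that GpuCorrMM is used instead.
flags = merge_theano_flags(base, {"optimizer_excluding": "conv_dnn"})

# Theano reads THEANO_FLAGS when it is imported, so set it before `import theano`.
os.environ["THEANO_FLAGS"] = flags
print(flags)
```

Setting the same string in the shell before starting the training script should behave the same way; the Python version is just convenient when the flags are assembled per machine.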
