Sure, the cuda-memcheck log is full of this:

========= CUDA-MEMCHECK
========= Invalid __global__ write of size 4
=========     at 0x00000310 in cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
=========     by thread (95,0,0) in block (63,0,0)
=========     Address 0x2316ffa5bc is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2c5) [0x204205]
=========     Host Frame:/usr/local/cuda/lib64/libcudnn.so.5 [0x4ca501]
=========     Host Frame:/usr/local/cuda/lib64/libcudnn.so.5 [0x4e68d3]
=========     Host Frame:/usr/local/cuda/lib64/libcudnn.so.5 [0xf959e]
=========     Host Frame:/usr/local/cuda/lib64/libcudnn.so.5 [0xa6883]
=========     Host Frame:/usr/local/cuda/lib64/libcudnn.so.5 (cudnnConvolutionForward + 0x7a9) [0x3a4b9]
=========     Host Frame:/home/facenx/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-3.5.2-64/tmpofz7z19g/m79b38ce26ae216596dbaccfe67469d8b.so [0x2b9b]
=========     Host Frame:/home/facenx/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-3.5.2-64/lazylinker_ext/lazylinker_ext.so [0x3d5c]
=========     Host Frame:/home/facenx/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-3.5.2-64/lazylinker_ext/lazylinker_ext.so [0x47c7]


There are about 1K such messages, with the out-of-bounds address increasing by 
4 each time, i.e. one 4-byte element at a time.

Regards,
Sergey.


On Saturday, April 8, 2017 at 3:45:10 AM UTC+3, Pascal Lamblin 
wrote:
>
> Could you try running the whole thing inside cuda-memcheck? 
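>
> For example, something like this (assuming the script that triggers the 
> error is called script.py; substitute your own entry point): 
>
>     cuda-memcheck python script.py 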
>
> On Wed, Apr 05, 2017, Sergey Ovcharenko wrote: 
> > Hi, 
> > 
> > I'm struggling to get a Theano graph spread over two GPUs working, but I 
> > keep encountering the GpuArrayException: b'an illegal memory access was 
> > encountered' error (the full traceback is at the end of this email). 
> > The basic idea is to do a forward pass through two neural networks, each 
> > located on a separate device, and combine the outputs. 
> > 
> > I'm using the latest Theano, libgpuarray and Lasagne to build the networks, 
> > and have hacked Lasagne a bit to be able to pass target='device' to the 
> > shared variable constructor during weight initialization; a sketch of the 
> > idea follows. 
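> > 
> > A rough sketch of the kind of hack I mean (illustrative only: the 
> > shared_on helper below is not actual Lasagne API, but Lasagne does accept 
> > pre-built shared variables as parameter specs, and target= on 
> > theano.shared is how the gpuarray backend picks a context): 
> > 
> > import numpy as np 
> > import theano 
> > import lasagne 
> > 
> > def shared_on(target, shape): 
> >     # Build a float32 parameter directly on the given context, so 
> >     # Lasagne reuses it instead of calling theano.shared itself. 
> >     vals = np.random.normal(0, 0.01, shape).astype('float32') 
> >     return theano.shared(vals, target=target) 
> > 
> > l_in = lasagne.layers.InputLayer((None, 100)) 
> > l_fc = lasagne.layers.DenseLayer( 
> >     l_in, num_units=50, 
> >     W=shared_on('dev1', (100, 50)), 
> >     b=shared_on('dev1', (50,))) 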
> > 
> > I have THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" and the output 
> > after theano import is: 
> > Using cuDNN version 5005 on context None 
> > Mapped name None to device cuda: GeForce GTX 980 (0000:0A:00.0) 
> > Using cuDNN version 5005 on context dev1 
> > Mapped name dev1 to device cuda1: GeForce GTX 980 (0000:09:00.0) 
> > Using cuDNN version 5005 on context dev2 
> > Mapped name dev2 to device cuda2: GeForce GTX 980 (0000:06:00.0) 
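> > 
> > For reference, a minimal two-context sketch in the spirit of the Theano 
> > multiple-GPU docs (assuming the same contexts flag as above) is the 
> > pattern I expect to work: 
> > 
> > import numpy as np 
> > import theano 
> > import theano.tensor as T 
> > 
> > # One shared matrix per context; each dot product runs on its own GPU. 
> > a = theano.shared(np.random.rand(1024, 1024).astype('float32'), target='dev1') 
> > b = theano.shared(np.random.rand(1024, 1024).astype('float32'), target='dev2') 
> > 
> > f = theano.function([], [T.dot(a, a), T.dot(b, b)]) 
> > f() 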
> > 
> > 
> > The network definitions are quite lengthy (and the problem doesn't always 
> > reproduce on toy graphs), so I'm providing a simplified example of what 
> > I'm doing: 
> > inp_0 = T.tensor4('inp0') 
> > r0 = build_model('dev1', input_var=inp_0) 
> > inp_1 = T.tensor4('inp1') 
> > r1 = build_model("dev2", input_var=inp_1) 
> > 
> > r0_out = lasagne.layers.get_output(r0['fc6'], deterministic=False) 
> > r1_out = lasagne.layers.get_output(r1['fc6'], deterministic=False) 
> > 
> > train_r0 = theano.function( 
> >     [inp_0, inp_1], 
> >     [r0_out, r1_out] 
> > ) 
> > 
> > result0 = train_r0(x, x2) 
> > This code fails with the aforementioned error. 
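> > 
> > (For the "combine the outputs" step I'd expect to need an explicit 
> > inter-context copy; a sketch, assuming transfer(None) targets the default 
> > None context from the mapping above: 
> > 
> > # transfer() inserts an explicit copy between contexts before the add; 
> > # 'cpu' would also be accepted, at the cost of a host round-trip. 
> > combined = r0_out.transfer(None) + r1_out.transfer(None) 
> > train_both = theano.function([inp_0, inp_1], combined) 
> > ) 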
> > 
> > I've also tried to compile a separate function for each of the networks, 
> > like this: 
> > train_r0 = theano.function( 
> >     [inp_0], 
> >     [r0_out] 
> > ) 
> > 
> > train_r1 = theano.function( 
> >     [inp_1], 
> >     [r1_out] 
> > ) 
> > 
> > With both compiled, running either train_r0 or train_r1 fails. But 
> > compiling and running a single function (whether train_r0 or train_r1) 
> > works just fine. 
> > Could someone help me debug this? Please let me know if I should provide 
> > additional code/info. 
> > 
> > Thanks, 
> > Sergey. 
> > 
> > The full traceback: 
> > 
> > RuntimeError                              Traceback (most recent call last) 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs) 
> >     883             outputs =\ 
> > --> 884                 self.fn() if output_subset is None else\ 
> >     885                 self.fn(output_subset=output_subset) 
> > 
> > RuntimeError: Error in the elemwise call 
> > 
> > During handling of the above exception, another exception occurred: 
> > 
> > GpuArrayException                         Traceback (most recent call last) 
> > <ipython-input-11-902c3b4617f7> in <module>() 
> > ----> 1 result0 = train_r0(x, x2) 
> >       2 #result1 = train_r1(x2) 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs) 
> >     896                     node=self.fn.nodes[self.fn.position_of_error], 
> >     897                     thunk=thunk, 
> > --> 898                     storage_map=getattr(self.fn, 'storage_map', None)) 
> >     899             else: 
> >     900                 # old-style linkers raise their own exceptions 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map) 
> >     139 
> >     140     hints = [] 
> > --> 141     detailed_err_msg = "\nApply node that caused the error: " + str(node) 
> >     142     if exc_value.__applynode_index__ is not None: 
> >     143         detailed_err_msg += "\nToposort index: %d" % node_index 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in __str__(self) 
> >     178 
> >     179     def __str__(self): 
> > --> 180         return op_as_string(self.inputs, self) 
> >     181 
> >     182     def __repr__(self): 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in op_as_string(i, op, leaf_formatter, node_formatter) 
> >    1256     between i and o 
> >    1257     """ 
> > -> 1258     strs = as_string(i, op.inputs, leaf_formatter, node_formatter) 
> >    1259     return node_formatter(op, strs) 
> >    1260 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in as_string(i, o, leaf_formatter, node_formatter) 
> >    1336             return leaf_formatter(r) 
> >    1337 
> > -> 1338     return [describe(output) for output in o] 
> >    1339 
> >    1340 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in <listcomp>(.0) 
> >    1336             return leaf_formatter(r) 
> >    1337 
> > -> 1338     return [describe(output) for output in o] 
> >    1339 
> >    1340 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in describe(r) 
> >    1334                     return s 
> >    1335         else: 
> > -> 1336             return leaf_formatter(r) 
> >    1337 
> >    1338     return [describe(output) for output in o] 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gpuarray/type.py in __str__(self) 
> >     604         except gpuarray.GpuArrayException: 
> >     605             np_data = self.data 
> > --> 606         return "GpuArrayConstant{%s}" % np_data 
> >     607 
> >     608 
> > 
> > pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__str__ (pygpu/gpuarray.c:28703)() 
> > 
> > /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 
> >     529 
> >     530     """ 
> > --> 531     return array(a, dtype, copy=False, order=order) 
> >     532 
> >     533 
> > 
> > pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__array__ (pygpu/gpuarray.c:21616)() 
> > 
> > pygpu/gpuarray.pyx in pygpu.gpuarray._pygpu_as_ndarray (pygpu/gpuarray.c:18322)() 
> > 
> > pygpu/gpuarray.pyx in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6923)() 
> > 
> > GpuArrayException: b'an illegal memory access was encountered' 
> > 
>
>
> -- 
> Pascal 
>
