Could you try running the whole thing inside cuda-memcheck?
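For example (assuming your script is launched directly with python):

    cuda-memcheck python your_script.py

It's slower, but it should report the kernel that makes the illegal access.
Setting CUDA_LAUNCH_BLOCKING=1 in the environment can also help localize
the failing launch, since it makes kernel launches synchronous.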

On Wed, Apr 05, 2017, Sergey Ovcharenko wrote:
> Hi,
> 
> I'm struggling to get a Theano graph spread over two GPUs working, but I 
> keep encountering the GpuArrayException: b'an illegal memory access was 
> encountered' error (the full traceback is at the end of this email).
> The basic idea is to do a forward pass through two neural networks, each 
> located on a separate device, and combine the outputs.
> 
> I'm using the latest Theano, libgpuarray and Lasagne to build the networks, 
> and have hacked Lasagne a bit to be able to pass target='device' to the 
> shared variable constructor during weight initialization.
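> The hack essentially makes Lasagne forward a target keyword to 
> theano.shared(). A minimal sketch of the idea (the parameter name and 
> shape here are made up):
> 
> import numpy as np
> import theano
> 
> # Allocate the parameter directly on the dev1 context; target= is how
> # the gpuarray backend picks a context for a shared variable.
> W = theano.shared(np.zeros((1000, 4096), dtype='float32'),
>                   name='fc6.W', target='dev1')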
> 
> I have THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" and the output after 
> importing theano is:
> Using cuDNN version 5005 on context None 
> Mapped name None to device cuda: GeForce GTX 980 (0000:0A:00.0) 
> Using cuDNN version 5005 on context dev1 
> Mapped name dev1 to device cuda1: GeForce GTX 980 (0000:09:00.0) 
> Using cuDNN version 5005 on context dev2 
> Mapped name dev2 to device cuda2: GeForce GTX 980 (0000:06:00.0)
> 
> 
> The network definitions are quite lengthy (and the problem doesn't always 
> reproduce on toy graphs), so I'm providing a simplified example of what 
> I'm doing:
> inp_0 = T.tensor4('inp0')
> r0 = build_model('dev1', input_var=inp_0)
> inp_1 = T.tensor4('inp1')
> r1 = build_model("dev2", input_var=inp_1)
> 
> r0_out = lasagne.layers.get_output(r0['fc6'], deterministic=False)
> r1_out = lasagne.layers.get_output(r1['fc6'], deterministic=False)
> 
> train_r0 = theano.function(
>     [inp_0, inp_1],
>     [r0_out, r1_out]
> )
> 
> result0 = train_r0(x, x2)
> 
> This code fails with the aforementioned error.
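> 
> (In the full graph I then combine the two outputs on one context; a rough 
> sketch, assuming .transfer() is the right API for moving a result between 
> contexts, would be combined = r0_out + r1_out.transfer('dev1').)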
> 
> I've also tried compiling a separate function for each of the networks, 
> like this:
> train_r0 = theano.function(
>     [inp_0],
>     [r0_out]
> )
> 
> train_r1 = theano.function(
>     [inp_1],
>     [r1_out]
> )
> 
> Running either train_r0 or train_r1 then fails. But compiling and running 
> just one of the functions (no matter whether train_r0 or train_r1) works 
> just fine.
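> 
> For comparison, a bare-bones two-context function in the style of the 
> Theano multi-GPU docs (something I can use as a sanity check) would be:
> 
> import numpy as np
> import theano
> import theano.tensor as T
> 
> # One shared matrix per context; target= picks the context by name.
> v1 = theano.shared(np.random.random((64, 64)).astype('float32'),
>                    target='dev1')
> v2 = theano.shared(np.random.random((64, 64)).astype('float32'),
>                    target='dev2')
> # A single function that touches both contexts, as in my failing case.
> f = theano.function([], [T.dot(v1, v1), T.dot(v2, v2)])
> f()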
> Could someone help me debug this? Please let me know if I should provide 
> additional code/info.
> 
> Thanks,
> Sergey.
> 
> The full traceback:
> 
> RuntimeError                              Traceback (most recent call last)
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>  in __call__(self, *args, **kwargs)
>     883             outputs =\
> --> 884                 self.fn() if output_subset is None else\
>     885                 self.fn(output_subset=output_subset)
> 
> RuntimeError: Error in the elemwise call
> 
> During handling of the above exception, another exception occurred:
> 
> GpuArrayException                         Traceback (most recent call last)
> <ipython-input-11-902c3b4617f7> in <module>()
> ----> 1 result0 = train_r0(x, x2)
>       2 #result1 = train_r1(x2)
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>  in __call__(self, *args, **kwargs)
>     896                     node=self.fn.nodes[self.fn.position_of_error],
>     897                     thunk=thunk,
> --> 898                     storage_map=getattr(self.fn, 'storage_map', None))
>     899             else:
>     900                 # old-style linkers raise their own exceptions
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/link.py
>  in raise_with_op(node, thunk, exc_info, storage_map)
>     139 
>     140     hints = []
> --> 141     detailed_err_msg = "\nApply node that caused the error: " + 
> str(node)
>     142     if exc_value.__applynode_index__ is not None:
>     143         detailed_err_msg += "\nToposort index: %d" % node_index
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in __str__(self)
>     178 
>     179     def __str__(self):
> --> 180         return op_as_string(self.inputs, self)
>     181 
>     182     def __repr__(self):
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in op_as_string(i, op, leaf_formatter, node_formatter)
>    1256     between i and o
>    1257     """
> -> 1258     strs = as_string(i, op.inputs, leaf_formatter, node_formatter)
>    1259     return node_formatter(op, strs)
>    1260 
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in as_string(i, o, leaf_formatter, node_formatter)
>    1336             return leaf_formatter(r)
>    1337 
> -> 1338     return [describe(output) for output in o]
>    1339 
>    1340 
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in <listcomp>(.0)
>    1336             return leaf_formatter(r)
>    1337 
> -> 1338     return [describe(output) for output in o]
>    1339 
>    1340 
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in describe(r)
>    1334                     return s
>    1335         else:
> -> 1336             return leaf_formatter(r)
>    1337 
>    1338     return [describe(output) for output in o]
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gpuarray/type.py
>  in __str__(self)
>     604         except gpuarray.GpuArrayException:
>     605             np_data = self.data
> --> 606         return "GpuArrayConstant{%s}" % np_data
>     607 
>     608 
> 
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__str__ 
> (pygpu/gpuarray.c:28703)()
> 
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/numpy/core/numeric.py
>  in asarray(a, dtype, order)
>     529 
>     530     """
> --> 531     return array(a, dtype, copy=False, order=order)
>     532 
>     533 
> 
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__array__ 
> (pygpu/gpuarray.c:21616)()
> 
> pygpu/gpuarray.pyx in pygpu.gpuarray._pygpu_as_ndarray 
> (pygpu/gpuarray.c:18322)()
> 
> pygpu/gpuarray.pyx in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6923)()
> 
> GpuArrayException: b'an illegal memory access was encountered'
> 


-- 
Pascal
