Could you try running the whole thing inside cuda-memcheck?
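
For example, with repro.py standing in for your actual script:

THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" cuda-memcheck python repro.py

cuda-memcheck should then report the kernel in which the illegal access
actually happens, instead of the point where Theano notices it later.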
On Wed, Apr 05, 2017, Sergey Ovcharenko wrote:
> Hi,
>
> I'm struggling to get a Theano graph spread over two GPUs working: I
> keep hitting GpuArrayException: b'an illegal memory access was
> encountered' (the full traceback is at the end of this email).
> The basic idea is to do a forward pass through two neural networks,
> each located on a separate device, and combine the outputs.
>
> I'm using the latest Theano, libgpuarray and Lasagne to build the networks,
> and have hacked Lasagne a bit to be able to pass target='device' to the
> shared variable constructor during weight initialization.
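>
> In essence the hack boils down to something like this (a sketch; the
> variable name and shape are made up):
>
> import numpy as np
> import theano
>
> # place the parameter on an explicit context instead of the default one
> W = theano.shared(np.zeros((256, 256), dtype='float32'), target='dev1')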
>
> I have THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2", and the output
> after importing theano is:
> Using cuDNN version 5005 on context None
> Mapped name None to device cuda: GeForce GTX 980 (0000:0A:00.0)
> Using cuDNN version 5005 on context dev1
> Mapped name dev1 to device cuda1: GeForce GTX 980 (0000:09:00.0)
> Using cuDNN version 5005 on context dev2
> Mapped name dev2 to device cuda2: GeForce GTX 980 (0000:06:00.0)
>
>
> The network definitions are quite lengthy (and the problem doesn't always
> reproduce on toy graphs), so here is a simplified example of what I'm doing:
>
> inp_0 = T.tensor4('inp0')
> r0 = build_model('dev1', input_var=inp_0)
> inp_1 = T.tensor4('inp1')
> r1 = build_model("dev2", input_var=inp_1)
>
> r0_out = lasagne.layers.get_output(r0['fc6'], deterministic=False)
> r1_out = lasagne.layers.get_output(r1['fc6'], deterministic=False)
>
> train_r0 = theano.function(
>     [inp_0, inp_1],
>     [r0_out, r1_out]
> )
>
> result0 = train_r0(x, x2)
>
> This code fails with the aforementioned error.
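>
> For reference, a self-contained toy version of the two-context setup I'm
> describing (a sketch only; as mentioned, toy graphs don't always reproduce
> the error):
>
> import numpy as np
> import theano
> import theano.tensor as T
>
> x = T.fmatrix('x')
> # one weight matrix per context; the shapes are made up
> w1 = theano.shared(np.random.randn(8, 8).astype('float32'), target='dev1')
> w2 = theano.shared(np.random.randn(8, 8).astype('float32'), target='dev2')
> # send the input to each context, compute, and bring the results back
> y1 = T.dot(x.transfer('dev1'), w1).transfer('cpu')
> y2 = T.dot(x.transfer('dev2'), w2).transfer('cpu')
> f = theano.function([x], [y1, y2])
> out1, out2 = f(np.ones((8, 8), dtype='float32'))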
>
> I've also tried compiling a separate function for each of the networks,
> like this:
>
> train_r0 = theano.function(
>     [inp_0],
>     [r0_out]
> )
>
> train_r1 = theano.function(
>     [inp_1],
>     [r1_out]
> )
>
> With both compiled, running either train_r0 or train_r1 fails. But
> compiling and running only one of them (no matter which) works just fine.
> Could someone help me debug this? Please let me know if I should provide
> additional code/info.
>
> Thanks,
> Sergey.
>
> The full traceback:
>
> RuntimeError                              Traceback (most recent call last)
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
>     883             outputs =\
> --> 884                 self.fn() if output_subset is None else\
>     885                 self.fn(output_subset=output_subset)
>
> RuntimeError: Error in the elemwise call
>
> During handling of the above exception, another exception occurred:
>
> GpuArrayException                        Traceback (most recent call last)
> <ipython-input-11-902c3b4617f7> in <module>()
> ----> 1 result0 = train_r0(x, x2)
>       2 #result1 = train_r1(x2)
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
>     896                     node=self.fn.nodes[self.fn.position_of_error],
>     897                     thunk=thunk,
> --> 898                     storage_map=getattr(self.fn, 'storage_map', None))
>     899             else:
>     900                 # old-style linkers raise their own exceptions
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
>     139
>     140     hints = []
> --> 141     detailed_err_msg = "\nApply node that caused the error: " + str(node)
>     142     if exc_value.__applynode_index__ is not None:
>     143         detailed_err_msg += "\nToposort index: %d" % node_index
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in __str__(self)
>     178
>     179     def __str__(self):
> --> 180         return op_as_string(self.inputs, self)
>     181
>     182     def __repr__(self):
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in op_as_string(i, op, leaf_formatter, node_formatter)
>    1256     between i and o
>    1257     """
> -> 1258     strs = as_string(i, op.inputs, leaf_formatter, node_formatter)
>    1259     return node_formatter(op, strs)
>    1260
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in as_string(i, o, leaf_formatter, node_formatter)
>    1336         return leaf_formatter(r)
>    1337
> -> 1338     return [describe(output) for output in o]
>    1339
>    1340
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in <listcomp>(.0)
>    1336         return leaf_formatter(r)
>    1337
> -> 1338     return [describe(output) for output in o]
>    1339
>    1340
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py in describe(r)
>    1334             return s
>    1335         else:
> -> 1336             return leaf_formatter(r)
>    1337
>    1338     return [describe(output) for output in o]
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gpuarray/type.py in __str__(self)
>     604         except gpuarray.GpuArrayException:
>     605             np_data = self.data
> --> 606         return "GpuArrayConstant{%s}" % np_data
>     607
>     608
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__str__ (pygpu/gpuarray.c:28703)()
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
>     529
>     530     """
> --> 531     return array(a, dtype, copy=False, order=order)
>     532
>     533
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__array__ (pygpu/gpuarray.c:21616)()
>
> pygpu/gpuarray.pyx in pygpu.gpuarray._pygpu_as_ndarray (pygpu/gpuarray.c:18322)()
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6923)()
>
> GpuArrayException: b'an illegal memory access was encountered'
>
--
Pascal