Do you have a script we could run to reproduce it? That would help us
investigate. Please also include the Theano flags you use when the error
occurs.
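For example, the full command line you run, along these lines (the script
name is just a placeholder; the contexts value is taken from your message):

```shell
# Illustrative only: the exact THEANO_FLAGS plus a minimal script that
# triggers the error. "repro.py" is a placeholder for your reproduction.
THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" python repro.py
```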

thanks

Fred

On Wed, Apr 5, 2017 at 12:54 PM Sergey Ovcharenko <[email protected]>
wrote:

> Hi,
>
> I'm struggling to get a Theano graph spread over two GPUs working, but I
> keep hitting the GpuArrayException: b'an illegal memory access was
> encountered' error (the full traceback is at the end of this email).
> The basic idea is to do a forward pass through two neural networks, each
> located on a separate device, and combine the outputs.
>
> I'm using the latest Theano, libgpuarray and Lasagne to build the
> networks, and have hacked Lasagne a bit to be able to pass
> target='device' to the shared variable constructor during weight
> initialization.
>
> I have THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" and the output
> after theano import is:
> Using cuDNN version 5005 on context None
> Mapped name None to device cuda: GeForce GTX 980 (0000:0A:00.0)
> Using cuDNN version 5005 on context dev1
> Mapped name dev1 to device cuda1: GeForce GTX 980 (0000:09:00.0)
> Using cuDNN version 5005 on context dev2
> Mapped name dev2 to device cuda2: GeForce GTX 980 (0000:06:00.0)
>
>
> The network definitions are quite lengthy (and the error doesn't always
> reproduce on toy graphs), so I'm providing a simplified example of what
> I'm doing:
> inp_0 = T.tensor4('inp0')
> r0 = build_model('dev1', input_var=inp_0)
> inp_1 = T.tensor4('inp1')
> r1 = build_model("dev2", input_var=inp_1)
>
> r0_out = lasagne.layers.get_output(r0['fc6'], deterministic=False)
> r1_out = lasagne.layers.get_output(r1['fc6'], deterministic=False)
>
> train_r0 = theano.function(
>     [inp_0, inp_1],
>     [r0_out, r1_out]
> )
>
> result0 = train_r0(x, x2)
> This code fails with the aforementioned error.
>
> I've also tried compiling a separate function for each of the networks,
> like so:
> train_r0 = theano.function(
>     [inp_0],
>     [r0_out]
> )
>
> train_r1 = theano.function(
>     [inp_1],
>     [r1_out]
> )
>
> Running either train_r0 or train_r1 then fails. But compiling and running
> only one of the two functions (whether train_r0 or train_r1) works just
> fine.
> Could someone help me debug this? Please let me know if I should provide
> additional code/info.
>
> Thanks,
> Sergey.
>
> The full traceback:
>
> RuntimeError                              Traceback (most recent call last)
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>  in __call__(self, *args, **kwargs)
>     883             outputs =\
> --> 884                 self.fn() if output_subset is None else\
>     885                 self.fn(output_subset=output_subset)
>
> RuntimeError: Error in the elemwise call
>
> During handling of the above exception, another exception occurred:
>
> GpuArrayException                         Traceback (most recent call last)
> <ipython-input-11-902c3b4617f7> in <module>()
> ----> 1 result0 = train_r0(x, x2)
>       2 #result1 = train_r1(x2)
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>  in __call__(self, *args, **kwargs)
>     896                     node=self.fn.nodes[self.fn.position_of_error],
>     897                     thunk=thunk,
> --> 898                     storage_map=getattr(self.fn, 'storage_map', None))
>     899             else:
>     900                 # old-style linkers raise their own exceptions
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/link.py
>  in raise_with_op(node, thunk, exc_info, storage_map)
>     139
>     140     hints = []
> --> 141     detailed_err_msg = "\nApply node that caused the error: " + 
> str(node)
>     142     if exc_value.__applynode_index__ is not None:
>     143         detailed_err_msg += "\nToposort index: %d" % node_index
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in __str__(self)
>     178
>     179     def __str__(self):
> --> 180         return op_as_string(self.inputs, self)
>     181
>     182     def __repr__(self):
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in op_as_string(i, op, leaf_formatter, node_formatter)
>    1256     between i and o
>    1257     """
> -> 1258     strs = as_string(i, op.inputs, leaf_formatter, node_formatter)
>    1259     return node_formatter(op, strs)
>    1260
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in as_string(i, o, leaf_formatter, node_formatter)
>    1336             return leaf_formatter(r)
>    1337
> -> 1338     return [describe(output) for output in o]
>    1339
>    1340
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in <listcomp>(.0)
>    1336             return leaf_formatter(r)
>    1337
> -> 1338     return [describe(output) for output in o]
>    1339
>    1340
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>  in describe(r)
>    1334                     return s
>    1335         else:
> -> 1336             return leaf_formatter(r)
>    1337
>    1338     return [describe(output) for output in o]
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gpuarray/type.py
>  in __str__(self)
>     604         except gpuarray.GpuArrayException:
>     605             np_data = self.data
> --> 606         return "GpuArrayConstant{%s}" % np_data
>     607
>     608
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__str__ 
> (pygpu/gpuarray.c:28703)()
>
> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/numpy/core/numeric.py
>  in asarray(a, dtype, order)
>     529
>     530     """
> --> 531     return array(a, dtype, copy=False, order=order)
>     532
>     533
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__array__ 
> (pygpu/gpuarray.c:21616)()
>
> pygpu/gpuarray.pyx in pygpu.gpuarray._pygpu_as_ndarray 
> (pygpu/gpuarray.c:18322)()
>
> pygpu/gpuarray.pyx in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6923)()
>
> GpuArrayException: b'an illegal memory access was encountered'
>
>
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
