Might be related:  https://github.com/Theano/libgpuarray/issues/404
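If it still reproduces, running the failing script under cuda-memcheck with Theano's error reporting turned up can help localize which apply node triggers the illegal access. This is just a sketch: the flags are standard Theano config flags, the contexts value is copied from your setup, and the script name is a placeholder:

```shell
# Disable graph optimizations and ask Theano for a full apply-node dump
# on error, then let cuda-memcheck report the first illegal access.
THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2,optimizer=None,exception_verbosity=high" \
    cuda-memcheck python your_script.py
```

optimizer=None makes the run much slower, but it keeps the compiled graph close to what you wrote, which makes the failing node easier to map back to your code.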

On Tuesday, April 11, 2017 at 8:11:28 AM UTC-7, nouiz wrote:
>
> What is your cuda version? Can you update to cuda 8? Can you update cudnn 
> to version 6?
>
> It seems the error is inside cuDNN, so updating it could fix the problem.
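>
> To check which cuDNN version you have installed, something like this 
> usually works (the header path is an assumption; adjust it for your 
> install):
>
> ```shell
> # Print the cuDNN version defines from the header.
> grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h
> ```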
>
> Fred
>
> On Wed, Apr 5, 2017 at 12:54 PM Sergey Ovcharenko <ovchare...@gmail.com> 
> wrote:
>
>> Hi,
>>
>> I'm struggling to get a Theano graph spread over two GPUs working, but I 
>> keep encountering the GpuArrayException: b'an illegal memory access was 
>> encountered' error (the full traceback is at the end of this email).
>> The basic idea is to do a forward pass through two neural networks, each 
>> located on a separate device, and combine the outputs.
>>
>> I'm using the latest Theano, libgpuarray and Lasagne to build the 
>> networks, and have hacked Lasagne a bit to be able to pass 
>> target='device' to the shared variable constructor during weight 
>> initialization.
>>
>> I have THEANO_FLAGS="contexts=dev1->cuda1;dev2->cuda2" and the output 
>> after theano import is:
>> Using cuDNN version 5005 on context None 
>> Mapped name None to device cuda: GeForce GTX 980 (0000:0A:00.0) 
>> Using cuDNN version 5005 on context dev1 
>> Mapped name dev1 to device cuda1: GeForce GTX 980 (0000:09:00.0) 
>> Using cuDNN version 5005 on context dev2 
>> Mapped name dev2 to device cuda2: GeForce GTX 980 (0000:06:00.0)
>>
>>
>> The network definition is quite lengthy (and the problem doesn't always 
>> reproduce on toy graphs), so I'm providing a simplified example of what 
>> I'm doing:
>> inp_0 = T.tensor4('inp0')
>> r0 = build_model('dev1', input_var=inp_0)
>> inp_1 = T.tensor4('inp1')
>> r1 = build_model("dev2", input_var=inp_1)
>>
>> r0_out = lasagne.layers.get_output(r0['fc6'], deterministic=False)
>> r1_out = lasagne.layers.get_output(r1['fc6'], deterministic=False)
>>
>> train_r0 = theano.function(
>>     [inp_0, inp_1],
>>     [r0_out, r1_out]
>> )
>>
>> result0 = train_r0(x, x2)
>> This code fails with the aforementioned error.
>>
>> I've also tried to compile a separate function for each of the networks, 
>> like
>> train_r0 = theano.function(
>>     [inp_0],
>>     [r0_out]
>> )
>>
>> train_r1 = theano.function(
>>     [inp_1],
>>     [r1_out]
>> )
>>
>> Running either train_r0 or train_r1 then fails. But compiling and running 
>> just one of the two functions (whether train_r0 or train_r1) works fine.
>> Could someone help me debug this? Please let me know if I should provide 
>> additional code/info.
>>
>> Thanks,
>> Sergey.
>>
>> The full traceback:
>>
>> RuntimeError                              Traceback (most recent call last)
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>>  in __call__(self, *args, **kwargs)
>>     883             outputs =\
>> --> 884                 self.fn() if output_subset is None else\
>>     885                 self.fn(output_subset=output_subset)
>>
>> RuntimeError: Error in the elemwise call
>>
>> During handling of the above exception, another exception occurred:
>>
>> GpuArrayException                         Traceback (most recent call last)
>> <ipython-input-11-902c3b4617f7> in <module>()
>> ----> 1 result0 = train_r0(x, x2)
>>       2 #result1 = train_r1(x2)
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/compile/function_module.py
>>  in __call__(self, *args, **kwargs)
>>     896                     node=self.fn.nodes[self.fn.position_of_error],
>>     897                     thunk=thunk,
>> --> 898                     storage_map=getattr(self.fn, 'storage_map', 
>> None))
>>     899             else:
>>     900                 # old-style linkers raise their own exceptions
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/link.py
>>  in raise_with_op(node, thunk, exc_info, storage_map)
>>     139 
>>     140     hints = []
>> --> 141     detailed_err_msg = "\nApply node that caused the error: " + 
>> str(node)
>>     142     if exc_value.__applynode_index__ is not None:
>>     143         detailed_err_msg += "\nToposort index: %d" % node_index
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>>  in __str__(self)
>>     178 
>>     179     def __str__(self):
>> --> 180         return op_as_string(self.inputs, self)
>>     181 
>>     182     def __repr__(self):
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>>  in op_as_string(i, op, leaf_formatter, node_formatter)
>>    1256     between i and o
>>    1257     """
>> -> 1258     strs = as_string(i, op.inputs, leaf_formatter, node_formatter)
>>    1259     return node_formatter(op, strs)
>>    1260 
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>>  in as_string(i, o, leaf_formatter, node_formatter)
>>    1336             return leaf_formatter(r)
>>    1337 
>> -> 1338     return [describe(output) for output in o]
>>    1339 
>>    1340 
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>>  in <listcomp>(.0)
>>    1336             return leaf_formatter(r)
>>    1337 
>> -> 1338     return [describe(output) for output in o]
>>    1339 
>>    1340 
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gof/graph.py
>>  in describe(r)
>>    1334                     return s
>>    1335         else:
>> -> 1336             return leaf_formatter(r)
>>    1337 
>>    1338     return [describe(output) for output in o]
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/theano/gpuarray/type.py
>>  in __str__(self)
>>     604         except gpuarray.GpuArrayException:
>>     605             np_data = self.data
>> --> 606         return "GpuArrayConstant{%s}" % np_data
>>     607 
>>     608 
>>
>> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__str__ 
>> (pygpu/gpuarray.c:28703)()
>>
>> /home/facenx/.virtualenvs/multitheano/lib/python3.5/site-packages/numpy/core/numeric.py
>>  in asarray(a, dtype, order)
>>     529 
>>     530     """
>> --> 531     return array(a, dtype, copy=False, order=order)
>>     532 
>>     533 
>>
>> pygpu/gpuarray.pyx in pygpu.gpuarray.GpuArray.__array__ 
>> (pygpu/gpuarray.c:21616)()
>>
>> pygpu/gpuarray.pyx in pygpu.gpuarray._pygpu_as_ndarray 
>> (pygpu/gpuarray.c:18322)()
>>
>> pygpu/gpuarray.pyx in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6923)()
>>
>> GpuArrayException: b'an illegal memory access was encountered'
>>
>>
>>
>> -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "theano-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to theano-users...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
