Hi there!

I'm implementing a convolutional operation and I'm getting an unexpected 
error when I try to perform a convolution on a binomially sampled tensor.

The error is: 
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5

The error can be reproduced (at least on my machine) with the following 
code:

import numpy as np
import theano as th
from theano import tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

rng = np.random.RandomState()
theano_rng = RandomStreams(rng.randint(2 ** 30))

th_input = T.tensor4()
th_filter = T.tensor4()

# Sample a 0/1 tensor, using the input values as per-element probabilities,
# then convolve the sample with the filter.
th_sampled = theano_rng.binomial(size=th_input.shape, n=1, p=th_input)
th_output = T.nnet.conv2d(th_sampled, th_filter)

op = th.function(
    inputs=[th_input, th_filter],
    outputs=th_output
)

input_sample = np.random.rand(1, 1, 28, 28)
kernel = np.random.rand(1, 1, 6, 6)

op(input_sample, kernel)


Interestingly, the error does NOT occur for samples from other distributions, 
e.g. theano_rng.normal(), which produces a RandomFunction{normal}.1 node 
instead of RandomFunction{binomial}.1.
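
My guess (and it is only a guess) is that the difference is the dtype of the 
sampled tensor: judging from the debugprint further down, the binomial sample 
comes back as int64, while a normal sample comes back as float64. A quick way 
to check, reusing the symbols from the snippet above (th_binom and th_norm are 
just local names here):

# Quick dtype check -- my assumption is that the int64 output of binomial()
# is what GpuCorrMM is choking on.
th_binom = theano_rng.binomial(size=th_input.shape, n=1, p=th_input)
th_norm = theano_rng.normal(size=th_input.shape)
print(th_binom.dtype)  # 'int64' on my setup
print(th_norm.dtype)   # 'float64' (floatX)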

For what it's worth, my THEANO_FLAGS are as follows:
floatX=float64,device=cuda,nvcc.flags=-D_FORCE_INLINES,exception_verbosity=high

The rest of the stack trace is as follows:
Traceback (most recent call last):
  File "tmp2.py", line 23, in <module>
    op(input_sample, kernel)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5
Apply node that caused the error: GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 11
Inputs types: [GpuArrayType<None>(int64, (False, False, False, False)), GpuArrayType<None>(float64, (False, False, False, False))]
Inputs shapes: [(1, 1, 28, 28), (1, 1, 6, 6)]
Inputs strides: [(6272, 6272, 224, 8), (288, 288, 48, 8)]
Inputs values: ['not shown', 'not shown']
Inputs type_num: [7, 12]
Outputs clients: [[HostFromGpu(gpuarray)(GpuCorrMM{valid, (1, 1), (1, 1)}.0)]]

Debugprint of the apply node: 
GpuCorrMM{valid, (1, 1), (1, 1)} [id A] <GpuArrayType<None>(int64, (False, False, False, False))> ''
 |GpuContiguous [id B] <GpuArrayType<None>(int64, (False, False, False, False))> ''
 | |GpuFromHost<None> [id C] <GpuArrayType<None>(int64, (False, False, False, False))> ''
 |   |RandomFunction{binomial}.1 [id D] <TensorType(int64, 4D)> ''   
 |     |<RandomStateType> [id E] <RandomStateType>
 |     |MakeVector{dtype='int64'} [id F] <TensorType(int64, vector)> ''   
 |     | |Shape_i{0} [id G] <TensorType(int64, scalar)> ''   
 |     | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
 |     | |Shape_i{1} [id I] <TensorType(int64, scalar)> ''   
 |     | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
 |     | |Shape_i{2} [id J] <TensorType(int64, scalar)> ''   
 |     | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
 |     | |Shape_i{3} [id K] <TensorType(int64, scalar)> ''   
 |     |   |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
 |     |TensorConstant{1} [id L] <TensorType(int8, scalar)>
 |     |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
 |GpuContiguous [id M] <GpuArrayType<None>(float64, (False, False, False, False))> ''
   |GpuFromHost<None> [id N] <GpuArrayType<None>(float64, (False, False, False, False))> ''
     |Subtensor{::, ::, ::int64, ::int64} [id O] <TensorType(float64, 4D)> ''
       |<TensorType(float64, 4D)> [id P] <TensorType(float64, 4D)>
       |Constant{-1} [id Q] <int64>
       |Constant{-1} [id Q] <int64>

Storage map footprint:
 - GpuContiguous.0, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
 - <TensorType(float64, 4D)>, Input, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
 - GpuContiguous.0, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
 - <TensorType(float64, 4D)>, Input, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
 - Constant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
 TotalSize: 13129.0 Byte(s) 0.000 GB
 TotalSize inputs: 6569.0 Byte(s) 0.000 GB

Am I doing something wrong here? Any idea how I might get around this? It 
works if I split the code into two functions: one that does the sampling and 
returns the tensor, and another that takes that result and does the 
convolution (rough sketch below). But it seems silly to copy the value from 
GPU RAM back to CPU RAM just to get around this...
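
For reference, the split version that does work for me looks roughly like this 
(just a sketch; th_sample_in is a new input variable I introduce for the second 
function, and the astype() is there because the binomial sample comes back as 
int64):

# Workaround sketch: sample in one compiled function, pass the result back
# through host memory, and convolve in a second function.
sample_op = th.function(inputs=[th_input], outputs=th_sampled)

th_sample_in = T.tensor4()  # fresh input for the already-sampled values
conv_op = th.function(
    inputs=[th_sample_in, th_filter],
    outputs=T.nnet.conv2d(th_sample_in, th_filter)
)

sampled = sample_op(input_sample)
result = conv_op(sampled.astype(np.float64), kernel)

This runs without the gemm error, presumably because both conv2d inputs end up 
as float64 by the time they hit the GPU.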

Any advice would be hugely appreciated!

Cheers,
Dave
