On Mon, Nov 21, 2016, Seppo Enarvi wrote:
> Ok. Is random number generation working in the new GPU backend yet? I can
> see some code related to it, but a call to *uniform()* produces the error
> messages "context name None not defined" and "Could not infer context from
> inputs". Looks like it's not possible to specify the target device to
> *uniform()*.
It should work.
In fact I just tried with the latest master, and I did not have any issue
with the following, with THEANO_FLAGS=device=cuda0,floatX=float32:
>>> import theano
Mapped name None to device cuda0 [...]
>>> from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
>>> rng = RandomStreams(23)
>>> u = rng.uniform((12,))
>>> f = theano.function([], u)
HostFromGpu(gpuarray) [id A] <TensorType(float32, vector)> ''   1
 |GPUA_mrg_uniform{GpuArrayType<None>(float32, (False,)),inplace}.1 [id B] <GpuArrayType<None>(float32, (False,))> ''   0
   |<GpuArrayType<None>(int32, (False, False))> [id C] <GpuArrayType<None>(int32, (False, False))>
   |TensorConstant{(1,) of 12} [id D] <TensorType(int64, (True,))>
GPUA_mrg_uniform{GpuArrayType<None>(float32, (False,)),inplace}.0 [id B] <GpuArrayType<None>(int32, (False, False))> ''   0
>>> f()
array([ 0.04422134, 0.93608665, 0.04399569, 0.95211482, 0.39980391,
0.23936224, 0.31680474, 0.9962666 , 0.46095091, 0.72883427,
0.13103466, 0.61714345], dtype=float32)
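
If uniform() still complains about "context name None not defined", my guess is
that the new back-end did not actually initialize (the "Mapped name None to
device cuda0" line above is the tell-tale). A quick check from the same session
(just a sketch; pygpu_activated assumes libgpuarray/pygpu is installed):
>>> theano.config.device              # should be 'cuda0', not 'gpu0' or 'cpu'
'cuda0'
>>> import theano.gpuarray
>>> theano.gpuarray.pygpu_activated   # True once the gpuarray back-end is up
True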
>
> On Monday, November 21, 2016 at 2:40:03 AM UTC+2, Pascal Lamblin wrote:
> >
> > Right, now I remember that the _dev20 version only works on a limited
> > number of dimensions. That would explain why adding a new axis helped.
> >
> > It may already be fixed in the new GPU back-end (it needs libgpuarray;
> > then use device=cudaX instead of gpuX); otherwise, this is where we
> > should fix it.
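> >
> > For instance (just a sketch; THEANO_FLAGS has to be set before theano is
> > first imported, here via os.environ):
> >
> > import os
> > # new back-end (needs libgpuarray/pygpu); the old back-end used device=gpu0
> > os.environ['THEANO_FLAGS'] = 'device=cuda0,floatX=float32'
> > import theano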
> >
> > On Fri, Nov 18, 2016, Seppo Enarvi wrote:
> > >
> > >
> > > That's interesting, because this function is not supposed to update the
> > > bias. It just computes the cost and its gradient. Maybe that op is used
> > > to update the gradient.
> > >
> > > My GPU is a Quadro K2000. I don't think it's too old, because the graph
> > > contains other instances of GpuAdvancedIncSubtensor1_dev20.
> > >
> > > Anyway, I started to wonder why I don't have this problem with the weight
> > > matrix, where I'm selecting vectors in the same manner. So I tried
> > > converting the bias vector into a matrix and selecting rows from the
> > > matrix (each of which contains only one element):
> > >
> > > bias = bias[class_ids]
> > > =>
> > > bias = bias[:, None]
> > > bias = bias[class_ids, 0]
> > >
> > > It's a lot faster this way. I updated to the latest version of Theano
> > > from Git and I still see the huge speed difference.
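> > >
> > > For reference, the whole workaround as a self-contained snippet (just a
> > > sketch; the vocabulary size and variable names are placeholders):
> > >
> > > import numpy as np
> > > import theano
> > > import theano.tensor as tt
> > >
> > > num_classes = 10001   # placeholder size
> > > bias = theano.shared(np.zeros(num_classes, dtype='float32'), name='bias')
> > > class_ids = tt.lvector('class_ids')
> > >
> > > # original formulation: index the bias vector directly (the slow path)
> > > selected_slow = bias[class_ids]
> > >
> > > # workaround: view the vector as an (N, 1) matrix and index its rows
> > > bias_matrix = bias[:, None]
> > > selected_fast = bias_matrix[class_ids, 0]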
> > >
> > > Seppo
> > >
> > >
> > >
> > > On Friday, November 18, 2016 at 6:49:56 PM UTC+2, Pascal Lamblin wrote:
> > > >
> > > > Hi,
> > > >
> > > > This operation is actually the _update_ of the selected elements of the
> > > > bias.
> > > >
> > > > There is a faster implementation (named GpuAdvancedIncSubtensor1_dev20
> > > > IIRC) that uses atomic addition to speed up that operation. It has the
> > > > downside of not yielding a deterministic order of summation if the same
> > > > element is updated more than once in the same operation.
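> > > >
> > > > The semantics being sped up are those of inc_subtensor with (possibly
> > > > repeated) integer indices; a minimal CPU illustration (sketch, made-up
> > > > numbers):
> > > >
> > > > import numpy as np
> > > > import theano
> > > > import theano.tensor as tt
> > > >
> > > > vec = tt.fvector('vec')
> > > > idx = tt.lvector('idx')
> > > > inc = tt.fvector('inc')
> > > >
> > > > # all increments for a repeated index are summed; the _dev20 GPU kernel
> > > > # does these sums with atomic adds, so their order (and hence the float
> > > > # rounding) can vary from run to run
> > > > out = tt.inc_subtensor(vec[idx], inc)
> > > > f = theano.function([vec, idx, inc], out)
> > > >
> > > > f(np.zeros(4, dtype='float32'),
> > > >   np.array([1, 1, 3], dtype='int64'),
> > > >   np.array([0.5, 0.25, 1.0], dtype='float32'))
> > > > # -> array([ 0.  ,  0.75,  0.  ,  1.  ], dtype=float32)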
> > > >
> > > > One of the issues seems to be that this faster implementation is not
> > > > selected. Could it be that you have an old GPU?
> > > >
> > > > Another potential issue is that your graph seems to first apply updates
> > > > on a tensor of zeros, and then apply another update on the bias itself.
> > > > There may be a way of simplifying that.
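> > > >
> > > > For example, even the bare indexing gradient builds that structure: the
> > > > gradient of bias[class_ids] is an increment into a freshly allocated
> > > > zero vector (sketch):
> > > >
> > > > import numpy as np
> > > > import theano
> > > > import theano.tensor as tt
> > > >
> > > > bias = theano.shared(np.zeros(10001, dtype='float32'), name='bias')
> > > > class_ids = tt.lvector('class_ids')
> > > >
> > > > cost = bias[class_ids].sum()      # stand-in for the real cost
> > > > grad = theano.grad(cost, bias)    # AdvancedIncSubtensor1 into zeros
> > > > theano.printing.debugprint(grad)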
> > > >
> > > > On Fri, Nov 18, 2016, Seppo Enarvi wrote:
> > > > >
> > > > > I'm implementing sampling-based softmax alternatives, where I compute
> > > > > the preactivations only for certain output classes. I get very bad
> > > > > performance due to a GpuAdvancedIncSubtensor1 op, which consumes 90%
> > > > > of the processing time of the update function:
> > > > >
> > > > > <% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
> > > > >   89.0%   89.0%   725.413s   2.44e-01s   2968   115
> > > > > GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0)
> > > > >   input 0: dtype=float32, shape=(10001,), strides=(1,)
> > > > >   input 1: dtype=float32, shape=(25600,), strides=(1,)
> > > > >   input 2: dtype=int64, shape=(25600,), strides=c
> > > > >   output 0: dtype=float32, shape=(10001,), strides=(1,)
> > > > >
> > > > > Looking at the computation graph of that function, I noticed it's
> > > > > operating on the bias vector:
> > > > >
> > > > > GpuAdvancedIncSubtensor1{inplace,inc} [id FL] '' 115
> > > > > |GpuAdvancedIncSubtensor1{inplace,inc} [id FM] '' 112
> > > > > | |GpuAlloc{memset_0=True} [id FN] '' 17
> > > > > | | |CudaNdarrayConstant{[ 0.]} [id FO]
> > > > > | | |Shape_i{0} [id FP] '' 7
> > > > > | | |bias [id BU]
> > > > >
> > > > > More precisely, the performance hit seems to come from selecting from
> > > > > the bias vector those values that correspond to the output classes
> > > > > (bias = bias[class_ids]). Is that a particularly expensive operation?
> > > > > class_ids can be large (1,000 - 10,000). If I don't use the bias, my
> > > > > speed improves tenfold. Is there a way to circumvent that problem?
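> > > > >
> > > > > (For concreteness, a stripped-down sketch of the kind of computation
> > > > > I mean; the sizes and variable names here are arbitrary:)
> > > > >
> > > > > import numpy as np
> > > > > import theano
> > > > > import theano.tensor as tt
> > > > >
> > > > > hidden = tt.fmatrix('hidden')        # (batch, dim) hidden states
> > > > > weight = theano.shared(np.zeros((10001, 256), dtype='float32'))
> > > > > bias = theano.shared(np.zeros(10001, dtype='float32'))
> > > > > class_ids = tt.lvector('class_ids')  # sampled output classes, (batch,)
> > > > >
> > > > > # preactivations only for the sampled classes, one per example
> > > > > logits = (hidden * weight[class_ids]).sum(axis=1) + bias[class_ids]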
> > > >
> > >
> >
> >
> > --
> > Pascal
> >
>
--
Pascal