Right, now I remember that the _dev20 version only works on a limited
number of dimensions. That would explain why adding a new axis helped.

It may already be fixed in the new GPU back-end (it needs libgpuarray;
select it with device=cudaX instead of device=gpuX); otherwise, that is
where we should fix it.
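
For reference, switching back-ends is only a flags change; a minimal
sketch (assuming libgpuarray/pygpu is installed; the script name is just
an example):

     # old back-end
     THEANO_FLAGS=device=gpu0,floatX=float32 python train.py
     # new back-end (libgpuarray)
     THEANO_FLAGS=device=cuda0,floatX=float32 python train.py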

On Fri, Nov 18, 2016, Seppo Enarvi wrote:
> 
> 
> That's interesting, because this function is not supposed to update the 
> bias. It just computes the cost and its gradient. Maybe that op is used 
> to update the gradient.
> 
> My GPU is Quadro K2000. I don't think it's too old because the graph 
> contains other instances of GpuAdvancedIncSubtensor1_dev20.
> 
> Anyway, I started to wonder why I don't have this problem with the weight 
> matrix, since I'm selecting vectors from the weight matrix in the same 
> manner. So I tried converting the bias vector into a matrix and selecting 
> rows from that matrix (each of which contains only one element):
> 
>      bias = bias[class_ids]
> =>
>      bias = bias[:, None]
>      bias = bias[class_ids, 0]
> 
> It's a lot faster this way. I updated to the latest version of Theano 
> from Git and I still see the huge speed difference.
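> 
> Spelled out, the workaround looks something like this (the size and 
> variable names are only illustrative):
> 
>      import numpy
>      import theano
>      import theano.tensor as T
> 
>      floatX = theano.config.floatX
>      bias = theano.shared(numpy.zeros(10001, dtype=floatX), name='bias')
>      class_ids = T.lvector('class_ids')
> 
>      # original form: advanced indexing into the 1-d bias vector
>      selected_slow = bias[class_ids]
> 
>      # workaround: view the vector as an (N, 1) matrix and index its rows
>      selected_fast = bias[:, None][class_ids, 0]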
> 
> Seppo
> 
> 
> 
> On Friday, November 18, 2016 at 6:49:56 PM UTC+2, Pascal Lamblin wrote:
> >
> > Hi, 
> >
> > This operation is actually the _update_ of the selected elements of the 
> > bias. 
> >
> > There is a faster implementation (named GpuAdvancedIncSubtensor1_dev20 
> > IIRC) that uses atomic addition to speed up that operation. It has the 
> > downside of not yielding a deterministic order of summation if the same 
> > element is updated more than once in the same operation. 
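> >
> > Roughly, the pattern that op implements is a scatter-add; a sketch 
> > (not the exact nodes from the profiled graph): 
> >
> >      import theano.tensor as T
> >
> >      idx = T.lvector('idx')   # class ids; may contain repeats
> >      inc = T.vector('inc')    # one increment per selected class
> >      acc = T.zeros((10001,))
> >
> >      # acc[idx] += inc, summing the increments that hit the same
> >      # position; the _dev20 kernel does that sum with atomic adds, so
> >      # the order of the floating-point additions can vary between runs
> >      out = T.inc_subtensor(acc[idx], inc)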
> >
> > One of the issues seems to be that this faster implementation is not 
> > selected. Could it be that you have an old GPU? 
> >
> > Another potential issue is that your graph seems to first apply updates 
> > on a tensor of zeros, and then apply another update on the bias itself. 
> > There may be a way of simplifying that. 
> >
> > On Fri, Nov 18, 2016, Seppo Enarvi wrote: 
> > > 
> > > I'm implementing sampling-based softmax alternatives, where I compute the 
> > > preactivations only for certain output classes. I get very bad 
> > > performance due to a GpuAdvancedIncSubtensor1 op, which consumes 90% of 
> > > the processing time of the update function: 
> > > 
> > > <% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name> 
> > > 89.0%  89.0%  725.413s  2.44e-01s  2968  115 
> > > GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0) 
> > >   input 0: dtype=float32, shape=(10001,), strides=(1,) 
> > >   input 1: dtype=float32, shape=(25600,), strides=(1,) 
> > >   input 2: dtype=int64, shape=(25600,), strides=c 
> > >   output 0: dtype=float32, shape=(10001,), strides=(1,) 
> > > 
> > > Looking at the computation graph of that function, I noticed it's 
> > > operating on the bias vector: 
> > > 
> > > GpuAdvancedIncSubtensor1{inplace,inc} [id FL] ''   115 
> > >  |GpuAdvancedIncSubtensor1{inplace,inc} [id FM] ''   112 
> > >  | |GpuAlloc{memset_0=True} [id FN] ''   17 
> > >  | | |CudaNdarrayConstant{[ 0.]} [id FO] 
> > >  | | |Shape_i{0} [id FP] ''   7 
> > >  | |   |bias [id BU] 
> > > 
> > > More precisely, the performance hit seems to come from selecting from the 
> > > bias vector those values that correspond to the output classes 
> > > (bias = bias[class_ids]). Is that a particularly expensive operation? 
> > > class_ids can be large (1,000 - 10,000). If I don't use the bias, my 
> > > speed improves tenfold. Is there a way to circumvent that problem? 
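> > > 
> > > The core of what I'm doing reduces to something like this (the size 
> > > and names are only placeholders): 
> > > 
> > >      import numpy
> > >      import theano
> > >      import theano.tensor as T
> > > 
> > >      bias = theano.shared(numpy.zeros(10001, dtype='float32'),
> > >                           name='bias')
> > >      class_ids = T.lvector('class_ids')
> > > 
> > >      cost = bias[class_ids].sum()     # stand-in for the real cost
> > >      g_bias = T.grad(cost, wrt=bias)  # the gradient graph contains the
> > >                                       # (Gpu)AdvancedIncSubtensor1 node
> > >      theano.printing.debugprint(g_bias)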
> >
> 


-- 
Pascal
