Hi,
This operation is actually the _update_ of the selected elements of the bias.
There is a faster implementation (named GpuAdvancedIncSubtensor1_dev20,
IIRC) that uses atomic additions to speed up that operation. Its
downside is that the order of summation is not deterministic when the
same element is updated more than once in the same call.
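To illustrate the semantics (an untested sketch, with placeholder names
rather than anything taken from your graph), this is the kind of
expression that gets compiled to that op:

    import numpy as np
    import theano
    import theano.tensor as T

    bias = theano.shared(np.zeros(10001, dtype='float32'), name='bias')
    class_ids = T.lvector('class_ids')  # int64 indices, may repeat
    grads = T.fvector('grads')          # one contribution per index

    # inc_subtensor with an integer-vector index compiles to
    # (Gpu)AdvancedIncSubtensor1; repeated indices are summed, and the
    # atomic _dev20 kernel may perform those sums in any order.
    update_fn = theano.function(
        [class_ids, grads], [],
        updates=[(bias, T.inc_subtensor(bias[class_ids], grads))])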
One of the issues seems to be that this faster implementation is not
selected. Could it be that you have an old GPU?
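If you want to check which op was selected, you can print the optimized
graph of the compiled function (update_fn here stands for whatever your
update function is called):

    # Look for GpuAdvancedIncSubtensor1_dev20 vs. the plain
    # GpuAdvancedIncSubtensor1 in the printed graph.
    theano.printing.debugprint(update_fn)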
Another potential issue is that your graph seems to first apply updates
to a tensor of zeros, and then apply another update to the bias itself.
There may be a way to simplify that.
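For instance (a guess at how the expression was built, again with
placeholder names), instead of accumulating into a zeros tensor and
then adding the result into the bias:

    # What the graph quoted below suggests: increment a zeros tensor
    # first, then add the accumulated values into the bias.
    acc = T.inc_subtensor(T.zeros_like(bias)[class_ids], grads)
    new_bias = bias + acc

you could increment the bias directly, which computes the same sums in
a single op and avoids the intermediate zeros tensor:

    new_bias = T.inc_subtensor(bias[class_ids], grads)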
On Fri, Nov 18, 2016, Seppo Enarvi wrote:
>
>
> I'm implementing sampling-based softmax alternatives, where I compute the
> preactivations only for certain output classes. I get very bad
> performance due to a GpuAdvancedIncSubtensor1 op, which consumes 90% of
> the processing time of the update function:
>
>
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
>    89.0%    89.0%    725.413s     2.44e-01s   2968   115   GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0)
>      input 0: dtype=float32, shape=(10001,), strides=(1,)
>      input 1: dtype=float32, shape=(25600,), strides=(1,)
>      input 2: dtype=int64, shape=(25600,), strides=c
>      output 0: dtype=float32, shape=(10001,), strides=(1,)
>
> Looking at the computation graph of that function, I noticed it's operating
> on the bias vector:
>
>
> GpuAdvancedIncSubtensor1{inplace,inc} [id FL] ''   115
>  |GpuAdvancedIncSubtensor1{inplace,inc} [id FM] ''   112
>  | |GpuAlloc{memset_0=True} [id FN] ''   17
>  | | |CudaNdarrayConstant{[ 0.]} [id FO]
>  | | |Shape_i{0} [id FP] ''   7
>  | | | |bias [id BU]
>
> More precisely, the performance hit seems to come from selecting from the
> bias vector those values that correspond to the output classes (bias =
> bias[class_ids]). Is that a particularly expensive operation? class_ids can
> be large (1,000 - 10,000). If I don't use the bias, my speed improves
> tenfold. Is there a way to circumvent that problem?
>
> Thanks for any help!
>
> Seppo
>
--
Pascal