I'm implementing sampling-based softmax alternatives, where I compute the
preactivations only for certain output classes. I'm getting very poor
performance because of a GpuAdvancedIncSubtensor1 op, which consumes 90% of
the processing time of the update function:
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
  89.0%   89.0%   725.413s   2.44e-01s   2968   115   GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0)
    input 0: dtype=float32, shape=(10001,), strides=(1,)
    input 1: dtype=float32, shape=(25600,), strides=(1,)
    input 2: dtype=int64, shape=(25600,), strides=c
    output 0: dtype=float32, shape=(10001,), strides=(1,)
Looking at the computation graph of that function, I noticed that this op
is operating on the bias vector:
GpuAdvancedIncSubtensor1{inplace,inc} [id FL] '' 115
|GpuAdvancedIncSubtensor1{inplace,inc} [id FM] '' 112
| |GpuAlloc{memset_0=True} [id FN] '' 17
| | |CudaNdarrayConstant{[ 0.]} [id FO]
| | |Shape_i{0} [id FP] '' 7
| | |bias [id BU]
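If I read the graph correctly, this node is the gradient of the advanced
indexing: the grad of bias[class_ids] is a scatter-add of the incoming
gradient into a zeroed vector of the full bias shape (hence the
GpuAlloc{memset_0=True} above), roughly equivalent to:

g_bias = T.inc_subtensor(T.zeros_like(bias)[class_ids], g_preact)

(g_preact is just my name for the gradient flowing into the
preactivations; I may be misreading the graph here.)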
More precisely, the performance hit seems to come from selecting from the
bias vector the values that correspond to the output classes (bias =
bias[class_ids]). Is that a particularly expensive operation? class_ids can
be large (1,000 to 10,000 elements). If I leave out the bias entirely, my
speed improves tenfold. Is there a way to circumvent the problem?
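To make the setup concrete, here is a stripped-down sketch of what I'm
doing. The variable names, sizes, and the dummy cost are simplified
placeholders, not my actual model, but the bias indexing and the update
are the same:

import numpy as np
import theano
import theano.tensor as T

n_hidden, n_classes = 100, 10001

W = theano.shared(np.zeros((n_hidden, n_classes), dtype='float32'), name='W')
bias = theano.shared(np.zeros(n_classes, dtype='float32'), name='bias')

h = T.fmatrix('h')                   # (batch, n_hidden) hidden activations
class_ids = T.ivector('class_ids')   # sampled output classes

# Preactivations only for the sampled classes; both the weight columns
# and the bias entries are gathered with advanced indexing.
preact = T.dot(h, W[:, class_ids]) + bias[class_ids]
cost = preact.sum()                  # stand-in for the real sampling loss

# The gradient w.r.t. bias scatters back into the full 10k-element
# vector, which is where GpuAdvancedIncSubtensor1 shows up.
g_bias = T.grad(cost, bias)
update = theano.function([h, class_ids], cost,
                         updates=[(bias, bias - 0.01 * g_bias)])

With this, every call to update() pays for the scatter into the full bias
vector, even though only the sampled entries change.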
Thanks for any help!
Seppo