I'm implementing sampling-based softmax alternatives, where I compute the
preactivations only for certain output classes. I'm getting very poor
performance because of a GpuAdvancedIncSubtensor1 op, which consumes 90% of
the processing time of the update function:
<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
  89.0%   89.0%   725.413s   2.44e-01s   2968   115   GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0)
    input 0: dtype=float32, shape=(10001,), strides=(1,)
    input 1: dtype=float32, shape=(25600,), strides=(1,)
    input 2: dtype=int64, shape=(25600,), strides=c
    output 0: dtype=float32, shape=(10001,), strides=(1,)
Looking at the computation graph of that function, I noticed that this op
is operating on the bias vector:
GpuAdvancedIncSubtensor1{inplace,inc} [id FL] '' 115
|GpuAdvancedIncSubtensor1{inplace,inc} [id FM] '' 112
| |GpuAlloc{memset_0=True} [id FN] '' 17
| | |CudaNdarrayConstant{[ 0.]} [id FO]
| | |Shape_i{0} [id FP] '' 7
| | |bias [id BU]
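If I read the graph correctly, this node is the gradient of the advanced
indexing: the grad of bias[class_ids] is a scatter-add of the incoming
gradient into a zeroed vector of the full bias shape (hence the
GpuAlloc{memset_0=True} above), roughly equivalent to:

g_bias = T.inc_subtensor(T.zeros_like(bias)[class_ids], g_preact)

(g_preact is just my name for the gradient flowing into the
preactivations; I may be misreading the graph here.)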
More precisely, the performance hit seems to come from selecting from the
bias vector the values that correspond to the output classes (bias =
bias[class_ids]). Is that a particularly expensive operation? class_ids can
be large (1,000 to 10,000 elements). If I leave out the bias entirely, my
speed improves tenfold. Is there a way to circumvent the problem?
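To make the setup concrete, here is a stripped-down sketch of what I'm
doing. The variable names, sizes, and the dummy cost are simplified
placeholders, not my actual model, but the bias indexing and the update
are the same:

import numpy as np
import theano
import theano.tensor as T

n_hidden, n_classes = 100, 10001

W = theano.shared(np.zeros((n_hidden, n_classes), dtype='float32'), name='W')
bias = theano.shared(np.zeros(n_classes, dtype='float32'), name='bias')

h = T.fmatrix('h')                   # (batch, n_hidden) hidden activations
class_ids = T.ivector('class_ids')   # sampled output classes

# Preactivations only for the sampled classes; both the weight columns
# and the bias entries are gathered with advanced indexing.
preact = T.dot(h, W[:, class_ids]) + bias[class_ids]
cost = preact.sum()                  # stand-in for the real sampling loss

# The gradient w.r.t. bias scatters back into the full 10k-element
# vector, which is where GpuAdvancedIncSubtensor1 shows up.
g_bias = T.grad(cost, bias)
update = theano.function([h, class_ids], cost,
                         updates=[(bias, bias - 0.01 * g_bias)])

With this, every call to update() pays for the scatter into the full bias
vector, even though only the sampled entries change.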
Thanks for any help!
Seppo