Hi,
It turns out that my LSTMs run very slowly on CPU.
The profiling results below show that theano.tensor.blas.Gemv is the bottleneck, and that its type is Py.
Does this mean the Gemv operation is being executed at the Python level?
Can anyone provide some tips on how to speed up this operation?
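For reference, I assume the first thing to check is whether Theano is actually linked against an optimized BLAS. Below is a minimal sketch of that check (I am assuming the blas.ldflags config option and numpy.show_config() are the relevant places to look):

    import numpy
    import theano

    # An empty string here would mean Theano has no C BLAS to link against and
    # falls back to a NumPy-based Gemv, which I suspect is what shows up as "Py".
    print("blas.ldflags = %r" % theano.config.blas.ldflags)

    # Show which BLAS (OpenBLAS / MKL / ATLAS / none) NumPy itself was built against.
    numpy.show_config()

If the flags turn out to be empty, I am guessing that pointing Theano at an optimized library, e.g. THEANO_FLAGS=blas.ldflags=-lopenblas, would be the fix, but I have not verified this.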
Thanks!
Raphael Shu
Function profiling
==================
Message: /home/shu/research/deepy/deepy/networks/network.py:196
Time in 581 calls to Function.__call__: 7.022281e+01s
Time in Function.fn.__call__: 7.018702e+01s (99.949%)
Time in thunks: 7.015664e+01s (99.906%)
Total compile time: 1.668830e-01s
Number of Apply nodes: 49
Theano Optimizer time: 1.264119e-01s
Theano validate time: 1.095724e-02s
Theano Linker time (includes C, CUDA code generation/compiling): 2.585006e-02s
Import time 1.466990e-03s
Node make_thunk time 2.393484e-02s
Node Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0) time 1.141071e-03s
Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0}) time 8.509159e-04s
Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0}) time 8.258820e-04s
Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0}) time 7.920265e-04s
Node Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah) time 7.741451e-04s
Time in all call to theano.grad() 0.000000e+00s
Time since theano import 335.098s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
 99.2%  99.2%  69.597s  1.20e-02s  Py  5810  10  theano.tensor.blas.Gemv
  0.7%  99.9%   0.488s  2.10e-04s  C   2324   4  theano.tensor.elemwise.Elemwise
  0.0%  99.9%   0.026s  4.48e-05s  C    581   1  theano.tensor.elemwise.Sum
  0.0% 100.0%   0.013s  1.81e-06s  C   6972  12  theano.tensor.elemwise.DimShuffle
  0.0% 100.0%   0.010s  8.33e-06s  C   1162   2  theano.tensor.basic.Join
  0.0% 100.0%   0.007s  1.21e-05s  C    581   1  theano.tensor.subtensor.AdvancedSubtensor1
  0.0% 100.0%   0.005s  2.95e-06s  C   1743   3  theano.tensor.subtensor.Subtensor
  0.0% 100.0%   0.005s  1.35e-06s  C   3486   6  theano.compile.ops.Shape_i
  0.0% 100.0%   0.003s  7.82e-07s  C   3486   6  theano.tensor.basic.AllocEmpty
  0.0% 100.0%   0.002s  1.51e-06s  C   1162   2  theano.tensor.basic.Reshape
  0.0% 100.0%   0.002s  2.60e-06s  C    581   1  theano.tensor.nnet.nnet.Softmax
  0.0% 100.0%   0.001s  1.09e-06s  C    581   1  theano.compile.ops.Rebroadcast
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
 99.2%  99.2%  69.597s  1.20e-02s  Py  5810  10  Gemv{inplace}
  0.5%  99.7%   0.375s  6.46e-04s  C    581   1  Elemwise{Composite{tanh((i0 + i1))}}
  0.1%  99.8%   0.052s  8.99e-05s  C    581   1  Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)]
  0.0%  99.9%   0.033s  5.60e-05s  C    581   1  Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)]
  0.0%  99.9%   0.028s  4.78e-05s  C    581   1  Elemwise{mul,no_inplace}
  0.0%  99.9%   0.026s  4.48e-05s  C    581   1  Sum{axis=[0], acc_dtype=float64}
  0.0%  99.9%   0.010s  1.96e-06s  C   5229   9  InplaceDimShuffle{1,0}
  0.0% 100.0%   0.010s  8.33e-06s  C   1162   2  Join
  0.0% 100.0%   0.007s  1.21e-05s  C    581   1  AdvancedSubtensor1
  0.0% 100.0%   0.004s  1.46e-06s  C   2905   5  Shape_i{1}
  0.0% 100.0%   0.003s  7.82e-07s  C   3486   6  AllocEmpty{dtype='float32'}
  0.0% 100.0%   0.003s  4.41e-06s  C    581   1  Subtensor{int64:int64:}
  0.0% 100.0%   0.002s  1.51e-06s  C   1162   2  Reshape{1}
  0.0% 100.0%   0.002s  1.46e-06s  C   1162   2  InplaceDimShuffle{x,0}
  0.0% 100.0%   0.002s  2.66e-06s  C    581   1  Subtensor{::, :int64:}
  0.0% 100.0%   0.002s  2.60e-06s  C    581   1  Softmax
  0.0% 100.0%   0.001s  1.78e-06s  C    581   1  Subtensor{:int64:}
  0.0% 100.0%   0.001s  1.16e-06s  C    581   1  InplaceDimShuffle{1,x}
  0.0% 100.0%   0.001s  1.09e-06s  C    581   1  Rebroadcast{1}
  0.0% 100.0%   0.000s  8.04e-07s  C    581   1  Shape_i{0}
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
 22.3%  22.3%  15.677s  2.70e-02s  581  39  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wi.T, Join.0, TensorConstant{0.0})
 22.2%  44.5%  15.575s  2.68e-02s  581  38  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wo.T, Join.0, TensorConstant{0.0})
 22.0%  66.5%  15.409s  2.65e-02s  581  41  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wc.T, Join.0, TensorConstant{0.0})
 21.8%  88.4%  15.324s  2.64e-02s  581  40  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0})
  2.2%  90.5%   1.523s  2.62e-03s  581  42  Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uc.T, Subtensor{:int64:}.0, TensorConstant{1.0})
  2.2%  92.7%   1.523s  2.62e-03s  581  43  Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_ui.T, Subtensor{:int64:}.0, TensorConstant{1.0})
  2.2%  94.9%   1.514s  2.61e-03s  581  27  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0})
  2.2%  97.0%   1.513s  2.60e-03s  581  45  Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uo.T, Subtensor{:int64:}.0, TensorConstant{1.0})
  2.2%  99.2%   1.509s  2.60e-03s  581  44  Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uf.T, Subtensor{:int64:}.0, TensorConstant{1.0})
  0.5%  99.7%   0.375s  6.46e-04s  581  30  Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah)
  0.1%  99.8%   0.052s  8.99e-05s  581  46  Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0)
  0.0%  99.8%   0.033s  5.60e-05s  581  47  Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)](LSTM_bo, Gemv{inplace}.0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0)
  0.0%  99.9%   0.031s  5.27e-05s  581  31  Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0})
  0.0%  99.9%   0.028s  4.78e-05s  581  35  Elemwise{mul,no_inplace}(InplaceDimShuffle{1,x}.0, Subtensor{::, :int64:}.0)
  0.0%  99.9%   0.026s  4.48e-05s  581  36  Sum{axis=[0], acc_dtype=float64}(Elemwise{mul,no_inplace}.0)
  0.0%  99.9%   0.007s  1.21e-05s  581  26  AdvancedSubtensor1(word_embed_embeddings, Rebroadcast{1}.0)
  0.0% 100.0%   0.006s  1.03e-05s  581  48  Join(TensorConstant{0}, Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)].0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0, TensorConstant{(1,) of 0.0})
  0.0% 100.0%   0.004s  6.38e-06s  581  37  Join(TensorConstant{0}, Sum{axis=[0], acc_dtype=float64}.0, Reshape{1}.0)
  0.0% 100.0%   0.003s  4.41e-06s  581   3  Subtensor{int64:int64:}(s, Constant{1000}, Constant{2000})
  0.0% 100.0%   0.002s  2.87e-06s  581  16  InplaceDimShuffle{1,0}(LSTM_uo)
... (remaining 29 Apply instances account for 0.04%(0.02s) of the runtime)