Hi,


It turns out that my LSTMs run very slowly on CPU. The profiling results below
show that theano.tensor.blas.Gemv accounts for 99.2% of the run time, and the
type of Gemv is listed as Py.

Does this result imply that the Gemv operation is being run at the Python level?

Can anyone provide some tips on how to speed up this operation?

Thanks!


Raphael Shu



Function profiling
==================
  Message: /home/shu/research/deepy/deepy/networks/network.py:196
  Time in 581 calls to Function.__call__: 7.022281e+01s
  Time in Function.fn.__call__: 7.018702e+01s (99.949%)
  Time in thunks: 7.015664e+01s (99.906%)
  Total compile time: 1.668830e-01s
    Number of Apply nodes: 49
    Theano Optimizer time: 1.264119e-01s
       Theano validate time: 1.095724e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 2.585006e-02s
       Import time 1.466990e-03s
       Node make_thunk time 2.393484e-02s
           Node Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0) time 1.141071e-03s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0}) time 8.509159e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0}) time 8.258820e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0}) time 7.920265e-04s
           Node Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah) time 7.741451e-04s

Time in all call to theano.grad() 0.000000e+00s
Time since theano import 335.098s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810      10   theano.tensor.blas.Gemv
   0.7%    99.9%       0.488s       2.10e-04s     C     2324       4   theano.tensor.elemwise.Elemwise
   0.0%    99.9%       0.026s       4.48e-05s     C      581       1   theano.tensor.elemwise.Sum
   0.0%   100.0%       0.013s       1.81e-06s     C     6972      12   theano.tensor.elemwise.DimShuffle
   0.0%   100.0%       0.010s       8.33e-06s     C     1162       2   theano.tensor.basic.Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581       1   theano.tensor.subtensor.AdvancedSubtensor1
   0.0%   100.0%       0.005s       2.95e-06s     C     1743       3   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.005s       1.35e-06s     C     3486       6   theano.compile.ops.Shape_i
   0.0%   100.0%       0.003s       7.82e-07s     C     3486       6   theano.tensor.basic.AllocEmpty
   0.0%   100.0%       0.002s       1.51e-06s     C     1162       2   theano.tensor.basic.Reshape
   0.0%   100.0%       0.002s       2.60e-06s     C      581       1   theano.tensor.nnet.nnet.Softmax
   0.0%   100.0%       0.001s       1.09e-06s     C      581       1   theano.compile.ops.Rebroadcast
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)
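
As a quick sanity check on the Class table above, the Gemv figures can be cross-checked with a few lines of Python. The numbers are copied from the profile; the variable names are mine and purely illustrative.

```python
# Figures copied from the profile output above.
thunk_time_s = 7.015664e+01   # "Time in thunks"
gemv_time_s = 69.597          # <apply time> for theano.tensor.blas.Gemv
gemv_calls = 5810             # <#call> for Gemv
function_calls = 581          # calls to Function.__call__

# Share of thunk time spent in Gemv (the <% time> column).
gemv_share = gemv_time_s / thunk_time_s

# Average time of one Gemv call (the <time per call> column).
gemv_per_call_s = gemv_time_s / gemv_calls

# Number of Gemv calls per call of the compiled function (the <#apply> column).
gemv_per_function_call = gemv_calls // function_calls

print("Gemv share of thunk time: {:.1%}".format(gemv_share))
print("Time per Gemv call: {:.2e}s".format(gemv_per_call_s))
print("Gemv calls per function call: {}".format(gemv_per_function_call))
```

So each call of the compiled function runs ten Gemv nodes, and together they take almost all of the time spent in that call.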

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810       10   Gemv{inplace}
   0.5%    99.7%       0.375s       6.46e-04s     C      581        1   Elemwise{Composite{tanh((i0 + i1))}}
   0.1%    99.8%       0.052s       8.99e-05s     C      581        1   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)]
   0.0%    99.9%       0.033s       5.60e-05s     C      581        1   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)]
   0.0%    99.9%       0.028s       4.78e-05s     C      581        1   Elemwise{mul,no_inplace}
   0.0%    99.9%       0.026s       4.48e-05s     C      581        1   Sum{axis=[0], acc_dtype=float64}
   0.0%    99.9%       0.010s       1.96e-06s     C     5229        9   InplaceDimShuffle{1,0}
   0.0%   100.0%       0.010s       8.33e-06s     C     1162        2   Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581        1   AdvancedSubtensor1
   0.0%   100.0%       0.004s       1.46e-06s     C     2905        5   Shape_i{1}
   0.0%   100.0%       0.003s       7.82e-07s     C     3486        6   AllocEmpty{dtype='float32'}
   0.0%   100.0%       0.003s       4.41e-06s     C      581        1   Subtensor{int64:int64:}
   0.0%   100.0%       0.002s       1.51e-06s     C     1162        2   Reshape{1}
   0.0%   100.0%       0.002s       1.46e-06s     C     1162        2   InplaceDimShuffle{x,0}
   0.0%   100.0%       0.002s       2.66e-06s     C      581        1   Subtensor{::, :int64:}
   0.0%   100.0%       0.002s       2.60e-06s     C      581        1   Softmax
   0.0%   100.0%       0.001s       1.78e-06s     C      581        1   Subtensor{:int64:}
   0.0%   100.0%       0.001s       1.16e-06s     C      581        1   InplaceDimShuffle{1,x}
   0.0%   100.0%       0.001s       1.09e-06s     C      581        1   Rebroadcast{1}
   0.0%   100.0%       0.000s       8.04e-07s     C      581        1   Shape_i{0}
   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  22.3%    22.3%      15.677s       2.70e-02s    581    39   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wi.T, Join.0, TensorConstant{0.0})
  22.2%    44.5%      15.575s       2.68e-02s    581    38   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wo.T, Join.0, TensorConstant{0.0})
  22.0%    66.5%      15.409s       2.65e-02s    581    41   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wc.T, Join.0, TensorConstant{0.0})
  21.8%    88.4%      15.324s       2.64e-02s    581    40   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0})
   2.2%    90.5%       1.523s       2.62e-03s    581    42   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uc.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    92.7%       1.523s       2.62e-03s    581    43   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_ui.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    94.9%       1.514s       2.61e-03s    581    27   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0})
   2.2%    97.0%       1.513s       2.60e-03s    581    45   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uo.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    99.2%       1.509s       2.60e-03s    581    44   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uf.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   0.5%    99.7%       0.375s       6.46e-04s    581    30   Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah)
   0.1%    99.8%       0.052s       8.99e-05s    581    46   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0)
   0.0%    99.8%       0.033s       5.60e-05s    581    47   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)](LSTM_bo, Gemv{inplace}.0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0)
   0.0%    99.9%       0.031s       5.27e-05s    581    31   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0})
   0.0%    99.9%       0.028s       4.78e-05s    581    35   Elemwise{mul,no_inplace}(InplaceDimShuffle{1,x}.0, Subtensor{::, :int64:}.0)
   0.0%    99.9%       0.026s       4.48e-05s    581    36   Sum{axis=[0], acc_dtype=float64}(Elemwise{mul,no_inplace}.0)
   0.0%    99.9%       0.007s       1.21e-05s    581    26   AdvancedSubtensor1(word_embed_embeddings, Rebroadcast{1}.0)
   0.0%   100.0%       0.006s       1.03e-05s    581    48   Join(TensorConstant{0}, Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)].0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0, TensorConstant{(1,) of 0.0})
   0.0%   100.0%       0.004s       6.38e-06s    581    37   Join(TensorConstant{0}, Sum{axis=[0], acc_dtype=float64}.0, Reshape{1}.0)
   0.0%   100.0%       0.003s       4.41e-06s    581     3   Subtensor{int64:int64:}(s, Constant{1000}, Constant{2000})
   0.0%   100.0%       0.002s       2.87e-06s    581    16   InplaceDimShuffle{1,0}(LSTM_uo)
   ... (remaining 29 Apply instances account for 0.04%(0.02s) of the runtime)
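
For readers less familiar with the op, here is a minimal pure-Python sketch of what I understand the Gemv nodes in the Apply section to compute, namely beta * y + alpha * dot(A, x), with the arguments in the order shown there (y, alpha, A, x, beta). The function `gemv_reference` is a hypothetical helper written for illustration; it is not Theano's implementation and says nothing about how Theano dispatches the op.

```python
def gemv_reference(y, alpha, A, x, beta):
    """Illustrative reference: returns beta * y + alpha * (A times x).

    A is a list of rows, x and y are plain lists of numbers.
    When beta is 0 the contents of y are ignored, which matches the
    profile's pattern of passing an uninitialised AllocEmpty buffer as y.
    """
    result = []
    for y_i, row in zip(y, A):
        dot = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        if beta == 0:
            result.append(alpha * dot)
        else:
            result.append(beta * y_i + alpha * dot)
    return result


# First pattern in the profile: Gemv(AllocEmpty, 1.0, W.T, input, 0.0)
wx = gemv_reference([0.0, 0.0], 1.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], 0.0)

# Second pattern: Gemv(previous Gemv output, 1.0, U.T, hidden, 1.0),
# which accumulates onto the earlier result.
wx_plus_uh = gemv_reference(wx, 1.0, [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 1.0)

print(wx)
print(wx_plus_uh)
```

Read this way, each LSTM gate is one matrix-vector product on the input (the roughly 15s rows) accumulated with a second one on the previous hidden state (the roughly 1.5s rows).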
