I have encountered a similar issue:
https://groups.google.com/forum/#!searchin/theano-users/Gemv%7Csort:date/theano-users/UfPNnTI1pI4/2w48Gid_BwAJ
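In that thread, the "Py" type meant the op was executed by Theano's Python fallback implementation because no C BLAS library was found at compile time, rather than by a compiled BLAS call. As a toy illustration of why that matters (the helper below is my own sketch, not Theano code; plain NumPy stands in for BLAS here):

```python
import numpy as np

def gemv_python(A, x):
    """Naive Python-level matrix-vector product, roughly analogous to
    what a 'Py' Gemv node does when Theano cannot find a C BLAS."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

rng = np.random.RandomState(0)
A = rng.rand(200, 200).astype('float32')
x = rng.rand(200).astype('float32')

y_py = gemv_python(A, x)  # interpreted loop: tens of thousands of Python ops
y_np = A.dot(x)           # one call into compiled BLAS

# Same result, orders of magnitude apart in speed at realistic sizes.
assert np.allclose(y_py, y_np, atol=1e-3)
```

If the profile shows Gemv with type Py, the usual first steps are to run `python -m theano.misc.check_blas` to see whether Theano links against a real BLAS, and if not, to point it at one explicitly, e.g. `THEANO_FLAGS=blas.ldflags=-lopenblas` (assuming OpenBLAS is installed on your system).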

On Tue, Jan 31, 2017 at 2:56 AM, Raphael Shu <[email protected]> wrote:

> Hi,
>
>
> It turns out that my LSTMs run very slowly on the CPU. The profiling
> results below show that theano.tensor.blas.Gemv is the bottleneck, and
> that the type of the Gemv op is Py.
>
> Does this result imply that the Gemv operation is run at the Python
> level? Can anyone provide some tips on how to speed up this operation?
>
> Thanks!
>
>
> Raphael Shu
>
>
>
> Function profiling
> ==================
>   Message: /home/shu/research/deepy/deepy/networks/network.py:196
>   Time in 581 calls to Function.__call__: 7.022281e+01s
>   Time in Function.fn.__call__: 7.018702e+01s (99.949%)
>   Time in thunks: 7.015664e+01s (99.906%)
>   Total compile time: 1.668830e-01s
>     Number of Apply nodes: 49
>     Theano Optimizer time: 1.264119e-01s
>        Theano validate time: 1.095724e-02s
>     Theano Linker time (includes C, CUDA code generation/compiling): 2.585006e-02s
>        Import time 1.466990e-03s
>        Node make_thunk time 2.393484e-02s
>            Node Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0) time 1.141071e-03s
>            Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0}) time 8.509159e-04s
>            Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0}) time 8.258820e-04s
>            Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0}) time 7.920265e-04s
>            Node Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah) time 7.741451e-04s
>
> Time in all call to theano.grad() 0.000000e+00s
> Time since theano import 335.098s
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>   99.2%    99.2%      69.597s       1.20e-02s     Py    5810      10   theano.tensor.blas.Gemv
>    0.7%    99.9%       0.488s       2.10e-04s     C     2324       4   theano.tensor.elemwise.Elemwise
>    0.0%    99.9%       0.026s       4.48e-05s     C      581       1   theano.tensor.elemwise.Sum
>    0.0%   100.0%       0.013s       1.81e-06s     C     6972      12   theano.tensor.elemwise.DimShuffle
>    0.0%   100.0%       0.010s       8.33e-06s     C     1162       2   theano.tensor.basic.Join
>    0.0%   100.0%       0.007s       1.21e-05s     C      581       1   theano.tensor.subtensor.AdvancedSubtensor1
>    0.0%   100.0%       0.005s       2.95e-06s     C     1743       3   theano.tensor.subtensor.Subtensor
>    0.0%   100.0%       0.005s       1.35e-06s     C     3486       6   theano.compile.ops.Shape_i
>    0.0%   100.0%       0.003s       7.82e-07s     C     3486       6   theano.tensor.basic.AllocEmpty
>    0.0%   100.0%       0.002s       1.51e-06s     C     1162       2   theano.tensor.basic.Reshape
>    0.0%   100.0%       0.002s       2.60e-06s     C      581       1   theano.tensor.nnet.nnet.Softmax
>    0.0%   100.0%       0.001s       1.09e-06s     C      581       1   theano.compile.ops.Rebroadcast
>    ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>   99.2%    99.2%      69.597s       1.20e-02s     Py    5810       10   Gemv{inplace}
>    0.5%    99.7%       0.375s       6.46e-04s     C      581        1   Elemwise{Composite{tanh((i0 + i1))}}
>    0.1%    99.8%       0.052s       8.99e-05s     C      581        1   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)]
>    0.0%    99.9%       0.033s       5.60e-05s     C      581        1   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)]
>    0.0%    99.9%       0.028s       4.78e-05s     C      581        1   Elemwise{mul,no_inplace}
>    0.0%    99.9%       0.026s       4.48e-05s     C      581        1   Sum{axis=[0], acc_dtype=float64}
>    0.0%    99.9%       0.010s       1.96e-06s     C     5229        9   InplaceDimShuffle{1,0}
>    0.0%   100.0%       0.010s       8.33e-06s     C     1162        2   Join
>    0.0%   100.0%       0.007s       1.21e-05s     C      581        1   AdvancedSubtensor1
>    0.0%   100.0%       0.004s       1.46e-06s     C     2905        5   Shape_i{1}
>    0.0%   100.0%       0.003s       7.82e-07s     C     3486        6   AllocEmpty{dtype='float32'}
>    0.0%   100.0%       0.003s       4.41e-06s     C      581        1   Subtensor{int64:int64:}
>    0.0%   100.0%       0.002s       1.51e-06s     C     1162        2   Reshape{1}
>    0.0%   100.0%       0.002s       1.46e-06s     C     1162        2   InplaceDimShuffle{x,0}
>    0.0%   100.0%       0.002s       2.66e-06s     C      581        1   Subtensor{::, :int64:}
>    0.0%   100.0%       0.002s       2.60e-06s     C      581        1   Softmax
>    0.0%   100.0%       0.001s       1.78e-06s     C      581        1   Subtensor{:int64:}
>    0.0%   100.0%       0.001s       1.16e-06s     C      581        1   InplaceDimShuffle{1,x}
>    0.0%   100.0%       0.001s       1.09e-06s     C      581        1   Rebroadcast{1}
>    0.0%   100.0%       0.000s       8.04e-07s     C      581        1   Shape_i{0}
>    ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
>   22.3%    22.3%      15.677s       2.70e-02s    581    39   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, LSTM_wi.T, Join.0, TensorConstant{0.0})
>   22.2%    44.5%      15.575s       2.68e-02s    581    38   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, LSTM_wo.T, Join.0, TensorConstant{0.0})
>   22.0%    66.5%      15.409s       2.65e-02s    581    41   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, LSTM_wc.T, Join.0, TensorConstant{0.0})
>   21.8%    88.4%      15.324s       2.64e-02s    581    40   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0})
>    2.2%    90.5%       1.523s       2.62e-03s    581    42   
> Gemv{inplace}(Gemv{inplace}.0,
> TensorConstant{1.0}, LSTM_uc.T, Subtensor{:int64:}.0, TensorConstant{1.0})
>    2.2%    92.7%       1.523s       2.62e-03s    581    43   
> Gemv{inplace}(Gemv{inplace}.0,
> TensorConstant{1.0}, LSTM_ui.T, Subtensor{:int64:}.0, TensorConstant{1.0})
>    2.2%    94.9%       1.514s       2.61e-03s    581    27   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, 
> TensorConstant{0.0}
> )
>    2.2%    97.0%       1.513s       2.60e-03s    581    45   
> Gemv{inplace}(Gemv{inplace}.0,
> TensorConstant{1.0}, LSTM_uo.T, Subtensor{:int64:}.0, TensorConstant{1.0})
>    2.2%    99.2%       1.509s       2.60e-03s    581    44   
> Gemv{inplace}(Gemv{inplace}.0,
> TensorConstant{1.0}, LSTM_uf.T, Subtensor{:int64:}.0, TensorConstant{1.0})
>    0.5%    99.7%       0.375s       6.46e-04s    581    30   
> Elemwise{Composite{tanh((i0 + i
> 1))}}(InplaceDimShuffle{x,0}.0, uah)
>    0.1%    99.8%       0.052s       8.99e-05s    581    46   
> Elemwise{Composite{((scalar_sig
> moid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 
> 1)](LSTM_bf, Ge
> mv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, 
> Gemv{inplace}.0
> )
>    0.0%    99.8%       0.033s       5.60e-05s    581    47   
> Elemwise{Composite{(scalar_sigm
> oid((i0 + i1)) * tanh(i2))}}[(0, 1)](LSTM_bo, Gemv{inplace}.0, 
> Elemwise{Composite{((scalar_s
> igmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + 
> i6))))}}[(0, 1)].0)
>    0.0%    99.9%       0.031s       5.27e-05s    581    31   
> Gemv{inplace}(AllocEmpty{dtype=
> 'float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, 
> attention_va, Ten
> sorConstant{0.0})
>    0.0%    99.9%       0.028s       4.78e-05s    581    35   
> Elemwise{mul,no_inplace}(Inplac
> eDimShuffle{1,x}.0, Subtensor{::, :int64:}.0)
>    0.0%    99.9%       0.026s       4.48e-05s    581    36   Sum{axis=[0], 
> acc_dtype=float64
> }(Elemwise{mul,no_inplace}.0)
>    0.0%    99.9%       0.007s       1.21e-05s    581    26   
> AdvancedSubtensor1(word_embed_e
> mbeddings, Rebroadcast{1}.0)
>    0.0%   100.0%       0.006s       1.03e-05s    581    48   
> Join(TensorConstant{0}, Elemwis
> e{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)].0, 
> Elemwise{Composite{((scalar_
> sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + 
> i6))))}}[(0, 1)].0, Tenso
> rConstant{(1,) of 0.0})
>    0.0%   100.0%       0.004s       6.38e-06s    581    37   
> Join(TensorConstant{0}, Sum{axi
> s=[0], acc_dtype=float64}.0, Reshape{1}.0)
>    0.0%   100.0%       0.003s       4.41e-06s    581     3   
> Subtensor{int64:int64:}(s, Cons
> tant{1000}, Constant{2000})
>    0.0%   100.0%       0.002s       2.87e-06s    581    16   
> InplaceDimShuffle{1,0}(LSTM_uo)
>    ... (remaining 29 Apply instances account for 0.04%(0.02s) of the runtime)
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
