Theano is not able to link directly to BLAS.
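For a sense of the gap this causes: Gemv computes the matrix-vector product y = alpha * A.dot(x) + beta * y, and when the profiler reports it as a "Py" op it is running as a Python-level fallback instead of a compiled BLAS call. A rough standalone sketch of the difference (plain NumPy, no Theano required; the matrix size is arbitrary):

```python
import time
import numpy as np

# What Gemv computes: y = alpha * A.dot(x) + beta * y.
# Compare a pure-Python inner loop against the BLAS-backed np.dot.
n = 400
A = np.random.rand(n, n).astype("float32")
x = np.random.rand(n).astype("float32")

t0 = time.time()
y_slow = [sum(A[i, j] * x[j] for j in range(n)) for i in range(n)]
t_slow = time.time() - t0

t0 = time.time()
y_fast = A.dot(x)
t_fast = time.time() - t0

print("python loop: %.4fs   np.dot: %.6fs" % (t_slow, t_fast))
```

The Python loop is typically orders of magnitude slower than the BLAS call, which is consistent with Gemv eating 99% of the runtime in the profile below.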

The simplest way to fix this is to use Anaconda and install mkl-service.

Make sure to use the Theano development version.

We tested this recently and it works on Mac, Linux, and Windows.
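One way to check that the fix took (a sketch; the `mkl` module here is the one installed by the mkl-service package) is to look at what NumPy was built against and whether the MKL runtime is importable — Theano picks up its BLAS through NumPy by default:

```python
import numpy as np

# With a conda install of mkl-service, NumPy's build info should mention MKL.
np.show_config()  # look for "mkl" in the blas/lapack sections

try:
    import mkl  # provided by the mkl-service package
    print("MKL available, max threads:", mkl.get_max_threads())
except ImportError:
    print("mkl-service is not installed in this environment")
```

If MKL shows up here, re-running the profile should report Gemv (or Gemm) with type C instead of Py.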

Fred
On 30 Jan 2017 at 20:56, "Raphael Shu" <[email protected]> wrote:

Hi,


It turns out the LSTMs run very slowly on CPU.


The profiling results show that theano.tensor.blas.Gemv is the reason,
and the type of Gemv is Py.


Does this result imply that the Gemv operation is run at the Python level?


Can anyone provide some tips on how to speed up this operation?


Thanks!


Raphael Shu



Function profiling
==================
  Message: /home/shu/research/deepy/deepy/networks/network.py:196
  Time in 581 calls to Function.__call__: 7.022281e+01s
  Time in Function.fn.__call__: 7.018702e+01s (99.949%)
  Time in thunks: 7.015664e+01s (99.906%)
  Total compile time: 1.668830e-01s
    Number of Apply nodes: 49
    Theano Optimizer time: 1.264119e-01s
       Theano validate time: 1.095724e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 2.585006e-02s
       Import time 1.466990e-03s
       Node make_thunk time 2.393484e-02s
           Node Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0) time 1.141071e-03s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0}) time 8.509159e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0}) time 8.258820e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0}) time 7.920265e-04s
           Node Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah) time 7.741451e-04s

Time in all call to theano.grad() 0.000000e+00s
Time since theano import 335.098s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810      10   theano.tensor.blas.Gemv
   0.7%    99.9%       0.488s       2.10e-04s     C     2324       4   theano.tensor.elemwise.Elemwise
   0.0%    99.9%       0.026s       4.48e-05s     C      581       1   theano.tensor.elemwise.Sum
   0.0%   100.0%       0.013s       1.81e-06s     C     6972      12   theano.tensor.elemwise.DimShuffle
   0.0%   100.0%       0.010s       8.33e-06s     C     1162       2   theano.tensor.basic.Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581       1   theano.tensor.subtensor.AdvancedSubtensor1
   0.0%   100.0%       0.005s       2.95e-06s     C     1743       3   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.005s       1.35e-06s     C     3486       6   theano.compile.ops.Shape_i
   0.0%   100.0%       0.003s       7.82e-07s     C     3486       6   theano.tensor.basic.AllocEmpty
   0.0%   100.0%       0.002s       1.51e-06s     C     1162       2   theano.tensor.basic.Reshape
   0.0%   100.0%       0.002s       2.60e-06s     C      581       1   theano.tensor.nnet.nnet.Softmax
   0.0%   100.0%       0.001s       1.09e-06s     C      581       1   theano.compile.ops.Rebroadcast
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810       10   Gemv{inplace}
   0.5%    99.7%       0.375s       6.46e-04s     C      581        1   Elemwise{Composite{tanh((i0 + i1))}}
   0.1%    99.8%       0.052s       8.99e-05s     C      581        1   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)]
   0.0%    99.9%       0.033s       5.60e-05s     C      581        1   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)]
   0.0%    99.9%       0.028s       4.78e-05s     C      581        1   Elemwise{mul,no_inplace}
   0.0%    99.9%       0.026s       4.48e-05s     C      581        1   Sum{axis=[0], acc_dtype=float64}
   0.0%    99.9%       0.010s       1.96e-06s     C     5229        9   InplaceDimShuffle{1,0}
   0.0%   100.0%       0.010s       8.33e-06s     C     1162        2   Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581        1   AdvancedSubtensor1
   0.0%   100.0%       0.004s       1.46e-06s     C     2905        5   Shape_i{1}
   0.0%   100.0%       0.003s       7.82e-07s     C     3486        6   AllocEmpty{dtype='float32'}
   0.0%   100.0%       0.003s       4.41e-06s     C      581        1   Subtensor{int64:int64:}
   0.0%   100.0%       0.002s       1.51e-06s     C     1162        2   Reshape{1}
   0.0%   100.0%       0.002s       1.46e-06s     C     1162        2   InplaceDimShuffle{x,0}
   0.0%   100.0%       0.002s       2.66e-06s     C      581        1   Subtensor{::, :int64:}
   0.0%   100.0%       0.002s       2.60e-06s     C      581        1   Softmax
   0.0%   100.0%       0.001s       1.78e-06s     C      581        1   Subtensor{:int64:}
   0.0%   100.0%       0.001s       1.16e-06s     C      581        1   InplaceDimShuffle{1,x}
   0.0%   100.0%       0.001s       1.09e-06s     C      581        1   Rebroadcast{1}
   0.0%   100.0%       0.000s       8.04e-07s     C      581        1   Shape_i{0}
   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  22.3%    22.3%      15.677s       2.70e-02s    581    39   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wi.T, Join.0, TensorConstant{0.0})
  22.2%    44.5%      15.575s       2.68e-02s    581    38   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wo.T, Join.0, TensorConstant{0.0})
  22.0%    66.5%      15.409s       2.65e-02s    581    41   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wc.T, Join.0, TensorConstant{0.0})
  21.8%    88.4%      15.324s       2.64e-02s    581    40   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0})
   2.2%    90.5%       1.523s       2.62e-03s    581    42   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uc.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    92.7%       1.523s       2.62e-03s    581    43   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_ui.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    94.9%       1.514s       2.61e-03s    581    27   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0})
   2.2%    97.0%       1.513s       2.60e-03s    581    45   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uo.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    99.2%       1.509s       2.60e-03s    581    44   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uf.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   0.5%    99.7%       0.375s       6.46e-04s    581    30   Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah)
   0.1%    99.8%       0.052s       8.99e-05s    581    46   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0)
   0.0%    99.8%       0.033s       5.60e-05s    581    47   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)](LSTM_bo, Gemv{inplace}.0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0)
   0.0%    99.9%       0.031s       5.27e-05s    581    31   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0})
   0.0%    99.9%       0.028s       4.78e-05s    581    35   Elemwise{mul,no_inplace}(InplaceDimShuffle{1,x}.0, Subtensor{::, :int64:}.0)
   0.0%    99.9%       0.026s       4.48e-05s    581    36   Sum{axis=[0], acc_dtype=float64}(Elemwise{mul,no_inplace}.0)
   0.0%    99.9%       0.007s       1.21e-05s    581    26   AdvancedSubtensor1(word_embed_embeddings, Rebroadcast{1}.0)
   0.0%   100.0%       0.006s       1.03e-05s    581    48   Join(TensorConstant{0}, Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)].0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0, TensorConstant{(1,) of 0.0})
   0.0%   100.0%       0.004s       6.38e-06s    581    37   Join(TensorConstant{0}, Sum{axis=[0], acc_dtype=float64}.0, Reshape{1}.0)
   0.0%   100.0%       0.003s       4.41e-06s    581     3   Subtensor{int64:int64:}(s, Constant{1000}, Constant{2000})
   0.0%   100.0%       0.002s       2.87e-06s    581    16   InplaceDimShuffle{1,0}(LSTM_uo)
   ... (remaining 29 Apply instances account for 0.04%(0.02s) of the runtime)

-- 

---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.



