Theano is not able to link directly to BLAS. The simplest way to fix this is to use Anaconda and install mkl-service.
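A minimal sketch of that route, assuming a conda environment (the .theanorc entry is optional and only one common way to point Theano at MKL; theano.misc.check_blas is the benchmark script bundled with Theano, but verify it against your install):

    # install mkl-service (pulls in an MKL-backed BLAS that Theano can link against)
    conda install mkl-service

    # optional: tell Theano explicitly which BLAS to link, in ~/.theanorc
    [blas]
    ldflags = -lmkl_rt

    # sanity check: with BLAS linked, the dot ops in this benchmark run as
    # compiled "C" Gemm/Gemv nodes instead of the slow "Py" fallback
    python -m theano.misc.check_blas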
Also make sure to use the Theano dev version. We tested this recently and it works on Mac, Linux, and Windows.

Fred

On Jan 30, 2017 at 20:56, "Raphael Shu" <[email protected]> wrote:

Hi,

It turns out that the LSTMs run very slowly on CPU; the profiling results show that theano.tensor.blas.Gemv is the reason, and the type of Gemv is Py. Does this result imply that the Gemv operation is run at the Python level? Can anyone provide some tips on how to speed up the operation?

Thanks!

Raphael Shu

Function profiling
==================
  Message: /home/shu/research/deepy/deepy/networks/network.py:196
  Time in 581 calls to Function.__call__: 7.022281e+01s
  Time in Function.fn.__call__: 7.018702e+01s (99.949%)
  Time in thunks: 7.015664e+01s (99.906%)
  Total compile time: 1.668830e-01s
    Number of Apply nodes: 49
    Theano Optimizer time: 1.264119e-01s
       Theano validate time: 1.095724e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 2.585006e-02s
       Import time 1.466990e-03s
       Node make_thunk time 2.393484e-02s
           Node Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0) time 1.141071e-03s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0}) time 8.509159e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0}) time 8.258820e-04s
           Node Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0}) time 7.920265e-04s
           Node Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah) time 7.741451e-04s
  Time in all call to theano.grad() 0.000000e+00s
  Time since theano import 335.098s

Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810      10   theano.tensor.blas.Gemv
   0.7%    99.9%       0.488s       2.10e-04s     C     2324       4   theano.tensor.elemwise.Elemwise
   0.0%    99.9%       0.026s       4.48e-05s     C      581       1   theano.tensor.elemwise.Sum
   0.0%   100.0%       0.013s       1.81e-06s     C     6972      12   theano.tensor.elemwise.DimShuffle
   0.0%   100.0%       0.010s       8.33e-06s     C     1162       2   theano.tensor.basic.Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581       1   theano.tensor.subtensor.AdvancedSubtensor1
   0.0%   100.0%       0.005s       2.95e-06s     C     1743       3   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.005s       1.35e-06s     C     3486       6   theano.compile.ops.Shape_i
   0.0%   100.0%       0.003s       7.82e-07s     C     3486       6   theano.tensor.basic.AllocEmpty
   0.0%   100.0%       0.002s       1.51e-06s     C     1162       2   theano.tensor.basic.Reshape
   0.0%   100.0%       0.002s       2.60e-06s     C      581       1   theano.tensor.nnet.nnet.Softmax
   0.0%   100.0%       0.001s       1.09e-06s     C      581       1   theano.compile.ops.Rebroadcast
   ...
   (remaining 0 Classes account for 0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  99.2%    99.2%      69.597s       1.20e-02s     Py    5810      10   Gemv{inplace}
   0.5%    99.7%       0.375s       6.46e-04s     C      581       1   Elemwise{Composite{tanh((i0 + i1))}}
   0.1%    99.8%       0.052s       8.99e-05s     C      581       1   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)]
   0.0%    99.9%       0.033s       5.60e-05s     C      581       1   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)]
   0.0%    99.9%       0.028s       4.78e-05s     C      581       1   Elemwise{mul,no_inplace}
   0.0%    99.9%       0.026s       4.48e-05s     C      581       1   Sum{axis=[0], acc_dtype=float64}
   0.0%    99.9%       0.010s       1.96e-06s     C     5229       9   InplaceDimShuffle{1,0}
   0.0%   100.0%       0.010s       8.33e-06s     C     1162       2   Join
   0.0%   100.0%       0.007s       1.21e-05s     C      581       1   AdvancedSubtensor1
   0.0%   100.0%       0.004s       1.46e-06s     C     2905       5   Shape_i{1}
   0.0%   100.0%       0.003s       7.82e-07s     C     3486       6   AllocEmpty{dtype='float32'}
   0.0%   100.0%       0.003s       4.41e-06s     C      581       1   Subtensor{int64:int64:}
   0.0%   100.0%       0.002s       1.51e-06s     C     1162       2   Reshape{1}
   0.0%   100.0%       0.002s       1.46e-06s     C     1162       2   InplaceDimShuffle{x,0}
   0.0%   100.0%       0.002s       2.66e-06s     C      581       1   Subtensor{::, :int64:}
   0.0%   100.0%       0.002s       2.60e-06s     C      581       1   Softmax
   0.0%   100.0%       0.001s       1.78e-06s     C      581       1   Subtensor{:int64:}
   0.0%   100.0%       0.001s       1.16e-06s     C      581       1   InplaceDimShuffle{1,x}
   0.0%   100.0%       0.001s       1.09e-06s     C      581       1   Rebroadcast{1}
   0.0%   100.0%       0.000s       8.04e-07s     C      581       1   Shape_i{0}
   ...
   (remaining 0 Ops account for 0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  22.3%    22.3%      15.677s       2.70e-02s    581    39   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wi.T, Join.0, TensorConstant{0.0})
  22.2%    44.5%      15.575s       2.68e-02s    581    38   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wo.T, Join.0, TensorConstant{0.0})
  22.0%    66.5%      15.409s       2.65e-02s    581    41   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wc.T, Join.0, TensorConstant{0.0})
  21.8%    88.4%      15.324s       2.64e-02s    581    40   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, LSTM_wf.T, Join.0, TensorConstant{0.0})
   2.2%    90.5%       1.523s       2.62e-03s    581    42   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uc.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    92.7%       1.523s       2.62e-03s    581    43   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_ui.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    94.9%       1.514s       2.61e-03s    581    27   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, attention_wa.T, Subtensor{:int64:}.0, TensorConstant{0.0})
   2.2%    97.0%       1.513s       2.60e-03s    581    45   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uo.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   2.2%    99.2%       1.509s       2.60e-03s    581    44   Gemv{inplace}(Gemv{inplace}.0, TensorConstant{1.0}, LSTM_uf.T, Subtensor{:int64:}.0, TensorConstant{1.0})
   0.5%    99.7%       0.375s       6.46e-04s    581    30   Elemwise{Composite{tanh((i0 + i1))}}(InplaceDimShuffle{x,0}.0, uah)
   0.1%    99.8%       0.052s       8.99e-05s    581    46   Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)](LSTM_bf, Gemv{inplace}.0, Subtensor{int64:int64:}.0, LSTM_bi, Gemv{inplace}.0, LSTM_bc, Gemv{inplace}.0)
   0.0%    99.8%       0.033s       5.60e-05s    581    47   Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)](LSTM_bo, Gemv{inplace}.0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0)
   0.0%    99.9%       0.031s       5.27e-05s    581    31   Gemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, Elemwise{Composite{tanh((i0 + i1))}}.0, attention_va, TensorConstant{0.0})
   0.0%    99.9%       0.028s       4.78e-05s    581    35   Elemwise{mul,no_inplace}(InplaceDimShuffle{1,x}.0, Subtensor{::, :int64:}.0)
   0.0%    99.9%       0.026s       4.48e-05s    581    36   Sum{axis=[0], acc_dtype=float64}(Elemwise{mul,no_inplace}.0)
   0.0%    99.9%       0.007s       1.21e-05s    581    26   AdvancedSubtensor1(word_embed_embeddings, Rebroadcast{1}.0)
   0.0%   100.0%       0.006s       1.03e-05s    581    48   Join(TensorConstant{0}, Elemwise{Composite{(scalar_sigmoid((i0 + i1)) * tanh(i2))}}[(0, 1)].0, Elemwise{Composite{((scalar_sigmoid((i0 + i1)) * i2) + (scalar_sigmoid((i3 + i4)) * tanh((i5 + i6))))}}[(0, 1)].0, TensorConstant{(1,) of 0.0})
   0.0%   100.0%       0.004s       6.38e-06s    581    37   Join(TensorConstant{0}, Sum{axis=[0], acc_dtype=float64}.0, Reshape{1}.0)
   0.0%   100.0%       0.003s       4.41e-06s    581     3   Subtensor{int64:int64:}(s, Constant{1000}, Constant{2000})
   0.0%   100.0%       0.002s       2.87e-06s    581    16   InplaceDimShuffle{1,0}(LSTM_uo)
   ...
   (remaining 29 Apply instances account for 0.04%(0.02s) of the runtime)
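For anyone reproducing this kind of report: the breakdown above is Theano's built-in profiler output. A minimal sketch of how to generate one, with toy sizes and illustrative variable names (not the original network):

    import numpy as np
    import theano
    import theano.tensor as T

    # toy matrix-vector product: dot(matrix, vector) is typically optimized
    # into the Gemv op discussed above
    W = theano.shared(np.random.randn(1000, 1000).astype('float32'), name='W')
    v = T.fvector('v')
    f = theano.function([v], T.dot(W, v), profile=True)

    for _ in range(100):
        f(np.random.randn(1000).astype('float32'))

    # per-Class / per-Op / per-Apply breakdown like the one quoted above;
    # the <type> column reads "C" when the op runs through the compiled
    # (BLAS-linked) implementation and "Py" when it falls back to Python
    f.profile.summary()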
