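If I'm understanding your code correctly, you should be able to use tensordot
(http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot)
rather than doing the multiply and sum yourself. In your profile, GpuElemwise
and GpuCAReduce together account for about 76% of the runtime, which is what
I'd expect from a broadcasted multiply followed by a sum.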
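
A rough sketch of what I mean, written with NumPy so it runs without a GPU
(np.tensordot takes the same axes argument as theano.tensor.tensordot). The
names and shapes here (a, w, 4x3x2 and 3x2x5) are made up for illustration,
since I haven't seen your actual model code:

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical shapes: a is (batch, i, j), w is (i, j, k).
a = rng.randn(4, 3, 2)
w = rng.randn(3, 2, 5)

# Multiply-and-sum: broadcast a against w, then reduce over i and j.
# This materializes a (batch, i, j, k) intermediate before the reduction.
via_mul_sum = (a[:, :, :, None] * w[None, :, :, :]).sum(axis=(1, 2))

# tensordot: contract axes (1, 2) of a with axes (0, 1) of w directly.
via_tensordot = np.tensordot(a, w, axes=[[1, 2], [0, 1]])

# Both give a (batch, k) result with the same values.
print(via_tensordot.shape)
print(np.allclose(via_mul_sum, via_tensordot))
```

In Theano the second form would be T.tensordot(a, w, axes=[[1, 2], [0, 1]])
on symbolic tensors; whether it maps onto your graph depends on which axes you
are actually summing over.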

On Thursday, March 16, 2017 at 10:59:14 AM UTC-4, Eelke Spaak wrote:
>
> Apologies for the messed up profiling code, here is attempt 2:
>
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> 
> <Class name>
>   46.2%    46.2%      10.971s       2.74e-05s     C   400764      42   
> theano.sandbox.cuda.basic_ops.GpuElemwise
>   29.9%    76.0%       7.098s       3.72e-05s     C   190840      20   
> theano.sandbox.cuda.basic_ops.GpuCAReduce
>    7.2%    83.2%       1.699s       1.48e-05s     C   114504      12   
> theano.sandbox.cuda.blas.GpuDot22
>    3.8%    87.0%       0.911s       4.78e-05s     C    19084       2   
> theano.sandbox.cuda.basic_ops.GpuJoin
>    3.8%    90.9%       0.907s       5.59e-06s     C   162214      17   
> theano.sandbox.cuda.basic_ops.GpuFromHost
>    2.9%    93.8%       0.700s       1.05e-05s     C    66794       7   
> theano.sandbox.cuda.basic_ops.HostFromGpu
>    2.1%    95.9%       0.501s       1.14e-06s     C   438932      46   
> theano.sandbox.cuda.basic_ops.GpuReshape
>    1.5%    97.4%       0.348s       1.46e-06s     C   238550      25   
> theano.tensor.elemwise.Elemwise
>    1.4%    98.7%       0.327s       3.43e-05s     C     9542       1   
> theano.sandbox.cuda.blas.GpuGemv
>    0.4%    99.2%       0.097s       9.28e-07s     C   104962      11   
> theano.sandbox.cuda.basic_ops.GpuDimShuffle
>    0.3%    99.5%       0.081s       1.06e-06s     C    76336       8   
> theano.sandbox.cuda.basic_ops.GpuSubtensor
>    0.2%    99.7%       0.042s       4.35e-06s     C     9542       1   
> theano.tensor.basic.Join
>    0.1%    99.8%       0.033s       8.62e-07s     C    38168       4   
> theano.tensor.elemwise.DimShuffle
>    0.1%    99.9%       0.019s       9.75e-07s     C    19084       2   
> theano.tensor.subtensor.Subtensor
>    0.1%    99.9%       0.015s       1.54e-06s     C     9542       1   
> theano.sandbox.cuda.basic_ops.GpuAllocEmpty
>    0.1%   100.0%       0.012s       6.46e-07s     C    19084       2   
> theano.compile.ops.ViewOp
>    ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op 
> name>
>   24.7%    24.7%       5.860s       6.14e-05s     C     95420       10   
> GpuElemwise{mul,no_inplace}
>   17.6%    42.2%       4.173s       1.09e-04s     C     38168        4   
> GpuCAReduce{add}{1,1,1}
>    7.2%    49.4%       1.699s       1.48e-05s     C     114504       12   
> GpuDot22
>    4.1%    53.5%       0.974s       2.55e-05s     C     38168        4   
> GpuCAReduce{add}{0,1,0}
>    4.1%    57.6%       0.972s       2.55e-05s     C     38168        4   
> GpuCAReduce{add}{0,1}
>    3.8%    61.4%       0.911s       4.78e-05s     C     19084        2   
> GpuJoin
>    3.8%    65.2%       0.907s       5.59e-06s     C     162214       17   
> GpuFromHost
>    2.9%    68.2%       0.700s       1.05e-05s     C     66794        7   
> HostFromGpu
>    2.6%    70.7%       0.611s       6.40e-05s     C     9542        1   
> GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 2)]
>    2.1%    72.9%       0.503s       5.28e-05s     C     9542        1   
> GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}
>    2.0%    74.8%       0.468s       4.91e-05s     C     9542        1   
> GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)]
>    1.9%    76.7%       0.444s       1.16e-05s     C     38168        4   
> GpuCAReduce{add}{0,1,1}
>    1.7%    78.4%       0.404s       4.24e-05s     C     9542        1   
> GpuElemwise{Composite{((i0 + i1) + i2)}}[(0, 1)]
>    1.4%    79.8%       0.327s       3.43e-05s     C     9542        1   
> GpuGemv{inplace}
>    1.4%    81.1%       0.322s       1.69e-05s     C     19084        2   
> GpuCAReduce{add}{0,0,1}
>    1.3%    82.4%       0.313s       1.09e-05s     C     28626        3   
> GpuElemwise{Composite{((i0 * i1) + i2)}}[(0, 2)]
>    1.0%    83.5%       0.246s       1.29e-05s     C     19084        2   
> GpuElemwise{scalar_sigmoid,no_inplace}
>    0.9%    84.4%       0.221s       1.16e-06s     C     190840       20   
> GpuReshape{3}
>    0.9%    85.3%       0.219s       1.15e-06s     C     190840       20   
> GpuReshape{2}
>    0.9%    86.2%       0.214s       1.12e-05s     C     19084        2   
> GpuElemwise{Composite{(i0 + (i1 * sqr(i2)))},no_inplace}
>    ... (remaining 49 Ops account for  13.76%(3.27s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
>   16.3%    16.3%       3.882s       4.07e-04s   9542   165   
> GpuCAReduce{add}{1,1,1}(GpuElemwise{Composite{((i0 * i1) - 
> scalar_softplus(i1))},no_inplace}.0)
>    3.4%    19.7%       0.810s       8.48e-05s   9542   169   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.4%    23.1%       0.802s       8.40e-05s   9542    71   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.1%    26.2%       0.730s       7.65e-05s   9542    70   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.0%    29.2%       0.720s       7.55e-05s   9542   170   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    2.9%    32.1%       0.692s       7.25e-05s   9542    47   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.9%    35.0%       0.681s       7.13e-05s   9542   134   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.6%    37.6%       0.611s       6.40e-05s   9542    63   
> GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 
> 2)](CudaNdarrayConstant{
>    2.6%    40.1%       0.608s       6.37e-05s   9542    46   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.5%    42.7%       0.603s       6.32e-05s   9542   135   
> GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.1%    44.8%       0.503s       5.28e-05s   9542   161   
> GpuElemwise{Composite{((i0 * i1) - 
> scalar_softplus(i1))},no_inplace}(CudaNdarrayConstant{
>    2.0%    46.8%       0.468s       4.91e-05s   9542   163   
> GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 
> 1)](CudaNdarrayConstant{[[[ 0.  0.  0. ...,  0.  0.  0.]
>
> ...
>
>    ... (remaining 181 Apply instances account for 42.13%(10.01s) of the 
> runtime)
>
