If I'm understanding your code correctly, you should be able to use tensordot
(http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot)
rather than doing the elementwise multiply followed by a sum. Your profile
shows that GpuElemwise{mul,no_inplace} and the GpuCAReduce{add} reductions
together account for roughly half the runtime; tensordot should fold that
multiply-and-sum pattern into a single dot product, avoiding the large
intermediate array.
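To illustrate the equivalence (sketched with NumPy here, whose tensordot has the same semantics as theano.tensor.tensordot; the shapes are made up for the example, not taken from your code):

```python
import numpy as np

# Hypothetical shapes, just to demonstrate the pattern.
a = np.random.rand(3, 4, 5)
b = np.random.rand(5, 6)

# Broadcasted multiply + sum over the shared axis:
# builds a (3, 4, 5, 6) intermediate before reducing.
slow = (a[:, :, :, None] * b[None, None, :, :]).sum(axis=2)

# The same contraction as one tensordot call, which the
# backend can dispatch as a single matrix multiply.
fast = np.tensordot(a, b, axes=([2], [0]))

assert np.allclose(slow, fast)
```

In Theano the call is the same shape-wise (`theano.tensor.tensordot(a, b, axes=...)`), and the optimizer can then route it through GpuDot22/GpuGemm rather than the GpuElemwise + GpuCAReduce pair.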
On Thursday, March 16, 2017 at 10:59:14 AM UTC-4, Eelke Spaak wrote:
>
> Apologies for the messed up profiling code, here is attempt 2:
>
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
> 46.2% 46.2% 10.971s 2.74e-05s C 400764 42 theano.sandbox.cuda.basic_ops.GpuElemwise
> 29.9% 76.0% 7.098s 3.72e-05s C 190840 20 theano.sandbox.cuda.basic_ops.GpuCAReduce
> 7.2% 83.2% 1.699s 1.48e-05s C 114504 12 theano.sandbox.cuda.blas.GpuDot22
> 3.8% 87.0% 0.911s 4.78e-05s C 19084 2 theano.sandbox.cuda.basic_ops.GpuJoin
> 3.8% 90.9% 0.907s 5.59e-06s C 162214 17 theano.sandbox.cuda.basic_ops.GpuFromHost
> 2.9% 93.8% 0.700s 1.05e-05s C 66794 7 theano.sandbox.cuda.basic_ops.HostFromGpu
> 2.1% 95.9% 0.501s 1.14e-06s C 438932 46 theano.sandbox.cuda.basic_ops.GpuReshape
> 1.5% 97.4% 0.348s 1.46e-06s C 238550 25 theano.tensor.elemwise.Elemwise
> 1.4% 98.7% 0.327s 3.43e-05s C 9542 1 theano.sandbox.cuda.blas.GpuGemv
> 0.4% 99.2% 0.097s 9.28e-07s C 104962 11 theano.sandbox.cuda.basic_ops.GpuDimShuffle
> 0.3% 99.5% 0.081s 1.06e-06s C 76336 8 theano.sandbox.cuda.basic_ops.GpuSubtensor
> 0.2% 99.7% 0.042s 4.35e-06s C 9542 1 theano.tensor.basic.Join
> 0.1% 99.8% 0.033s 8.62e-07s C 38168 4 theano.tensor.elemwise.DimShuffle
> 0.1% 99.9% 0.019s 9.75e-07s C 19084 2 theano.tensor.subtensor.Subtensor
> 0.1% 99.9% 0.015s 1.54e-06s C 9542 1 theano.sandbox.cuda.basic_ops.GpuAllocEmpty
> 0.1% 100.0% 0.012s 6.46e-07s C 19084 2 theano.compile.ops.ViewOp
> ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
> 24.7% 24.7% 5.860s 6.14e-05s C 95420 10 GpuElemwise{mul,no_inplace}
> 17.6% 42.2% 4.173s 1.09e-04s C 38168 4 GpuCAReduce{add}{1,1,1}
> 7.2% 49.4% 1.699s 1.48e-05s C 114504 12 GpuDot22
> 4.1% 53.5% 0.974s 2.55e-05s C 38168 4 GpuCAReduce{add}{0,1,0}
> 4.1% 57.6% 0.972s 2.55e-05s C 38168 4 GpuCAReduce{add}{0,1}
> 3.8% 61.4% 0.911s 4.78e-05s C 19084 2 GpuJoin
> 3.8% 65.2% 0.907s 5.59e-06s C 162214 17 GpuFromHost
> 2.9% 68.2% 0.700s 1.05e-05s C 66794 7 HostFromGpu
> 2.6% 70.7% 0.611s 6.40e-05s C 9542 1 GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 2)]
> 2.1% 72.9% 0.503s 5.28e-05s C 9542 1 GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}
> 2.0% 74.8% 0.468s 4.91e-05s C 9542 1 GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)]
> 1.9% 76.7% 0.444s 1.16e-05s C 38168 4 GpuCAReduce{add}{0,1,1}
> 1.7% 78.4% 0.404s 4.24e-05s C 9542 1 GpuElemwise{Composite{((i0 + i1) + i2)}}[(0, 1)]
> 1.4% 79.8% 0.327s 3.43e-05s C 9542 1 GpuGemv{inplace}
> 1.4% 81.1% 0.322s 1.69e-05s C 19084 2 GpuCAReduce{add}{0,0,1}
> 1.3% 82.4% 0.313s 1.09e-05s C 28626 3 GpuElemwise{Composite{((i0 * i1) + i2)}}[(0, 2)]
> 1.0% 83.5% 0.246s 1.29e-05s C 19084 2 GpuElemwise{scalar_sigmoid,no_inplace}
> 0.9% 84.4% 0.221s 1.16e-06s C 190840 20 GpuReshape{3}
> 0.9% 85.3% 0.219s 1.15e-06s C 190840 20 GpuReshape{2}
> 0.9% 86.2% 0.214s 1.12e-05s C 19084 2 GpuElemwise{Composite{(i0 + (i1 * sqr(i2)))},no_inplace}
> ... (remaining 49 Ops account for 13.76%(3.27s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
> 16.3% 16.3% 3.882s 4.07e-04s 9542 165 GpuCAReduce{add}{1,1,1}(GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}.0)
> 3.4% 19.7% 0.810s 8.48e-05s 9542 169 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
> 3.4% 23.1% 0.802s 8.40e-05s 9542 71 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
> 3.1% 26.2% 0.730s 7.65e-05s 9542 70 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
> 3.0% 29.2% 0.720s 7.55e-05s 9542 170 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
> 2.9% 32.1% 0.692s 7.25e-05s 9542 47 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
> 2.9% 35.0% 0.681s 7.13e-05s 9542 134 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
> 2.6% 37.6% 0.611s 6.40e-05s 9542 63 GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 2)](CudaNdarrayConstant{
> 2.6% 40.1% 0.608s 6.37e-05s 9542 46 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
> 2.5% 42.7% 0.603s 6.32e-05s 9542 135 GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
> 2.1% 44.8% 0.503s 5.28e-05s 9542 161 GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}(CudaNdarrayConstant{
> 2.0% 46.8% 0.468s 4.91e-05s 9542 163 GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)](CudaNdarrayConstant{[[[ 0. 0. 0. ..., 0. 0. 0.]
>
> ...
>
> ... (remaining 181 Apply instances account for 42.13%(10.01s) of the runtime)
>