If I'm understanding your code correctly, you should be able to use tensordot (http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot) rather than doing the elementwise multiply followed by a sum.
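As a sketch of the equivalence, here it is in NumPy, whose tensordot convention theano.tensor.tensordot follows; the shapes are made up for illustration, since the actual shapes in your graph aren't shown in the profile:

```python
import numpy as np

# Hypothetical shapes -- substitute the shapes from your own graph.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))

# The "multiply and sum" pattern: broadcast, elementwise multiply,
# then reduce over the shared axis.
slow = (a[:, :, None] * b[None, :, :]).sum(axis=1)

# The same contraction as a single tensordot call: contract axis 1
# of `a` against axis 0 of `b`.
fast = np.tensordot(a, b, axes=([1], [0]))

print(np.allclose(slow, fast))  # True
```

In Theano the tensordot version should let the optimizer emit a single GpuDot22/gemm-style op instead of the GpuElemwise{mul} + GpuCAReduce{add} pair that dominates the profile below, and it avoids materializing the large broadcasted intermediate.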
On Thursday, March 16, 2017 at 10:59:14 AM UTC-4, Eelke Spaak wrote:
>
> Apologies for the messed up profiling code, here is attempt 2:
>
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>   46.2%    46.2%      10.971s    2.74e-05s  C  400764  42  theano.sandbox.cuda.basic_ops.GpuElemwise
>   29.9%    76.0%       7.098s    3.72e-05s  C  190840  20  theano.sandbox.cuda.basic_ops.GpuCAReduce
>    7.2%    83.2%       1.699s    1.48e-05s  C  114504  12  theano.sandbox.cuda.blas.GpuDot22
>    3.8%    87.0%       0.911s    4.78e-05s  C   19084   2  theano.sandbox.cuda.basic_ops.GpuJoin
>    3.8%    90.9%       0.907s    5.59e-06s  C  162214  17  theano.sandbox.cuda.basic_ops.GpuFromHost
>    2.9%    93.8%       0.700s    1.05e-05s  C   66794   7  theano.sandbox.cuda.basic_ops.HostFromGpu
>    2.1%    95.9%       0.501s    1.14e-06s  C  438932  46  theano.sandbox.cuda.basic_ops.GpuReshape
>    1.5%    97.4%       0.348s    1.46e-06s  C  238550  25  theano.tensor.elemwise.Elemwise
>    1.4%    98.7%       0.327s    3.43e-05s  C    9542   1  theano.sandbox.cuda.blas.GpuGemv
>    0.4%    99.2%       0.097s    9.28e-07s  C  104962  11  theano.sandbox.cuda.basic_ops.GpuDimShuffle
>    0.3%    99.5%       0.081s    1.06e-06s  C   76336   8  theano.sandbox.cuda.basic_ops.GpuSubtensor
>    0.2%    99.7%       0.042s    4.35e-06s  C    9542   1  theano.tensor.basic.Join
>    0.1%    99.8%       0.033s    8.62e-07s  C   38168   4  theano.tensor.elemwise.DimShuffle
>    0.1%    99.9%       0.019s    9.75e-07s  C   19084   2  theano.tensor.subtensor.Subtensor
>    0.1%    99.9%       0.015s    1.54e-06s  C    9542   1  theano.sandbox.cuda.basic_ops.GpuAllocEmpty
>    0.1%   100.0%       0.012s    6.46e-07s  C   19084   2  theano.compile.ops.ViewOp
> ...
> (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>   24.7%    24.7%       5.860s    6.14e-05s  C   95420  10  GpuElemwise{mul,no_inplace}
>   17.6%    42.2%       4.173s    1.09e-04s  C   38168   4  GpuCAReduce{add}{1,1,1}
>    7.2%    49.4%       1.699s    1.48e-05s  C  114504  12  GpuDot22
>    4.1%    53.5%       0.974s    2.55e-05s  C   38168   4  GpuCAReduce{add}{0,1,0}
>    4.1%    57.6%       0.972s    2.55e-05s  C   38168   4  GpuCAReduce{add}{0,1}
>    3.8%    61.4%       0.911s    4.78e-05s  C   19084   2  GpuJoin
>    3.8%    65.2%       0.907s    5.59e-06s  C  162214  17  GpuFromHost
>    2.9%    68.2%       0.700s    1.05e-05s  C   66794   7  HostFromGpu
>    2.6%    70.7%       0.611s    6.40e-05s  C    9542   1  GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 2)]
>    2.1%    72.9%       0.503s    5.28e-05s  C    9542   1  GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}
>    2.0%    74.8%       0.468s    4.91e-05s  C    9542   1  GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)]
>    1.9%    76.7%       0.444s    1.16e-05s  C   38168   4  GpuCAReduce{add}{0,1,1}
>    1.7%    78.4%       0.404s    4.24e-05s  C    9542   1  GpuElemwise{Composite{((i0 + i1) + i2)}}[(0, 1)]
>    1.4%    79.8%       0.327s    3.43e-05s  C    9542   1  GpuGemv{inplace}
>    1.4%    81.1%       0.322s    1.69e-05s  C   19084   2  GpuCAReduce{add}{0,0,1}
>    1.3%    82.4%       0.313s    1.09e-05s  C   28626   3  GpuElemwise{Composite{((i0 * i1) + i2)}}[(0, 2)]
>    1.0%    83.5%       0.246s    1.29e-05s  C   19084   2  GpuElemwise{scalar_sigmoid,no_inplace}
>    0.9%    84.4%       0.221s    1.16e-06s  C  190840  20  GpuReshape{3}
>    0.9%    85.3%       0.219s    1.15e-06s  C  190840  20  GpuReshape{2}
>    0.9%    86.2%       0.214s    1.12e-05s  C   19084   2  GpuElemwise{Composite{(i0 + (i1 * sqr(i2)))},no_inplace}
> ...
> (remaining 49 Ops account for 13.76%(3.27s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
>   16.3%    16.3%       3.882s    4.07e-04s   9542  165  GpuCAReduce{add}{1,1,1}(GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}.0)
>    3.4%    19.7%       0.810s    8.48e-05s   9542  169  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.4%    23.1%       0.802s    8.40e-05s   9542   71  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.1%    26.2%       0.730s    7.65e-05s   9542   70  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    3.0%    29.2%       0.720s    7.55e-05s   9542  170  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
>    2.9%    32.1%       0.692s    7.25e-05s   9542   47  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.9%    35.0%       0.681s    7.13e-05s   9542  134  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.6%    37.6%       0.611s    6.40e-05s   9542   63  GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3))))}}[(0, 2)](CudaNdarrayConstant{
>    2.6%    40.1%       0.608s    6.37e-05s   9542   46  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.5%    42.7%       0.603s    6.32e-05s   9542  135  GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,1,2,x}.0, CudaNdarrayConstant{
>    2.1%    44.8%       0.503s    5.28e-05s   9542  161  GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}(CudaNdarrayConstant{
>    2.0%    46.8%       0.468s    4.91e-05s   9542  163  GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)](CudaNdarrayConstant{[[[ 0.  0.  0. ...,  0.  0.  0.]
> ...
> ... (remaining 181 Apply instances account for 42.13%(10.01s) of the runtime)