[theano-users] Re: Some help optimizing a function involving 1D dot products for multidimensional tensors
If I'm understanding your code correctly, you should be able to use tensordot (http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot) rather than doing the multiply and sum.

On Thursday, March 16, 2017 at 10:59:14 AM UTC-4, Eelke Spaak wrote:
> Apologies for the messed up profiling code, here is attempt 2:
> [quoted profiling output trimmed; it duplicates the original message below]
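Since the original code isn't shown in the thread, here is a sketch of the suggested rewrite using NumPy (whose tensordot Theano's version mirrors); the shapes and variable names are assumptions, standing in for "a 1D dot product along the last axis of a stack of tensors":

```python
import numpy as np

# Assumed shapes: a stack of matrices contracted against a single weight
# vector along the shared last axis.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 16, 32))  # (batch, rows, features) - hypothetical
w = rng.standard_normal(32)             # weights over the feature axis

# Multiply-and-sum: broadcasts w, materializes the full (100, 16, 32)
# elementwise product, then reduces the last axis.
mul_sum = (x * w).sum(axis=-1)

# tensordot expresses the same contraction as a single reshaped matrix
# multiply, avoiding the large elementwise intermediate.
tdot = np.tensordot(x, w, axes=[[2], [0]])

assert np.allclose(mul_sum, tdot)       # same result, shape (100, 16)
```

On the GPU this matters because the contraction can then hit a GEMM/GEMV kernel instead of a separate elementwise multiply followed by a reduction, which is exactly the GpuElemwise + GpuCAReduce pair dominating the profile below.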
[theano-users] Re: Some help optimizing a function involving 1D dot products for multidimensional tensors
Apologies for the messed up profiling code, here is attempt 2:

Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  46.2%    46.2%      10.971s       2.74e-05s     C   400764      42   theano.sandbox.cuda.basic_ops.GpuElemwise
  29.9%    76.0%       7.098s       3.72e-05s     C   190840      20   theano.sandbox.cuda.basic_ops.GpuCAReduce
   7.2%    83.2%       1.699s       1.48e-05s     C   114504      12   theano.sandbox.cuda.blas.GpuDot22
   3.8%    87.0%       0.911s       4.78e-05s     C    19084       2   theano.sandbox.cuda.basic_ops.GpuJoin
   3.8%    90.9%       0.907s       5.59e-06s     C   162214      17   theano.sandbox.cuda.basic_ops.GpuFromHost
   2.9%    93.8%       0.700s       1.05e-05s     C    66794       7   theano.sandbox.cuda.basic_ops.HostFromGpu
   2.1%    95.9%       0.501s       1.14e-06s     C   438932      46   theano.sandbox.cuda.basic_ops.GpuReshape
   1.5%    97.4%       0.348s       1.46e-06s     C   238550      25   theano.tensor.elemwise.Elemwise
   1.4%    98.7%       0.327s       3.43e-05s     C     9542       1   theano.sandbox.cuda.blas.GpuGemv
   0.4%    99.2%       0.097s       9.28e-07s     C   104962      11   theano.sandbox.cuda.basic_ops.GpuDimShuffle
   0.3%    99.5%       0.081s       1.06e-06s     C    76336       8   theano.sandbox.cuda.basic_ops.GpuSubtensor
   0.2%    99.7%       0.042s       4.35e-06s     C     9542       1   theano.tensor.basic.Join
   0.1%    99.8%       0.033s       8.62e-07s     C    38168       4   theano.tensor.elemwise.DimShuffle
   0.1%    99.9%       0.019s       9.75e-07s     C    19084       2   theano.tensor.subtensor.Subtensor
   0.1%    99.9%       0.015s       1.54e-06s     C     9542       1   theano.sandbox.cuda.basic_ops.GpuAllocEmpty
   0.1%   100.0%       0.012s       6.46e-07s     C    19084       2   theano.compile.ops.ViewOp
   ...
(remaining 0 Classes account for 0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  24.7%    24.7%       5.860s       6.14e-05s     C    95420      10   GpuElemwise{mul,no_inplace}
  17.6%    42.2%       4.173s       1.09e-04s     C    38168       4   GpuCAReduce{add}{1,1,1}
   7.2%    49.4%       1.699s       1.48e-05s     C   114504      12   GpuDot22
   4.1%    53.5%       0.974s       2.55e-05s     C    38168       4   GpuCAReduce{add}{0,1,0}
   4.1%    57.6%       0.972s       2.55e-05s     C    38168       4   GpuCAReduce{add}{0,1}
   3.8%    61.4%       0.911s       4.78e-05s     C    19084       2   GpuJoin
   3.8%    65.2%       0.907s       5.59e-06s     C   162214      17   GpuFromHost
   2.9%    68.2%       0.700s       1.05e-05s     C    66794       7   HostFromGpu
   2.6%    70.7%       0.611s       6.40e-05s     C     9542       1   GpuElemwise{Composite{(i0 + (-scalar_sigmoid(((i1 + i2) + i3}}[(0, 2)]
   2.1%    72.9%       0.503s       5.28e-05s     C     9542       1   GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}
   2.0%    74.8%       0.468s       4.91e-05s     C     9542       1   GpuElemwise{Composite{(i0 + (-scalar_sigmoid(i1)))}}[(0, 1)]
   1.9%    76.7%       0.444s       1.16e-05s     C    38168       4   GpuCAReduce{add}{0,1,1}
   1.7%    78.4%       0.404s       4.24e-05s     C     9542       1   GpuElemwise{Composite{((i0 + i1) + i2)}}[(0, 1)]
   1.4%    79.8%       0.327s       3.43e-05s     C     9542       1   GpuGemv{inplace}
   1.4%    81.1%       0.322s       1.69e-05s     C    19084       2   GpuCAReduce{add}{0,0,1}
   1.3%    82.4%       0.313s       1.09e-05s     C    28626       3   GpuElemwise{Composite{((i0 * i1) + i2)}}[(0, 2)]
   1.0%    83.5%       0.246s       1.29e-05s     C    19084       2   GpuElemwise{scalar_sigmoid,no_inplace}
   0.9%    84.4%       0.221s       1.16e-06s     C   190840      20   GpuReshape{3}
   0.9%    85.3%       0.219s       1.15e-06s     C   190840      20   GpuReshape{2}
   0.9%    86.2%       0.214s       1.12e-05s     C    19084       2   GpuElemwise{Composite{(i0 + (i1 * sqr(i2)))},no_inplace}
   ...
(remaining 49 Ops account for 13.76%(3.27s) of the runtime)

Apply
---
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  16.3%    16.3%       3.882s       4.07e-04s    9542   165   GpuCAReduce{add}{1,1,1}(GpuElemwise{Composite{((i0 * i1) - scalar_softplus(i1))},no_inplace}.0)
   3.4%    19.7%       0.810s       8.48e-05s    9542   169   GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
   3.4%    23.1%       0.802s       8.40e-05s    9542    71   GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
   3.1%    26.2%       0.730s       7.65e-05s    9542    70   GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
   3.0%    29.2%       0.720s       7.55e-05s    9542   170   GpuElemwise{mul,no_inplace}(GpuDimShuffle{0,x,1,2}.0, CudaNdarrayConstant{
   2.9%    32.1%       0.692s       7.25e-05s    9542    47
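For reference, tables in this shape come from Theano's built-in profiler. Assuming a Theano-0.8/0.9-era setup like the one above (the script name is a placeholder), a profile for every compiled function can be produced via the configuration flag, with the Class / Ops / Apply summaries printed at exit:

```
THEANO_FLAGS=profile=True python my_script.py
```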