I discussed this with @lamblin. We could add an optimization to fix this, but it would cover only a very narrow special case, so we won't do it in the short term. You can do it manually, though. Instead of calling tile, reshape cases[group] and reach into 3D tensors with the right dimensions set as broadcastable. This lets you do what you want efficiently, without any alloc in the graph. This is a very good use of broadcasting.
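To make the idea concrete, here is a minimal sketch written as a NumPy analogue rather than Theano code. The shapes of cases[group] and reach were never given in this thread, so the 2D shapes, the pairwise product, and the function names below are all assumptions made for illustration only.

```python
import numpy as np


def combine_with_tile(cases_group, reach):
    """Materialise full 3D copies of both inputs, then multiply.

    This mirrors the tile-based approach: each tile allocates a new
    (n_cases, n_reach, n_features) array before any arithmetic happens.
    """
    n_cases = cases_group.shape[0]
    n_reach = reach.shape[0]
    tiled_cases = np.tile(cases_group[:, None, :], (1, n_reach, 1))
    tiled_reach = np.tile(reach[None, :, :], (n_cases, 1, 1))
    return tiled_cases * tiled_reach


def combine_with_broadcast(cases_group, reach):
    """Insert length-1 axes and let broadcasting do the repetition.

    No tiled copies are allocated; only the result array is created.
    """
    return cases_group[:, None, :] * reach[None, :, :]


# Hypothetical data: 2 cases and 4 reach rows, 3 features each.
cases_group = np.arange(6, dtype=np.float32).reshape(2, 3)
reach = np.arange(12, dtype=np.float32).reshape(4, 3)

tiled = combine_with_tile(cases_group, reach)
broadcast = combine_with_broadcast(cases_group, reach)
```

Both functions return the same (2, 4, 3) array, but only the first one builds the intermediate copies. In Theano, the counterpart of the `[:, None, :]` indexing would be something like `dimshuffle(0, 'x', 1)`, which inserts a broadcastable dimension. Whether that matches your graph depends on the actual types of cases and reach.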
Frédéric

On Wed, Feb 15, 2017 at 12:16 PM Frédéric Bastien <[email protected]> wrote:
> tile generates an alloc. To help you with the broadcasting, I need more
> information.
>
> What are:
> cases.type?
> reach.type?
>
> Fred
>
> On Tue, Feb 7, 2017 at 4:51 PM Frédéric Bastien <[email protected]> wrote:
>
> There is a large number of GpuAlloc nodes. What you have shown doesn't tell
> us what needs them in Theano. Can you run the Theano function with
> profiling and, before the script ends, call
> theano.debugprint(your_theano_function) and send the output? It will tell
> us what needs them in the graph.
>
> On Fri, Feb 3, 2017 at 4:22 AM Šarūnas S. <[email protected]> wrote:
>
> I wrote a script in Theano and started profiling it. What I noticed is that
> the GPU spends most of its time in GpuAlloc.
>
> Could somebody explain to me why this is happening and how I could reduce
> it? In C or C++ I would preallocate it, but I am not sure how to do this in
> Theano.
>
> I am running on Windows 8.1 with an Nvidia GTX 1070 and Theano
> @ 0.9.0dev4.dev-3c0be3d94102ac6864b2e5ab52ae96d07c6375c6
>
> I am attaching the extensive profile result below:
>
> Function profiling
> ==================
>   Message: Sum of all(2) printed profiles at exit excluding Scan op profile.
>   Time in 200 calls to Function.__call__: 3.463001e+00s
>   Time in Function.fn.__call__: 3.451001e+00s (99.653%)
>   Time in thunks: 3.425293e+00s (98.911%)
>   Total compile time: 1.413800e+01s
>     Number of Apply nodes: 590
>     Theano Optimizer time: 1.158200e+01s
>        Theano validate time: 9.390018e-01s
>     Theano Linker time (includes C, CUDA code generation/compiling): 2.107000e+00s
>        Import time 3.500128e-02s
>        Node make_thunk time 2.042000e+00s
>            Node GpuCAReduce{add}{0,1}(GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)].0) time 9.000063e-03s
>            Node GpuCAReduce{add}{0,1}(GpuElemwise{Mul}[(0, 1)].0) time 7.999897e-03s
>            Node GpuDimShuffle{0,x}(GpuCAReduce{add}{0,1}.0) time 6.999969e-03s
>            Node Shape_i{1}(<CudaNdarrayType(float32, matrix)>) time 4.999876e-03s
>            Node GpuElemwise{Mul}[(0, 1)](CudaNdarrayConstant{[[ 240.]]}, GpuDimShuffle{0,x}.0) time 4.999876e-03s
>
>   Time in all call to theano.grad() 0.000000e+00s
>   Time since theano import 41.580s
>
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>   90.5%    90.5%       3.100s       3.37e-04s     C     9200      92   theano.sandbox.cuda.basic_ops.GpuAlloc
>    7.4%    97.9%       0.254s       4.19e-06s     C    60600     606   theano.sandbox.cuda.basic_ops.GpuElemwise
>    1.0%    98.9%       0.034s       2.77e-06s     C    12200     122   theano.sandbox.cuda.basic_ops.GpuCAReduce
>    0.5%    99.4%       0.017s       1.84e-06s     C     9200      92   theano.sandbox.cuda.basic_ops.GpuReshape
>    0.5%    99.9%       0.016s       7.45e-07s     C    21400     214   theano.sandbox.cuda.basic_ops.GpuDimShuffle
>    0.1%    99.9%       0.003s       1.57e-06s     C     1900      19   theano.tensor.elemwise.Elemwise
>    0.1%   100.0%       0.002s       5.24e-07s     C     3800      38   theano.compile.ops.Shape_i
>    0.0%   100.0%       0.000s       0.00e+00s     C     1900      19   theano.tensor.opt.MakeVector
>    ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>   90.5%    90.5%       3.100s       3.37e-04s     C     9200      92   GpuAlloc
>    1.7%    92.2%       0.058s       4.41e-06s     C    13100     131   GpuElemwise{Mul}[(0, 1)]
>    1.0%    93.2%       0.034s       3.21e-06s     C    10600     106   GpuElemwise{maximum,no_inplace}
>    1.0%    94.2%       0.034s       2.77e-06s     C    12200     122   GpuCAReduce{add}{0,1}
>    0.7%    94.8%       0.023s       3.54e-06s     C     6500      65   GpuElemwise{Composite{maximum(((i0 + i1) - i2), i3)}}[(0, 0)]
>    0.5%    95.4%       0.018s       3.27e-06s     C     5500      55   GpuElemwise{mul,no_inplace}
>    0.5%    95.9%       0.018s       4.61e-06s     C     3900      39   GpuElemwise{Composite{((i0 * i1) / i2)}}[(0, 1)]
>    0.5%    96.4%       0.017s       1.84e-06s     C     9200      92   GpuReshape{2}
>    0.4%    96.8%       0.014s       4.33e-06s     C     3200      32   GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)]
>    0.2%    97.0%       0.008s       8.69e-07s     C     9200      92   GpuDimShuffle{1,0}
>    0.2%    97.3%       0.008s       5.33e-06s     C     1500      15   GpuElemwise{Composite{((i0 * i1) / i2)},no_inplace}
>    0.2%    97.5%       0.008s       6.52e-07s     C    12200     122   GpuDimShuffle{0,x}
>    0.2%    97.7%       0.007s       4.38e-06s     C     1600      16   GpuElemwise{Composite{(((i0 * i1 * maximum(i2, i3)) / (maximum(i2, i3) + maximum(i4, i3))) + ((i5 * i6 * maximum(i4, i3
