Sorry for the late reply, and thanks for taking a look at this. By type, do you mean the dimensions, or something else? cases is a square matrix (e.g. with shape (32, 32)) stored as a shared variable. reach is a column vector (e.g. with shape (32, 1)) that results from the graph creation, where a numpy vector is multiplied with tensors.

I am slightly unsure, though: how would you do this with broadcasting? I need to multiply each row of cases with the reach column.
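To make sure I understand the suggestion, here is a minimal 2d sketch of what I think you mean. My real graph indexes cases[group] and would need the 3d version; reach is a plain input here, and the tile call is my guess at what my current graph effectively does:

    import numpy as np
    import theano
    import theano.tensor as T

    floatX = theano.config.floatX
    cases = theano.shared(np.ones((32, 32), dtype=floatX), name='cases')
    reach = T.matrix('reach')  # stands in for the (32, 1) column vector

    # What I do now: tile reach up to (32, 32) so the shapes match.
    # I believe this is what puts the GpuAlloc nodes in my graph.
    out_tile = cases * T.tile(reach, (1, 32))

    # What I understand you to suggest: mark reach's second axis
    # (which has length 1) as broadcastable, so the elementwise
    # multiply repeats it across the columns without any alloc.
    out_bcast = cases * T.addbroadcast(reach, 1)

    f = theano.function([reach], out_bcast)
    f(np.arange(32, dtype=floatX).reshape(32, 1))

Is out_bcast the pattern you had in mind, with the same idea applied to the 3d reshape of cases[group] and reach? If the orientation is off, I assume a dimshuffle on reach fixes it the same way.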
On Tuesday, 21 February 2017 23:44:42 UTC+1, nouiz wrote:
>
> I discussed this with @lamblin. We could do an optimization to fix this,
> but it would be a very narrow special case, so we won't do it in the short
> term. But you can do it manually yourself: instead of calling tile, you can
> reshape cases[group] and reach to 3d tensors with the right dimensions set
> as broadcastable. This would let you do what you want efficiently, without
> having an alloc in the graph. This is a very good use of broadcasting.
>
> Frédéric
>
> On Wed, Feb 15, 2017 at 12:16 PM Frédéric Bastien <[email protected]> wrote:
>
>> tile generates an alloc. To help you with the broadcasting I need more
>> information. What is:
>>
>> cases.type?
>> reach.type?
>>
>> Fred
>>
>> On Tue, Feb 7, 2017 at 4:51 PM Frédéric Bastien <[email protected]> wrote:
>>
>>> There is a high quantity of GpuAlloc. What you have shown doesn't tell us
>>> what needs it in Theano. Can you run the theano function with profiling,
>>> call theano.printing.debugprint(your_theano_function) before the script
>>> ends, and send the output? It will tell us what needs it in the graph.
>>>
>>> On Fri, Feb 3, 2017 at 4:22 AM Šarūnas S. <[email protected]> wrote:
>>>
>>>> I wrote a script in theano and started profiling it. What I noticed is
>>>> that the GPU spends most of the time in GpuAlloc.
>>>>
>>>> Could somebody explain to me why this is happening and how I could
>>>> reduce it? In C or C++ I would preallocate the buffer, but I am not
>>>> sure how to do this in theano.
>>>>
>>>> I am running on Windows 8.1 with an Nvidia GTX 1070, with Theano
>>>> @ 0.9.0dev4.dev-3c0be3d94102ac6864b2e5ab52ae96d07c6375c6
>>>>
>>>> I am attaching the extensive profile result below:
>>>>
>>>> Function profiling
>>>> ==================
>>>> Message: Sum of all(2) printed profiles at exit excluding Scan op profile.
>>>>   Time in 200 calls to Function.__call__: 3.463001e+00s
>>>>   Time in Function.fn.__call__: 3.451001e+00s (99.653%)
>>>>   Time in thunks: 3.425293e+00s (98.911%)
>>>>   Total compile time: 1.413800e+01s
>>>>     Number of Apply nodes: 590
>>>>     Theano Optimizer time: 1.158200e+01s
>>>>       Theano validate time: 9.390018e-01s
>>>>     Theano Linker time (includes C, CUDA code generation/compiling): 2.107000e+00s
>>>>       Import time 3.500128e-02s
>>>>       Node make_thunk time 2.042000e+00s
>>>>         Node GpuCAReduce{add}{0,1}(GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)].0) time 9.000063e-03s
>>>>         Node GpuCAReduce{add}{0,1}(GpuElemwise{Mul}[(0, 1)].0) time 7.999897e-03s
>>>>         Node GpuDimShuffle{0,x}(GpuCAReduce{add}{0,1}.0) time 6.999969e-03s
>>>>         Node Shape_i{1}(<CudaNdarrayType(float32, matrix)>) time 4.999876e-03s
>>>>         Node GpuElemwise{Mul}[(0, 1)](CudaNdarrayConstant{[[ 240.]]}, GpuDimShuffle{0,x}.0) time 4.999876e-03s
>>>>
>>>> Time in all call to theano.grad() 0.000000e+00s
>>>> Time since theano import 41.580s
>>>>
>>>> Class
>>>> ---
>>>> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>>>>   90.5%  90.5%  3.100s  3.37e-04s  C   9200   92  theano.sandbox.cuda.basic_ops.GpuAlloc
>>>>    7.4%  97.9%  0.254s  4.19e-06s  C  60600  606  theano.sandbox.cuda.basic_ops.GpuElemwise
>>>>    1.0%  98.9%  0.034s  2.77e-06s  C  12200  122  theano.sandbox.cuda.basic_ops.GpuCAReduce
>>>>    0.5%  99.4%  0.017s  1.84e-06s  C   9200   92  theano.sandbox.cuda.basic_ops.GpuReshape
>>>>    0.5%  99.9%  0.016s  7.45e-07s  C  21400  214  theano.sandbox.cuda.basic_ops.GpuDimShuffle
>>>>    0.1%  99.9%  0.003s  1.57e-06s  C   1900   19  theano.tensor.elemwise.Elemwise
>>>>    0.1% 100.0%  0.002s  5.24e-07s  C   3800   38  theano.compile.ops.Shape_i
>>>>    0.0% 100.0%  0.000s  0.00e+00s  C   1900   19  theano.tensor.opt.MakeVector
>>>>    ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>>>>
>>>> Ops
>>>> ---
>>>> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>>>>   90.5%  90.5%  3.100s  3.37e-04s  C   9200   92  GpuAlloc
>>>>    1.7%  92.2%  0.058s  4.41e-06s  C  13100  131  GpuElemwise{Mul}[(0, 1)]
>>>>    1.0%  93.2%  0.034s  3.21e-06s  C  10600  106  GpuElemwise{maximum,no_inplace}
>>>>    1.0%  94.2%  0.034s  2.77e-06s  C  12200  122  GpuCAReduce{add}{0,1}
>>>>    0.7%  94.8%  0.023s  3.54e-06s  C   6500   65  GpuElemwise{Composite{maximum(((i0 + i1) - i2), i3)}}[(0, 0)]
>>>>    0.5%  95.4%  0.018s  3.27e-06s  C   5500   55  GpuElemwise{mul,no_inplace}
>>>>    0.5%  95.9%  0.018s  4.61e-06s  C   3900   39  GpuElemwise{Composite{((i0 * i1) / i2)}}[(0, 1)]
>>>>    0.5%  96.4%  0.017s  1.84e-06s  C   9200   92  GpuReshape{2}
>>>>    0.4%  96.8%  0.014s  4.33e-06s  C   3200   32  GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 2)]
>>>>    0.2%  97.0%  0.008s  8.69e-07s  C   9200   92  GpuDimShuffle{1,0}
>>>>    0.2%  97.3%  0.008s  5.33e-06s  C   1500   15  GpuElemwise{Composite{((i0 * i1) / i2)},no_inplace}
>>>>    0.2%  97.5%  0.008s  6.52e-07s  C  12200  122  GpuDimShuffle{0,x}
>>>>    0.2%  97.7%  0.007s  4.38e-06s  C   1600   16  GpuElemwise{Composite{(((i0 * i1 * maximum(i2, i3)) / (maximum(i2, i3) + maximum(i4, i3))) + ((i5 * i6 * maximum(i4, i3
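For reference, here is a stand-alone sketch of how I am collecting the profile and the debugprint output, in case I am doing something wrong on that side. The toy graph stands in for my real one, and I am assuming that passing profile=True to theano.function is equivalent to running with the profiling flag:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    y = T.matrix('y')
    # Toy graph standing in for my real one; the tile is what
    # I believe creates the Alloc.
    out = (x * T.tile(y, (1, 32))).sum()

    # profile=True makes this function record per-op timings.
    f = theano.function([x, y], out, profile=True)

    xv = np.ones((32, 32), dtype=theano.config.floatX)
    yv = np.ones((32, 1), dtype=theano.config.floatX)
    for _ in range(200):
        f(xv, yv)

    # The optimized graph, showing which Apply node needs the alloc.
    theano.printing.debugprint(f)

    # The per-op timing summary, same format as the profile above.
    f.profile.summary()

With the GPU backend enabled, I expect this kind of graph to show the same GpuAlloc near the top of the Ops list.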
