Re: [theano-users] Re: IfElse GPU version

Šarūnas S., Sat, 25 Mar 2017 02:22:31 -0700

Nouiz, sorry, I now understand what you were referring to by "is constant".
I misled you with my example.


This is a more realistic example:

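import numpy as np
import theano as th
import theano.tensor as T
from theano.ifelse import ifelse

# 0-d float32 switch that can be flipped at run time without recompiling.
allowed_branch = th.shared(np.asarray(0, dtype='float32'))

x = T.matrix('x')
y = T.matrix('y')
f = x ** 2 + y ** 2 + 2 * x * y

# With a lazy linker (cvm/vm), f is only computed when allowed_branch > 0.
result = ifelse(T.gt(allowed_branch, T.constant(0)), f, T.zeros((2, 2)))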


I am working on a real-time system which, in a given situation, constructs
the relevant computational graph, computes its result and displays it.
However, the graphs are relatively big and compiling each of them takes too
long, so I can't compile in real time. Thus I have to precompile somehow.

I have decided to precompile a general graph in which all the possible
graphs are nested. Then, at run time, I would select which parts of the
general graph to use via the *allowed_branch* variables and *if* nodes.
Since, AFAIK, ifs are evaluated lazily, in each case I would only be
computing the relevant part of the graph, so my computational cost is
minimal.
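The plan above (build one general function once, then pick branches at run time through flag variables) can be modelled in plain Python. This is only a sketch of the idea, not Theano code: the `flags` dict, `build_general_function` and the branch functions are hypothetical stand-ins for the shared `allowed_branch` variables, the compiled `theano.function`, and the sub-graphs. Counting calls shows that only the selected branch runs on each call.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the shared allowed_branch variable: changing the
# value alters which branch runs without rebuilding the function.
flags: Dict[str, float] = {"allowed_branch": 0.0}

# Counts how often each branch body actually runs.
calls: Dict[str, int] = {"f": 0, "zeros": 0}


def branch_f(x: float, y: float) -> float:
    calls["f"] += 1
    return x ** 2 + y ** 2 + 2 * x * y


def branch_zeros(x: float, y: float) -> float:
    calls["zeros"] += 1
    return 0.0


def build_general_function() -> Callable[[float, float], float]:
    """Build the 'general graph' once; the branch choice happens at call time."""
    def fn(x: float, y: float) -> float:
        # Lazy selection: only the chosen branch is evaluated.
        if flags["allowed_branch"] > 0:
            return branch_f(x, y)
        return branch_zeros(x, y)
    return fn


fn = build_general_function()   # stands in for the slow compile, done once
first = fn(1.0, 2.0)            # flag is 0, so only branch_zeros runs
flags["allowed_branch"] = 1.0   # analogous to allowed_branch.set_value(...)
second = fn(1.0, 2.0)           # now only branch_f runs
```

In Theano itself the flag would be the shared variable, updated with `set_value`, so no recompilation is needed between calls.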


On Saturday, 25 March 2017 10:04:21 UTC+1, Šarūnas S. wrote:
>
> I suspect that ifelse is running on GPU because this is the profile I get
>
> ==================
>   Message: Sum of all(44) printed profiles at exit excluding Scan op profile.
>   Time in 95 calls to Function.__call__: 2.309995e-01s
>   Time in Function.fn.__call__: 2.299995e-01s (99.567%)
>   Time in thunks: 2.307765e-01s (99.903%)
>   Total compile time: 1.360100e+01s
>     Number of Apply nodes: 416
>     Theano Optimizer time: 6.314001e+00s
>        Theano validate time: 9.200015e-01s
>     Theano Linker time (includes C, CUDA code generation/compiling): 1.169000e+00s
>        Import time 2.799892e-02s
>        Node make_thunk time 1.108999e+00s
>            Node GpuElemwise{Composite{(i0 * ((i1 * i2) + (i1 * i3)))}}[(0, 2)](CudaNdarrayConstant{0.5}, CudaNdarrayConstant{0.833333313465}, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0) time 6.999969e-03s
>            Node GpuElemwise{Composite{(-minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i1 + i2) * i4) + i1))))},no_inplace}(<CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, CudaNdarrayConstant{120.0}, <CudaNdarrayType(float32, scalar)>) time 4.999876e-03s
>            Node GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0) time 4.000187e-03s
>            Node HostFromGpu(<CudaNdarrayType(float32, scalar)>) time 3.999949e-03s
>            Node GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{x,0}.0) time 3.999949e-03s
>
> Time in all call to theano.grad() 0.000000e+00s
> Time since theano import 28.959s
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
>   55.4%    55.4%       0.128s       8.71e-05s     C     1468     301   theano.sandbox.cuda.basic_ops.GpuElemwise
>   25.6%    81.0%       0.059s       1.03e-04s     C      571     106   theano.sandbox.cuda.basic_ops.GpuCAReduce
>    9.1%    90.1%       0.021s       3.72e-05s     C      564     150   theano.sandbox.cuda.basic_ops.HostFromGpu
>    5.6%    95.7%       0.013s       6.04e-06s     Py    2148     168   theano.ifelse.IfElse
>    3.5%    99.1%       0.008s       2.16e-04s     C       37       4   theano.compile.ops.DeepCopyOp
>    0.4%    99.6%       0.001s       1.60e-06s     C      623     122   theano.sandbox.cuda.basic_ops.GpuDimShuffle
>    0.4%   100.0%       0.001s       1.97e-06s     C      506     110   theano.tensor.elemwise.Elemwise
>    ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
>   16.9%    16.9%       0.039s       1.22e-04s     C      319       58   GpuElemwise{mul,no_inplace}
>   10.0%    26.9%       0.023s       1.49e-04s     C      155       30   GpuCAReduce{add}{1,0}
>    9.1%    36.0%       0.021s       3.72e-05s     C      564      150   HostFromGpu
>    8.2%    44.2%       0.019s       1.23e-04s     C      154       30   GpuCAReduce{add}{0,1}
>    6.9%    51.1%       0.016s       6.61e-05s     C      242       44   GpuElemwise{Mul}[(0, 1)]
>    6.5%    57.6%       0.015s       6.20e-05s     C      242       44   GpuElemwise{maximum,no_inplace}
>    6.5%    64.1%       0.015s       6.19e-05s     C      242       44   GpuCAReduce{maximum}{1}
>    5.6%    69.7%       0.013s       6.04e-06s     Py    2148      168   if{inplace,gpu}
>    3.5%    73.2%       0.008s       5.59e-05s     C      143       26   GpuElemwise{TrueDiv}[(0, 0)]
>    3.5%    76.7%       0.008s       2.16e-04s     C       37        4   DeepCopyOp
>    2.6%    79.3%       0.006s       8.95e-05s     C       67       16   GpuElemwise{Mul}[(0, 2)]
>    2.2%    81.4%       0.005s       1.25e-04s     C       40        4   GpuElemwise{Maximum}[(0, 0)]
>    1.7%    83.2%       0.004s       2.00e-04s     C       20        2   GpuElemwise{Composite{maximum(i0, maximum(i1, maximum(i2, i3)))}}[(0, 0)]
>    1.7%    84.9%       0.004s       4.93e-04s     C        8        8   GpuElemwise{neg,no_inplace}
>    1.3%    86.2%       0.003s       1.36e-04s     C       22        4   GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}
>    1.3%    87.5%       0.003s       2.50e-04s     C       12        3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))}}[(0, 4)]
>    1.3%    88.8%       0.003s       9.08e-05s     C       33        6   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)]
>    0.9%    89.6%       0.002s       3.03e-05s     C       66       12   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)]
>    0.9%    90.5%       0.002s       1.00e-04s     C       20        2   GpuCAReduce{add}{1,1}
>    0.9%    91.4%       0.002s       2.50e-04s     C        8        3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}
>    ... (remaining 28 Ops account for   8.62%(0.02s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
>    1.7%     1.7%       0.004s       4.00e-04s     10   365   GpuElemwise{Maximum}[(0, 0)](if{inplace,gpu}.0, if{inplace,gpu}.0)
>    1.3%     3.0%       0.003s       3.00e-04s     10   105   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    1.3%     4.3%       0.003s       3.00e-04s     10   356   GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{0,x}.0)
>    1.3%     5.6%       0.003s       3.00e-04s     10   143   GpuCAReduce{add}{1,0}(GpuElemwise{mul,no_inplace}.0)
>    1.3%     6.9%       0.003s       3.00e-04s     10   112   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    1.3%     8.2%       0.003s       3.00e-04s     10   169   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
>    1.3%     9.5%       0.003s       3.00e-04s     10   136   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    1.3%    10.8%       0.003s       3.00e-04s     10   217   GpuCAReduce{add}{0,1}(GpuElemwise{mul,no_inplace}.0)
>    1.3%    12.1%       0.003s       3.00e-04s     10   184   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)](GpuElemwise{TrueDiv}[(0, 0)].0, GpuElemwise{maximum,no_inplace}.0, GpuElemwise{add,no_inplace}.0)
>    1.3%    13.4%       0.003s       5.96e-04s      5     1   HostFromGpu(GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}.0)
>    0.9%    14.3%       0.002s       1.69e-04s     12     0   DeepCopyOp(<CudaNdarrayType(float32, scalar)>)
>    0.9%    15.2%       0.002s       2.00e-04s     10   148   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
>    0.9%    16.0%       0.002s       2.00e-04s     10   153   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
>    0.9%    16.9%       0.002s       2.00e-04s     10   126   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    0.9%    17.8%       0.002s       2.00e-04s     10   412   GpuCAReduce{add}{1,1}(GpuElemwise{Composite{(((i0 + i1) + i2) + i3)}}[(0, 0)].0)
>    0.9%    18.6%       0.002s       2.00e-04s     10   103   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    0.9%    19.5%       0.002s       2.00e-04s     10    89   GpuElemwise{TrueDiv}[(0, 0)](GpuElemwise{maximum,no_inplace}.0, GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}.0)
>    0.9%    20.4%       0.002s       2.00e-04s     10     3   GpuElemwise{maximum,no_inplace}(<CudaNdarrayType(float32, col)>, CudaNdarrayConstant{[[ 0.001]]})
>    0.9%    21.2%       0.002s       2.00e-04s     10   134   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
>    0.9%    22.1%       0.002s       2.00e-04s     10   300   GpuElemwise{Mul}[(0, 1)](GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))},no_inplace}.0, GpuDimShuffle{x,0}.0)
>    ... (remaining 941 Apply instances account for 77.89%(0.18s) of the runtime)
>
> Here are tips to potentially make your code run faster
>                  (if you think of new ones, suggest them on the mailing list).
>                  Test them first, as they are not guaranteed to always provide a speedup.
>   Sorry, no tip for today.
>
> As you can see, ifelse is shown as a Py operation, which I presume runs on
> the CPU. So where does it run? Also, what do you mean when you say the
> condition is constant?
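As a rough check on how much time the Py-type op actually costs, the `theano.ifelse.IfElse` row of the Class table in the profile can be multiplied out. The figures below are copied from that profile; the `Py` type column says the op's thunk is implemented in Python, and this arithmetic only measures time, not where the branch data lives.

```python
# Figures copied from the "Class" section of the profile in this message.
ifelse_calls = 2148
ifelse_time_per_call = 6.04e-06   # seconds per IfElse call
thunk_time = 2.307765e-01          # "Time in thunks", seconds

ifelse_total = ifelse_calls * ifelse_time_per_call
share = ifelse_total / thunk_time

print(f"IfElse total: {ifelse_total:.4f} s")   # about 0.013 s
print(f"Share of thunk time: {share:.1%}")     # matches the 5.6% column
```

Roughly 6 microseconds per call is small next to the roughly 87 microseconds per call reported for `GpuElemwise`, so in this profile the Python-side dispatch of IfElse is not where most of the time goes.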
>
> P.S. In case you need them, these are my Theano flags:
>
> import os
>
> # THEANO_FLAGS must be set before `import theano`.
> os.environ['THEANO_FLAGS'] = 'optimizer=fast_run,floatX=float32,device=gpu,linker=cvm'
> os.environ['THEANO_FLAGS'] += ',allow_gc=False'
> os.environ['THEANO_FLAGS'] += ',lib.cnmem=0.3'
> os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
> os.environ['THEANO_FLAGS'] += ',profile=true'
>
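Because THEANO_FLAGS is a single comma-separated string of `name=value` pairs, building it from a dict avoids stray or doubled commas when flags are appended piecemeal. `build_theano_flags` is a hypothetical helper, not part of Theano; the flag names and values are the ones used in this message. The variable has to be set before `import theano`, since Theano reads it at import time.

```python
import os


def build_theano_flags(flags: dict) -> str:
    """Join {name: value} pairs into one comma-separated THEANO_FLAGS string."""
    return ",".join(f"{name}={value}" for name, value in flags.items())


flags_str = build_theano_flags({
    "optimizer": "fast_run",
    "floatX": "float32",
    "device": "gpu",
    "linker": "cvm",        # a lazy-capable linker
    "allow_gc": "False",
    "lib.cnmem": "0.3",
    "profile": "True",
})

# Must happen before `import theano`; Theano reads the flags at import time.
os.environ["THEANO_FLAGS"] = flags_str
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```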
>
> On Friday, 24 March 2017 23:09:11 UTC+1, nouiz wrote:
>>
>> What tells you that the ifelse is on the CPU?
>>
>> Anyway, as the condition is constant, Theano will remove it during
>> compilation.
>>
>> Fred
>>
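Removing an ifelse whose condition is constant is a form of constant folding at compile time. The toy rewrite below only illustrates the idea on a hypothetical mini-graph built from `Const`, `Var` and `IfElse` dataclasses; it is not Theano's optimizer, whose actual rewrite rules may differ. A condition that depends on a run-time value, such as a shared variable, cannot be folded, so the node is kept.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class IfElse:
    cond: "Node"
    then: "Node"
    else_: "Node"


Node = Union[Const, Var, IfElse]


def fold_constant_ifelse(node: Node) -> Node:
    """Replace IfElse nodes whose condition is a Const by the selected branch."""
    if isinstance(node, IfElse):
        cond = fold_constant_ifelse(node.cond)
        then = fold_constant_ifelse(node.then)
        else_ = fold_constant_ifelse(node.else_)
        if isinstance(cond, Const):
            return then if cond.value else else_
        return IfElse(cond, then, else_)
    return node


# Constant condition (like T.gt(2.0, 1.0)): the IfElse disappears.
folded = fold_constant_ifelse(IfElse(Const(2.0 > 1.0), Var("ones"), Var("zeros")))

# Run-time condition (like a shared allowed_branch): the IfElse must stay.
kept = fold_constant_ifelse(IfElse(Var("allowed_branch"), Var("f"), Var("zeros")))
```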
>> On Fri, 24 Mar 2017 12:41, Šarūnas S. <[email protected]> wrote:
>>
>>> Please find a code example:
>>>
>>> import theano as th
>>> import theano.tensor as T
>>> from theano.ifelse import ifelse
>>>
>>> # Both the condition and the branches are constants here.
>>> retval = ifelse(T.gt(T.constant(2.0), T.constant(1.0)), T.ones((500, 1)), T.zeros((250, 1)))
>>>
>>> On Friday, 24 March 2017 17:33:59 UTC+1, Šarūnas S. wrote:
>>>>
>>>> I am using Theano version 0.9.0.rc2.dev.
>>>>
>>>>
>>>>
>>>> On Friday, 24 March 2017 17:32:33 UTC+1, Šarūnas S. wrote:
>>>>>
>>>>> In my graph I have a few IfElse nodes and I am wondering how and where
>>>>> they are executed.
>>>>>
>>>>> At first I ran the code with linker=cvm in my THEANO_FLAGS, but after
>>>>> profiling it looked like the ifelse was being executed on the CPU. Then I
>>>>> forced linker=c to check whether the IfElse would go through, and I got
>>>>> NotImplementedError: if{inplace, gpu} cannot produce C code. By the way,
>>>>> removing the inline optimization did not help; it still gave the same
>>>>> error.
>>>>>
>>>>> So does IfElse have a GPU implementation? If yes, how do I use it?
>>>>> Also, does it do lazy evaluation or not?
>>>> -- 
>>>
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "theano-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
