Re: [theano-users] Re: IfElse GPU version

Šarūnas S . Sat, 25 Mar 2017 02:05:16 -0700

I suspect that ifelse is running on the CPU because this is the profile I get:

==================
  Message: Sum of all(44) printed profiles at exit excluding Scan op profile.
  Time in 95 calls to Function.__call__: 2.309995e-01s
  Time in Function.fn.__call__: 2.299995e-01s (99.567%)
  Time in thunks: 2.307765e-01s (99.903%)
  Total compile time: 1.360100e+01s
    Number of Apply nodes: 416
    Theano Optimizer time: 6.314001e+00s
       Theano validate time: 9.200015e-01s
    Theano Linker time (includes C, CUDA code generation/compiling): 1.169000e+00s
       Import time 2.799892e-02s
       Node make_thunk time 1.108999e+00s
           Node GpuElemwise{Composite{(i0 * ((i1 * i2) + (i1 * i3)))}}[(0, 2)](CudaNdarrayConstant{0.5}, CudaNdarrayConstant{0.833333313465}, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0) time 6.999969e-03s
           Node GpuElemwise{Composite{(-minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i1 + i2) * i4) + i1))))},no_inplace}(<CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, CudaNdarrayConstant{120.0}, <CudaNdarrayType(float32, scalar)>) time 4.999876e-03s
           Node GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0) time 4.000187e-03s
           Node HostFromGpu(<CudaNdarrayType(float32, scalar)>) time 3.999949e-03s
           Node GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{x,0}.0) time 3.999949e-03s


Time in all call to theano.grad() 0.000000e+00s
Time since theano import 28.959s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  55.4%    55.4%       0.128s       8.71e-05s     C     1468     301   theano.sandbox.cuda.basic_ops.GpuElemwise
  25.6%    81.0%       0.059s       1.03e-04s     C      571     106   theano.sandbox.cuda.basic_ops.GpuCAReduce
   9.1%    90.1%       0.021s       3.72e-05s     C      564     150   theano.sandbox.cuda.basic_ops.HostFromGpu
   5.6%    95.7%       0.013s       6.04e-06s     Py    2148     168   theano.ifelse.IfElse
   3.5%    99.1%       0.008s       2.16e-04s     C       37       4   theano.compile.ops.DeepCopyOp
   0.4%    99.6%       0.001s       1.60e-06s     C      623     122   theano.sandbox.cuda.basic_ops.GpuDimShuffle
   0.4%   100.0%       0.001s       1.97e-06s     C      506     110   theano.tensor.elemwise.Elemwise
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  16.9%    16.9%       0.039s       1.22e-04s     C      319       58   GpuElemwise{mul,no_inplace}
  10.0%    26.9%       0.023s       1.49e-04s     C      155       30   GpuCAReduce{add}{1,0}
   9.1%    36.0%       0.021s       3.72e-05s     C      564      150   HostFromGpu
   8.2%    44.2%       0.019s       1.23e-04s     C      154       30   GpuCAReduce{add}{0,1}
   6.9%    51.1%       0.016s       6.61e-05s     C      242       44   GpuElemwise{Mul}[(0, 1)]
   6.5%    57.6%       0.015s       6.20e-05s     C      242       44   GpuElemwise{maximum,no_inplace}
   6.5%    64.1%       0.015s       6.19e-05s     C      242       44   GpuCAReduce{maximum}{1}
   5.6%    69.7%       0.013s       6.04e-06s     Py    2148      168   if{inplace,gpu}
   3.5%    73.2%       0.008s       5.59e-05s     C      143       26   GpuElemwise{TrueDiv}[(0, 0)]
   3.5%    76.7%       0.008s       2.16e-04s     C       37        4   DeepCopyOp
   2.6%    79.3%       0.006s       8.95e-05s     C       67       16   GpuElemwise{Mul}[(0, 2)]
   2.2%    81.4%       0.005s       1.25e-04s     C       40        4   GpuElemwise{Maximum}[(0, 0)]
   1.7%    83.2%       0.004s       2.00e-04s     C       20        2   GpuElemwise{Composite{maximum(i0, maximum(i1, maximum(i2, i3)))}}[(0, 0)]
   1.7%    84.9%       0.004s       4.93e-04s     C        8        8   GpuElemwise{neg,no_inplace}
   1.3%    86.2%       0.003s       1.36e-04s     C       22        4   GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}
   1.3%    87.5%       0.003s       2.50e-04s     C       12        3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))}}[(0, 4)]
   1.3%    88.8%       0.003s       9.08e-05s     C       33        6   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)]
   0.9%    89.6%       0.002s       3.03e-05s     C       66       12   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)]
   0.9%    90.5%       0.002s       1.00e-04s     C       20        2   GpuCAReduce{add}{1,1}
   0.9%    91.4%       0.002s       2.50e-04s     C        8        3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}
   ... (remaining 28 Ops account for   8.62%(0.02s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
   1.7%     1.7%       0.004s       4.00e-04s     10   365   GpuElemwise{Maximum}[(0, 0)](if{inplace,gpu}.0, if{inplace,gpu}.0)
   1.3%     3.0%       0.003s       3.00e-04s     10   105   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%     4.3%       0.003s       3.00e-04s     10   356   GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{0,x}.0)
   1.3%     5.6%       0.003s       3.00e-04s     10   143   GpuCAReduce{add}{1,0}(GpuElemwise{mul,no_inplace}.0)
   1.3%     6.9%       0.003s       3.00e-04s     10   112   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%     8.2%       0.003s       3.00e-04s     10   169   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   1.3%     9.5%       0.003s       3.00e-04s     10   136   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%    10.8%       0.003s       3.00e-04s     10   217   GpuCAReduce{add}{0,1}(GpuElemwise{mul,no_inplace}.0)
   1.3%    12.1%       0.003s       3.00e-04s     10   184   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)](GpuElemwise{TrueDiv}[(0, 0)].0, GpuElemwise{maximum,no_inplace}.0, GpuElemwise{add,no_inplace}.0)
   1.3%    13.4%       0.003s       5.96e-04s      5     1   HostFromGpu(GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}.0)
   0.9%    14.3%       0.002s       1.69e-04s     12     0   DeepCopyOp(<CudaNdarrayType(float32, scalar)>)
   0.9%    15.2%       0.002s       2.00e-04s     10   148   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   0.9%    16.0%       0.002s       2.00e-04s     10   153   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   0.9%    16.9%       0.002s       2.00e-04s     10   126   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    17.8%       0.002s       2.00e-04s     10   412   GpuCAReduce{add}{1,1}(GpuElemwise{Composite{(((i0 + i1) + i2) + i3)}}[(0, 0)].0)
   0.9%    18.6%       0.002s       2.00e-04s     10   103   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    19.5%       0.002s       2.00e-04s     10    89   GpuElemwise{TrueDiv}[(0, 0)](GpuElemwise{maximum,no_inplace}.0, GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}.0)
   0.9%    20.4%       0.002s       2.00e-04s     10     3   GpuElemwise{maximum,no_inplace}(<CudaNdarrayType(float32, col)>, CudaNdarrayConstant{[[ 0.001]]})
   0.9%    21.2%       0.002s       2.00e-04s     10   134   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    22.1%       0.002s       2.00e-04s     10   300   GpuElemwise{Mul}[(0, 1)](GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))},no_inplace}.0, GpuDimShuffle{x,0}.0)
   ... (remaining 941 Apply instances account for 77.89%(0.18s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  Sorry, no tip for today.

As you can see, ifelse is shown as a Py operation, which I would presume 
runs on the CPU. So where does it run? Also, what do you mean by "add a 
condition is constant"?
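
As a side note on reading the Class table above: as far as I understand the Theano profiler, the <type> column reports whether an op's thunk is implemented in C or in Python, and on its own it does not say where the data lives. A minimal sketch that tallies the Class rows by that column follows. The rows are copied from the profile above; `time_share_by_type` is a hypothetical helper name, not part of Theano.

```python
# Rows copied from the "Class" table of the profile above:
# (% time, implementation type, class name)
CLASS_ROWS = [
    (55.4, "C", "theano.sandbox.cuda.basic_ops.GpuElemwise"),
    (25.6, "C", "theano.sandbox.cuda.basic_ops.GpuCAReduce"),
    (9.1, "C", "theano.sandbox.cuda.basic_ops.HostFromGpu"),
    (5.6, "Py", "theano.ifelse.IfElse"),
    (3.5, "C", "theano.compile.ops.DeepCopyOp"),
    (0.4, "C", "theano.sandbox.cuda.basic_ops.GpuDimShuffle"),
    (0.4, "C", "theano.tensor.elemwise.Elemwise"),
]


def time_share_by_type(rows):
    """Sum the '% time' column per implementation type ('C' or 'Py')."""
    totals = {}
    for percent, impl_type, _name in rows:
        totals[impl_type] = totals.get(impl_type, 0.0) + percent
    # Round to one decimal, matching the precision of the profile.
    return {key: round(value, 1) for key, value in totals.items()}


shares = time_share_by_type(CLASS_ROWS)
print(shares)
```

In this profile the only Py class is theano.ifelse.IfElse, so the Py share is the IfElse share.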










P.S. In case you need them, these are my Theano flags:

import os

# THEANO_FLAGS is read when theano is imported, so set it before the import.
os.environ['THEANO_FLAGS'] = 'optimizer=fast_run,floatX=float32,device=gpu,linker=cvm'
os.environ['THEANO_FLAGS'] += ',allow_gc=False'
os.environ['THEANO_FLAGS'] += ',lib.cnmem=0.3'
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
os.environ['THEANO_FLAGS'] += ',profile=true'
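
When flags are appended piece by piece it is easy to end up with stray or doubled commas. One possible alternative, sketched here under the assumption that the same settings are wanted, is to keep the options as a list of pairs and join them once. `build_theano_flags` is a hypothetical helper, not a Theano function.

```python
import os


def build_theano_flags(options):
    """Join (key, value) pairs as key=value, separated by single commas."""
    return ",".join("{}={}".format(key, value) for key, value in options)


# Same settings as in the P.S. above, kept as an ordered list of pairs.
FLAGS = [
    ("optimizer", "fast_run"),
    ("floatX", "float32"),
    ("device", "gpu"),
    ("linker", "cvm"),
    ("allow_gc", "False"),
    ("lib.cnmem", "0.3"),
    ("profile", "true"),
]

# Set the variables before `import theano`, since Theano reads them at import.
os.environ["THEANO_FLAGS"] = build_theano_flags(FLAGS)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```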


On Friday, 24 March 2017 23:09:11 UTC+1, nouiz wrote:
>
> What tells you the ifelse is on the CPU?
>
> Anyway, add the condition is constant Theano will remove it during the 
> compilation.
>
> Fred
>
> On Fri, 24 Mar 2017 at 12:41, Šarūnas S. <[email protected]> wrote:
>
>> Please find a code example:
>>
>> import theano as th
>> import theano.tensor as T
>>
>> retval = th.ifelse.ifelse(T.gt(T.constant(2.0), T.constant(1.0)),
>>                           T.ones((500, 1)), T.zeros((250, 1)))
>>
>> On Friday, 24 March 2017 17:33:59 UTC+1, Šarūnas S. wrote:
>>>
>>> I am using Theano version 0.9.0.rc2.dev.
>>>
>>>
>>>
>>> On Friday, 24 March 2017 17:32:33 UTC+1, Šarūnas S. wrote:
>>>>
>>>> In my graph I have a few IfElse nodes, and I am wondering how and where 
>>>> they are executed.
>>>>
>>>> At first I ran the code with linker=cvm in my THEANO_FLAGS, but after 
>>>> profiling it looked like the ifelse was being executed on the CPU. Then I 
>>>> forced linker=c to check whether the IfElse would go through, and I got 
>>>> "NotImplementedError: if{inplace, gpu} cannot produce C code". Btw, 
>>>> removing the inline optimization did not help; it still gave the same error.
>>>>
>>>> So does IfElse have a GPU implementation? If yes, how do I use it? Also, 
>>>> does it do lazy evaluation or not?
>>>>
>>> -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "theano-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
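
On the lazy-evaluation question raised at the start of the thread: the Theano documentation describes ifelse as lazy (only the selected branch is computed, provided the vm or cvm linker is used) and switch as evaluating both branches. The difference can be illustrated in plain Python, without Theano. This is only an analogy; `eager_switch`, `lazy_ifelse`, and `branch` are hypothetical names, not Theano functions.

```python
def eager_switch(cond, then_value, else_value):
    # Both branch values were already computed by the caller.
    return then_value if cond else else_value


def lazy_ifelse(cond, then_thunk, else_thunk):
    # Only the selected thunk is called, so only one branch is computed.
    return then_thunk() if cond else else_thunk()


calls = []


def branch(name, value):
    # Record that this branch was actually evaluated.
    calls.append(name)
    return value


# Eager: both branches run before the selection happens.
eager_result = eager_switch(True, branch("then", 1), branch("else", 0))
eager_calls = list(calls)

# Lazy: the branches are wrapped in thunks and only one is run.
del calls[:]
lazy_result = lazy_ifelse(True,
                          lambda: branch("then", 1),
                          lambda: branch("else", 0))
lazy_calls = list(calls)

print(eager_calls, lazy_calls)
```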
