I suspect that ifelse is running on the CPU, because this is the profile I get:
==================
Message: Sum of all(44) printed profiles at exit excluding Scan op profile.
Time in 95 calls to Function.__call__: 2.309995e-01s
Time in Function.fn.__call__: 2.299995e-01s (99.567%)
Time in thunks: 2.307765e-01s (99.903%)
Total compile time: 1.360100e+01s
Number of Apply nodes: 416
Theano Optimizer time: 6.314001e+00s
Theano validate time: 9.200015e-01s
Theano Linker time (includes C, CUDA code generation/compiling): 1.169000e+00s
Import time 2.799892e-02s
Node make_thunk time 1.108999e+00s
Node GpuElemwise{Composite{(i0 * ((i1 * i2) + (i1 * i3)))}}[(0, 2)](CudaNdarrayConstant{0.5}, CudaNdarrayConstant{0.833333313465}, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0) time 6.999969e-03s
Node GpuElemwise{Composite{(-minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i1 + i2) * i4) + i1))))},no_inplace}(<CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>, CudaNdarrayConstant{120.0}, <CudaNdarrayType(float32, scalar)>) time 4.999876e-03s
Node GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0) time 4.000187e-03s
Node HostFromGpu(<CudaNdarrayType(float32, scalar)>) time 3.999949e-03s
Node GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{x,0}.0) time 3.999949e-03s
Time in all call to theano.grad() 0.000000e+00s
Time since theano import 28.959s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  55.4%    55.4%    0.128s    8.71e-05s    C     1468    301   theano.sandbox.cuda.basic_ops.GpuElemwise
  25.6%    81.0%    0.059s    1.03e-04s    C      571    106   theano.sandbox.cuda.basic_ops.GpuCAReduce
   9.1%    90.1%    0.021s    3.72e-05s    C      564    150   theano.sandbox.cuda.basic_ops.HostFromGpu
   5.6%    95.7%    0.013s    6.04e-06s    Py    2148    168   theano.ifelse.IfElse
   3.5%    99.1%    0.008s    2.16e-04s    C       37      4   theano.compile.ops.DeepCopyOp
   0.4%    99.6%    0.001s    1.60e-06s    C      623    122   theano.sandbox.cuda.basic_ops.GpuDimShuffle
   0.4%   100.0%    0.001s    1.97e-06s    C      506    110   theano.tensor.elemwise.Elemwise
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  16.9%    16.9%    0.039s    1.22e-04s    C      319     58   GpuElemwise{mul,no_inplace}
  10.0%    26.9%    0.023s    1.49e-04s    C      155     30   GpuCAReduce{add}{1,0}
   9.1%    36.0%    0.021s    3.72e-05s    C      564    150   HostFromGpu
   8.2%    44.2%    0.019s    1.23e-04s    C      154     30   GpuCAReduce{add}{0,1}
   6.9%    51.1%    0.016s    6.61e-05s    C      242     44   GpuElemwise{Mul}[(0, 1)]
   6.5%    57.6%    0.015s    6.20e-05s    C      242     44   GpuElemwise{maximum,no_inplace}
   6.5%    64.1%    0.015s    6.19e-05s    C      242     44   GpuCAReduce{maximum}{1}
   5.6%    69.7%    0.013s    6.04e-06s    Py    2148    168   if{inplace,gpu}
   3.5%    73.2%    0.008s    5.59e-05s    C      143     26   GpuElemwise{TrueDiv}[(0, 0)]
   3.5%    76.7%    0.008s    2.16e-04s    C       37      4   DeepCopyOp
   2.6%    79.3%    0.006s    8.95e-05s    C       67     16   GpuElemwise{Mul}[(0, 2)]
   2.2%    81.4%    0.005s    1.25e-04s    C       40      4   GpuElemwise{Maximum}[(0, 0)]
   1.7%    83.2%    0.004s    2.00e-04s    C       20      2   GpuElemwise{Composite{maximum(i0, maximum(i1, maximum(i2, i3)))}}[(0, 0)]
   1.7%    84.9%    0.004s    4.93e-04s    C        8      8   GpuElemwise{neg,no_inplace}
   1.3%    86.2%    0.003s    1.36e-04s    C       22      4   GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}
   1.3%    87.5%    0.003s    2.50e-04s    C       12      3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))}}[(0, 4)]
   1.3%    88.8%    0.003s    9.08e-05s    C       33      6   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)]
   0.9%    89.6%    0.002s    3.03e-05s    C       66     12   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)]
   0.9%    90.5%    0.002s    1.00e-04s    C       20      2   GpuCAReduce{add}{1,1}
   0.9%    91.4%    0.002s    2.50e-04s    C        8      3   GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}
... (remaining 28 Ops account for 8.62%(0.02s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
   1.7%     1.7%    0.004s    4.00e-04s    10   365   GpuElemwise{Maximum}[(0, 0)](if{inplace,gpu}.0, if{inplace,gpu}.0)
   1.3%     3.0%    0.003s    3.00e-04s    10   105   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%     4.3%    0.003s    3.00e-04s    10   356   GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{0,x}.0)
   1.3%     5.6%    0.003s    3.00e-04s    10   143   GpuCAReduce{add}{1,0}(GpuElemwise{mul,no_inplace}.0)
   1.3%     6.9%    0.003s    3.00e-04s    10   112   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%     8.2%    0.003s    3.00e-04s    10   169   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   1.3%     9.5%    0.003s    3.00e-04s    10   136   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   1.3%    10.8%    0.003s    3.00e-04s    10   217   GpuCAReduce{add}{0,1}(GpuElemwise{mul,no_inplace}.0)
   1.3%    12.1%    0.003s    3.00e-04s    10   184   GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)](GpuElemwise{TrueDiv}[(0, 0)].0, GpuElemwise{maximum,no_inplace}.0, GpuElemwise{add,no_inplace}.0)
   1.3%    13.4%    0.003s    5.96e-04s     5     1   HostFromGpu(GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}.0)
   0.9%    14.3%    0.002s    1.69e-04s    12     0   DeepCopyOp(<CudaNdarrayType(float32, scalar)>)
   0.9%    15.2%    0.002s    2.00e-04s    10   148   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   0.9%    16.0%    0.002s    2.00e-04s    10   153   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
   0.9%    16.9%    0.002s    2.00e-04s    10   126   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    17.8%    0.002s    2.00e-04s    10   412   GpuCAReduce{add}{1,1}(GpuElemwise{Composite{(((i0 + i1) + i2) + i3)}}[(0, 0)].0)
   0.9%    18.6%    0.002s    2.00e-04s    10   103   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    19.5%    0.002s    2.00e-04s    10    89   GpuElemwise{TrueDiv}[(0, 0)](GpuElemwise{maximum,no_inplace}.0, GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}.0)
   0.9%    20.4%    0.002s    2.00e-04s    10     3   GpuElemwise{maximum,no_inplace}(<CudaNdarrayType(float32, col)>, CudaNdarrayConstant{[[ 0.001]]})
   0.9%    21.2%    0.002s    2.00e-04s    10   134   GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0)
   0.9%    22.1%    0.002s    2.00e-04s    10   300   GpuElemwise{Mul}[(0, 1)](GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) + i1)))},no_inplace}.0, GpuDimShuffle{x,0}.0)
... (remaining 941 Apply instances account for 77.89%(0.18s) of the runtime)
Here are tips to potentially make your code run faster (if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
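
To put the <type> column of the Class section in perspective, here is a small sketch that tallies apply time by implementation type. The row values are copied by hand from the profile above; the helper name `time_by_impl` is hypothetical and not part of Theano.

```python
# Rows copied from the Class section of the profile above:
# (apply time in seconds, implementation type, class name).
CLASS_ROWS = [
    (0.128, "C", "theano.sandbox.cuda.basic_ops.GpuElemwise"),
    (0.059, "C", "theano.sandbox.cuda.basic_ops.GpuCAReduce"),
    (0.021, "C", "theano.sandbox.cuda.basic_ops.HostFromGpu"),
    (0.013, "Py", "theano.ifelse.IfElse"),
    (0.008, "C", "theano.compile.ops.DeepCopyOp"),
    (0.001, "C", "theano.sandbox.cuda.basic_ops.GpuDimShuffle"),
    (0.001, "C", "theano.tensor.elemwise.Elemwise"),
]


def time_by_impl(rows):
    """Sum apply time per implementation type (hypothetical helper)."""
    totals = {}
    for seconds, impl, _name in rows:
        totals[impl] = totals.get(impl, 0.0) + seconds
    return totals


totals = time_by_impl(CLASS_ROWS)
grand_total = sum(totals.values())
for impl, seconds in sorted(totals.items()):
    share = 100.0 * seconds / grand_total
    print("%-2s %.3fs (%.1f%%)" % (impl, seconds, share))
```

The only Py row is IfElse, so the Py total is just its 0.013s, spread over 2148 calls.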
As you can see, ifelse is shown as a Py operation, which I presume means it
runs on the CPU. So where does it actually run? Also, what do you mean by
"add the condition is constant"?
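
To pin down what "lazy evaluation" would mean for a conditional (the question raised in the original message quoted further down), here is a plain-Python picture of the distinction. It is not Theano code and makes no claim about how IfElse is implemented; the names `eager_select`, `lazy_select` and `branch` are made up for illustration.

```python
def eager_select(cond, then_value, else_value):
    # Both values were already computed by the caller before this runs.
    return then_value if cond else else_value


def lazy_select(cond, then_thunk, else_thunk):
    # Only the selected thunk is called; the other branch is never computed.
    return then_thunk() if cond else else_thunk()


calls = []


def branch(name, value):
    calls.append(name)  # record that this branch was actually computed
    return value


eager_result = eager_select(2.0 > 1.0, branch("then", 1), branch("else", 0))
eager_calls = list(calls)

calls.clear()
lazy_result = lazy_select(2.0 > 1.0,
                          lambda: branch("then", 1),
                          lambda: branch("else", 0))
lazy_calls = list(calls)

print(eager_calls)
print(lazy_calls)
```

In the eager version both branches are computed and one is discarded; in the lazy version only the branch chosen by the condition is ever computed.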
P.S. In case you need them, these are my Theano flags:
import os
os.environ['THEANO_FLAGS'] = 'optimizer=fast_run,floatX=float32,device=gpu,linker=cvm'
os.environ['THEANO_FLAGS'] += ',allow_gc=False'
os.environ['THEANO_FLAGS'] += ',lib.cnmem=0.3'
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
os.environ['THEANO_FLAGS'] += ',profile=true'
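
Appending to THEANO_FLAGS piece by piece makes it easy to end up with stray or doubled commas. A minimal sketch that assembles the same flags from a dict instead (the flag names and values are the ones listed above; the dict-based assembly is only a suggestion):

```python
import os

# Flag values are the ones listed above; only the way they are joined differs.
flags = {
    "optimizer": "fast_run",
    "floatX": "float32",
    "device": "gpu",
    "linker": "cvm",
    "allow_gc": "False",
    "lib.cnmem": "0.3",
    "profile": "true",
}

# Join key=value pairs with single commas, so no empty entries can appear.
theano_flags = ",".join("%s=%s" % (key, value) for key, value in flags.items())

# Set both variables before `import theano`, since the flags are read at import.
os.environ["THEANO_FLAGS"] = theano_flags
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(theano_flags)
```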
On Friday, 24 March 2017 23:09:11 UTC+1, nouiz wrote:
>
> What tell you the ifelse is on the CPU?
>
> Anyway, add the condition is constant Theano will remove it during the
> compilation.
>
> Fred
>
> On Fri, 24 Mar 2017 at 12:41, Šarūnas S. <[email protected]> wrote:
>
>> Please find a code example:
>>
>> import theano as th
>> import theano.ifelse  # load the submodule explicitly so th.ifelse resolves
>> import theano.tensor as T
>>
>> retval = th.ifelse.ifelse(T.gt(T.constant(2.0), T.constant(1.0)),
>>                           T.ones((500, 1)), T.zeros((250, 1)))
>>
>> On Friday, 24 March 2017 17:33:59 UTC+1, Šarūnas S. wrote:
>>>
>>> I am using Theano version 0.9.0.rc2.dev.
>>>
>>>
>>>
>>> On Friday, 24 March 2017 17:32:33 UTC+1, Šarūnas S. wrote:
>>>>
>>>> In my graph I have a few IfElse nodes and I am wondering how and where
>>>> they are executed.
>>>>
>>>> At first I ran the code with linker=cvm in my THEANO_FLAGS but after
>>>> profiling it looked like the ifelse is being executed on the CPU. Then I
>>>> forced the linker=c to check whether the IfElse will go through and I got
>>>> the NotImplementedError: if{inplace, gpu} cannot produce C code. Btw
>>>> removing inline optimization did not help as it still gave the same error.
>>>>
>>>> So does IfElse have a GPU implementation? If yes how do I use it? Also,
>>>> does it do lazy evaluation or not?
>>>>
>>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "theano-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>