Nouiz, sorry, I now understand what you meant by "is constant". I
misled you with my example.
This is a more realistic example:
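import numpy as np
import theano as th
import theano.ifelse  # explicit import so that th.ifelse is available
import theano.tensor as T
# Shared scalar switch: set it above 0 at run time to enable the branch.
allowed_branch = th.shared( np.cast['float32']( 0 ) )
x = T.matrix('x')
y = T.matrix('y')
f = x ** 2 + y ** 2 + 2 * x * y
result = th.ifelse.ifelse( T.gt( allowed_branch, T.constant( 0 ) ), f, T.zeros( (2,2) ) )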
I am working on a real-time system which, in a given situation, constructs
a relevant computational graph, computes its result and displays it.
However, the graphs are relatively big and compiling each of them takes
too long, so I can't compile in real time. Thus I have to precompile somehow.
I have decided to precompile a general graph in which all the possible
graphs are nested. Then at run time I would select which parts of the
general graph to use via the *allowed_branch* variables and *if* nodes.
Since, as far as I know, ifs are evaluated lazily, in each case I would only
be evaluating the relevant part of the graph, so my computational cost is
minimal.
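To make the idea concrete, here is a rough plain-Python analogy of what I have in mind. It is not Theano code: the names build_general_graph, switches, run and calls are made up for illustration, and a closure stands in for the precompiled function. It only shows the intended behaviour, namely that a branch guarded by a lazily evaluated if costs nothing while its switch is off, and that the switch can be flipped without rebuilding anything.

```python
# Plain-Python analogy (NOT Theano): the "graph" is built once, and a mutable
# switch decides at call time whether the expensive branch is evaluated.

calls = {"expensive": 0}  # counts how often the expensive branch really runs


def expensive_branch(x, y):
    calls["expensive"] += 1
    return x ** 2 + y ** 2 + 2 * x * y


def build_general_graph():
    """Build once (stands in for precompiling); return switches and callable."""
    switches = {"allowed_branch": 0.0}  # plays the role of the shared variable

    def run(x, y):
        # Lazy if: the expensive branch is only called when the switch is > 0.
        if switches["allowed_branch"] > 0:
            return expensive_branch(x, y)
        return 0.0

    return switches, run


switches, run = build_general_graph()

off_result = run(1.0, 2.0)          # branch disabled, expensive code skipped
calls_while_off = calls["expensive"]

switches["allowed_branch"] = 1.0    # flip the switch at run time, no rebuild
on_result = run(1.0, 2.0)           # branch enabled, expensive code runs once
```

Whether the compiled Theano function behaves the same way depends on the IfElse node actually being evaluated lazily by the linker in use, which is exactly what I am trying to find out.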
On Saturday, 25 March 2017 10:04:21 UTC+1, Šarūnas S. wrote:
>
> I suspect that ifelse is running on GPU because this is the profile I get
>
> ==================
> Message: Sum of all(44) printed profiles at exit excluding Scan op
> profile.
> Time in 95 calls to Function.__call__: 2.309995e-01s
> Time in Function.fn.__call__: 2.299995e-01s (99.567%)
> Time in thunks: 2.307765e-01s (99.903%)
> Total compile time: 1.360100e+01s
> Number of Apply nodes: 416
> Theano Optimizer time: 6.314001e+00s
> Theano validate time: 9.200015e-01s
> Theano Linker time (includes C, CUDA code generation/compiling):
> 1.169000e+00s
> Import time 2.799892e-02s
> Node make_thunk time 1.108999e+00s
> Node GpuElemwise{Composite{(i0 * ((i1 * i2) + (i1 * i3)))}}[(0,
> 2)](CudaNdarrayConstant{0.5}, CudaNdarrayConstant{0.833333313465},
> GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0) time 6.999969e-03s
> Node GpuElemwise{Composite{(-minimum(i0, maximum(minimum(i0,
> (maximum((i1 - i2), i3) + i2)), (((i1 + i2) * i4) +
> i1))))},no_inplace}(<CudaNdarrayType(float32, scalar)>,
> <CudaNdarrayType(float32, scalar)>, <CudaNdarrayType(float32, scalar)>,
> CudaNdarrayConstant{120.0}, <CudaNdarrayType(float32, scalar)>) time
> 4.999876e-03s
> Node GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32,
> matrix)>, GpuElemwise{TrueDiv}[(0, 0)].0) time 4.000187e-03s
> Node HostFromGpu(<CudaNdarrayType(float32, scalar)>) time
> 3.999949e-03s
> Node GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0,
> GpuDimShuffle{x,0}.0) time 3.999949e-03s
>
> Time in all call to theano.grad() 0.000000e+00s
> Time since theano import 28.959s
> Class
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply>
> <Class name>
> 55.4% 55.4% 0.128s 8.71e-05s C 1468 301
> theano.sandbox.cuda.basic_ops.GpuElemwise
> 25.6% 81.0% 0.059s 1.03e-04s C 571 106
> theano.sandbox.cuda.basic_ops.GpuCAReduce
> 9.1% 90.1% 0.021s 3.72e-05s C 564 150
> theano.sandbox.cuda.basic_ops.HostFromGpu
> 5.6% 95.7% 0.013s 6.04e-06s Py 2148 168
> theano.ifelse.IfElse
> 3.5% 99.1% 0.008s 2.16e-04s C 37 4
> theano.compile.ops.DeepCopyOp
> 0.4% 99.6% 0.001s 1.60e-06s C 623 122
> theano.sandbox.cuda.basic_ops.GpuDimShuffle
> 0.4% 100.0% 0.001s 1.97e-06s C 506 110
> theano.tensor.elemwise.Elemwise
> ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
>
> Ops
> ---
> <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op
> name>
> 16.9% 16.9% 0.039s 1.22e-04s C 319 58
> GpuElemwise{mul,no_inplace}
> 10.0% 26.9% 0.023s 1.49e-04s C 155 30
> GpuCAReduce{add}{1,0}
> 9.1% 36.0% 0.021s 3.72e-05s C 564 150
> HostFromGpu
> 8.2% 44.2% 0.019s 1.23e-04s C 154 30
> GpuCAReduce{add}{0,1}
> 6.9% 51.1% 0.016s 6.61e-05s C 242 44
> GpuElemwise{Mul}[(0, 1)]
> 6.5% 57.6% 0.015s 6.20e-05s C 242 44
> GpuElemwise{maximum,no_inplace}
> 6.5% 64.1% 0.015s 6.19e-05s C 242 44
> GpuCAReduce{maximum}{1}
> 5.6% 69.7% 0.013s 6.04e-06s Py 2148 168
> if{inplace,gpu}
> 3.5% 73.2% 0.008s 5.59e-05s C 143 26
> GpuElemwise{TrueDiv}[(0, 0)]
> 3.5% 76.7% 0.008s 2.16e-04s C 37 4
> DeepCopyOp
> 2.6% 79.3% 0.006s 8.95e-05s C 67 16
> GpuElemwise{Mul}[(0, 2)]
> 2.2% 81.4% 0.005s 1.25e-04s C 40 4
> GpuElemwise{Maximum}[(0, 0)]
> 1.7% 83.2% 0.004s 2.00e-04s C 20 2
> GpuElemwise{Composite{maximum(i0, maximum(i1, maximum(i2, i3)))}}[(0, 0)]
> 1.7% 84.9% 0.004s 4.93e-04s C 8 8
> GpuElemwise{neg,no_inplace}
> 1.3% 86.2% 0.003s 1.36e-04s C 22 4
> GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}
> 1.3% 87.5% 0.003s 2.50e-04s C 12 3
> GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2),
> i3) + i2)), ((i4 * i5) + i1)))}}[(0, 4)]
> 1.3% 88.8% 0.003s 9.08e-05s C 33 6
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)]
> 0.9% 89.6% 0.002s 3.03e-05s C 66 12
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)]
> 0.9% 90.5% 0.002s 1.00e-04s C 20 2
> GpuCAReduce{add}{1,1}
> 0.9% 91.4% 0.002s 2.50e-04s C 8 3
> GpuElemwise{Composite{minimum(i0, maximum(minimum(i0, (maximum((i1 - i2),
> i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}
> ... (remaining 28 Ops account for 8.62%(0.02s) of the runtime)
>
> Apply
> ------
> <% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
> 1.7% 1.7% 0.004s 4.00e-04s 10 365
> GpuElemwise{Maximum}[(0, 0)](if{inplace,gpu}.0, if{inplace,gpu}.0)
> 1.3% 3.0% 0.003s 3.00e-04s 10 105
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 1.3% 4.3% 0.003s 3.00e-04s 10 356
> GpuElemwise{Mul}[(0, 1)](GpuDimShuffle{x,x}.0, GpuDimShuffle{0,x}.0)
> 1.3% 5.6% 0.003s 3.00e-04s 10 143
> GpuCAReduce{add}{1,0}(GpuElemwise{mul,no_inplace}.0)
> 1.3% 6.9% 0.003s 3.00e-04s 10 112
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 1.3% 8.2% 0.003s 3.00e-04s 10 169
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
> 1.3% 9.5% 0.003s 3.00e-04s 10 136
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 1.3% 10.8% 0.003s 3.00e-04s 10 217
> GpuCAReduce{add}{0,1}(GpuElemwise{mul,no_inplace}.0)
> 1.3% 12.1% 0.003s 3.00e-04s 10 184
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 0)](GpuElemwise{TrueDiv}[(0,
> 0)].0, GpuElemwise{maximum,no_inplace}.0, GpuElemwise{add,no_inplace}.0)
> 1.3% 13.4% 0.003s 5.96e-04s 5 1
> HostFromGpu(GpuElemwise{Composite{minimum(i0, maximum(minimum(i0,
> (maximum((i1 - i2), i3) + i2)), (((i2 + i1) * i4) + i1)))},no_inplace}.0)
> 0.9% 14.3% 0.002s 1.69e-04s 12 0
> DeepCopyOp(<CudaNdarrayType(float32, scalar)>)
> 0.9% 15.2% 0.002s 2.00e-04s 10 148
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
> 0.9% 16.0% 0.002s 2.00e-04s 10 153
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{Composite{(i0 * (i1 / i2))}}[(0, 1)].0)
> 0.9% 16.9% 0.002s 2.00e-04s 10 126
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 0.9% 17.8% 0.002s 2.00e-04s 10 412
> GpuCAReduce{add}{1,1}(GpuElemwise{Composite{(((i0 + i1) + i2) + i3)}}[(0,
> 0)].0)
> 0.9% 18.6% 0.002s 2.00e-04s 10 103
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 0.9% 19.5% 0.002s 2.00e-04s 10 89
> GpuElemwise{TrueDiv}[(0, 0)](GpuElemwise{maximum,no_inplace}.0,
> GpuElemwise{Composite{((i0 + i1) + i2)},no_inplace}.0)
> 0.9% 20.4% 0.002s 2.00e-04s 10 3
> GpuElemwise{maximum,no_inplace}(<CudaNdarrayType(float32, col)>,
> CudaNdarrayConstant{[[ 0.001]]})
> 0.9% 21.2% 0.002s 2.00e-04s 10 134
> GpuElemwise{mul,no_inplace}(<CudaNdarrayType(float32, matrix)>,
> GpuElemwise{TrueDiv}[(0, 0)].0)
> 0.9% 22.1% 0.002s 2.00e-04s 10 300
> GpuElemwise{Mul}[(0, 1)](GpuElemwise{Composite{minimum(i0,
> maximum(minimum(i0, (maximum((i1 - i2), i3) + i2)), ((i4 * i5) +
> i1)))},no_inplace}.0, GpuDimShuffle{x,0}.0)
> ... (remaining 941 Apply instances account for 77.89%(0.18s) of the
> runtime)
>
> Here are tips to potentially make your code run faster
> (if you think of new ones, suggest them on the mailing
> list).
> Test them first, as they are not guaranteed to always
> provide a speedup.
> Sorry, no tip for today.
>
> And as you can see, ifelse is shown as a Py operation, which I would
> presume runs on the CPU. So where does it run? Also, what do you mean by
> "add a condition is constant"?
>
>
> P.S. In case you need them, these are my Theano flags:
>
> import os
> os.environ['THEANO_FLAGS'] = 'optimizer=fast_run,floatX=float32,device=gpu,linker=cvm'
> os.environ['THEANO_FLAGS'] += ',allow_gc=False'
> os.environ['THEANO_FLAGS'] += ',lib.cnmem=0.3'
> os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
> os.environ['THEANO_FLAGS'] += ',profile=true'
>
>
> On Friday, 24 March 2017 23:09:11 UTC+1, nouiz wrote:
>>
>> What tells you that the ifelse is on the CPU?
>>
>> Anyway, as the condition is constant, Theano will remove it during
>> compilation.
>>
>> Fred
>>
>> On Fri, 24 Mar 2017 at 12:41, Šarūnas S. <[email protected]> wrote:
>>
>>> Please find a code example:
>>>
>>> import theano as th
>>> import theano.tensor as T
>>>
>>> retval = th.ifelse.ifelse( T.gt(T.constant(2.0),T.constant(1.0)), T.ones((500,1)), T.zeros((250,1)))
>>>
>>> On Friday, 24 March 2017 17:33:59 UTC+1, Šarūnas S. wrote:
>>>>
>>>> I am using Theano version 0.9.0.rc2.dev.
>>>>
>>>>
>>>>
>>>> On Friday, 24 March 2017 17:32:33 UTC+1, Šarūnas S. wrote:
>>>>>
>>>>> In my graph I have a few IfElse nodes and I am wondering how and where
>>>>> they are executed.
>>>>>
>>>>> At first I ran the code with linker=cvm in my THEANO_FLAGS, but after
>>>>> profiling it looked like the ifelse was being executed on the CPU. Then
>>>>> I forced linker=c to check whether the IfElse would go through, and I
>>>>> got NotImplementedError: if{inplace, gpu} cannot produce C code. Btw,
>>>>> removing the inline optimization did not help, as it still gave the
>>>>> same error.
>>>>>
>>>>> So does IfElse have a GPU implementation? If yes how do I use it?
>>>>> Also, does it do lazy evaluation or not?
>>>>>
>>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "theano-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>