Just for the dot reduction part, using batched_dot should give a O(log(n))
graph depth:
# assumes K > 1
B = A
for i in reversed(bin(K)[3:]):
if not int(i):
B = T.batched_dot(B[::2], B[1::2])
else:
B = T.dot(T.batched_dot(B[:-1:2], B[1::2]), B[-1])
This works if K is known in advance.
An efficient way to construct A is still needed.
Thursday, February 16, 2017 at 10:16:09 PM UTC+8, Oussama Souihli wrote:
> P.S.
> If it helps, yes the indices are exactly the ones I have in the code.
>
> Thanks
> Oussama
>
> On 16 February 2017 at 23:00, Oussama Souihli <[email protected]
> <javascript:>> wrote:
>
>> Hi Fred,
>>
>> Thank you for the quick reply and clever suggestions, you really hit the
>> nail on the head !
>>
>> I tried the suggested vectorized code, but unfortunately still
>> experienced the max recursion limit exception.
>> I also tried the workaround, but still get the max recusion limit until I
>> set it to 20K and I get a segmentation fault:
>>
>> Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
>>
>>
>> Can you think of other suggestions that I could try ?
>> If for-loop approach is doomed because of the max recursion issue, is
>> there perhaps a way I could rewrite my code such that scan becomes
>> reasonably faster ?
>>
>>
>>
>> On 16 February 2017 at 22:22, Frédéric Bastien <[email protected]
>> <javascript:>> wrote:
>>
>>> Is the indices exatly the one you have in your code? If so, you can
>>> probably take advantage of this to vectorize the code and make the for loop
>>> case work.
>>>
>>> The size are so small then using scan will be super slow. Each iteration
>>> of scan have big overhead and you only work on scalar mostly at each
>>> iteration. So it is normal that it is slow with scan.
>>>
>>> For the for loop, there was a problem of indices. I fixed it. Here is a
>>> partial vectorization of the code that will make it faster to execute and
>>> make a smaller graph:
>>>
>>> coss = tt.cos(theta)
>>> sins = tt.sin(theta)
>>>
>>> for k in range(0, K):
>>> A = tt.set_subtensor(A[indices[k, 0], indices[k, 0]], coss[k])
>>> A = tt.set_subtensor(A[indices[k, 1], indices[k, 1]], coss[k])
>>> A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], sins[k])
>>> A = tt.set_subtensor(A[indices[k, 1], indices[k, 0]], -sins[k])
>>> B = tt.dot(A, B)
>>>
>>> The call to cos and sin are only done once. This make the graph smaller.
>>>
>>> You can also do the indexing call only once. THis make the graph still
>>> smaller:
>>>
>>> coss = tt.cos(theta)
>>> sins = tt.cos(theta)
>>>
>>> for k in range(0, K):
>>> idx0 = indices[k, 0]
>>> idx1 = indices[k, 1]
>>> A = tt.set_subtensor(A[idx0, idx0], coss[k])
>>> A = tt.set_subtensor(A[idx1, idx1], coss[k])
>>> A = tt.set_subtensor(A[idx0, idx1], sins[k])
>>> A = tt.set_subtensor(A[idx1, idx0], -sins[k])
>>> B = tt.dot(A, B)
>>>
>>> The if the indices is exactly what you created and don't change, it can
>>> become that is even smaller:
>>>
>>> for k in range(0, K):
>>> idx0 = K//(N-1)
>>> idx1 = indices[k, 1]
>>> A = tt.set_subtensor(A[idx0, idx0], coss[k])
>>> A = tt.set_subtensor(A[idx1, idx1], coss[k])
>>> A = tt.set_subtensor(A[idx0, idx1], sins[k])
>>> A = tt.set_subtensor(A[idx1, idx0], -sins[k])
>>> B = tt.dot(A, B)
>>>
>>> You can do similar with j.
>>>
>>> If with that, you still have the max recursion limit, try the work
>>> around in this issue.
>>>
>>> https://github.com/Theano/Theano/issues/3607
>>>
>>> Fred
>>>
>>> On Thu, Feb 16, 2017 at 7:53 AM Oussama Souihli <[email protected]
>>> <javascript:>> wrote:
>>>
>>>> Hi Fred, Adam, Kiuhnm, all
>>>>
>>>> Thank you for the detailed and quick replies and multiple suggestions.
>>>> I tried bacthed_dot but still experienced both issues (maximum depth
>>>> recursion when using a for loop and very slow epochs when using
>>>> theano.scan),
>>>> so now I'm thinking the problem is also possibly due to the way I
>>>> populate the array A (using sub_tensor).
>>>>
>>>> Sorry for imposing on your kindness, but would you mind taking a look
>>>> at the attached, *very short*, minimal examples reproducing what I'm
>>>> trying to achieve ?
>>>> The examples are self-contained and only require python with numpy and
>>>> theano packages, nothing else.
>>>>
>>>> - *Issue_with_for_loop.py*:
>>>> - Reproduces the for-loop maximum recursion issue.
>>>> - if you run it as is, it will throw the exception on B.eval()
>>>>
>>>> - *Issue_with_scan.py*:
>>>> - This one shows the alternative way of implementing the logic
>>>> using scan.
>>>> - Unfortunately in this case it returns a result and gives the
>>>> impression like it works fine, but if I run it in Keras the epoch
>>>> duration
>>>> becomes prohibitively large (400,000 seconds per epoch !!)
>>>>
>>>>
>>>> If you can think of a way that can achieve the logic implemented in
>>>> either sheets with maximum efficiency theano-wise, I'd really appreciate
>>>> it.
>>>>
>>>>
>>>> With gratitude,
>>>> Oussama
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "theano-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected] <javascript:>.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "theano-users" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/theano-users/Yz8g-cejB8c/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected] <javascript:>.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.