Is the indices exatly the one you have in your code? If so, you can
probably take advantage of this to vectorize the code and make the for loop
case work.
The size are so small then using scan will be super slow. Each iteration of
scan have big overhead and you only work on scalar mostly at each
iteration. So it is normal that it is slow with scan.
For the for loop, there was a problem of indices. I fixed it. Here is a
partial vectorization of the code that will make it faster to execute and
make a smaller graph:
coss = tt.cos(theta)
sins = tt.sin(theta)
for k in range(0, K):
A = tt.set_subtensor(A[indices[k, 0], indices[k, 0]], coss[k])
A = tt.set_subtensor(A[indices[k, 1], indices[k, 1]], coss[k])
A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], sins[k])
A = tt.set_subtensor(A[indices[k, 1], indices[k, 0]], -sins[k])
B = tt.dot(A, B)
The call to cos and sin are only done once. This make the graph smaller.
You can also do the indexing call only once. THis make the graph still
smaller:
coss = tt.cos(theta)
sins = tt.cos(theta)
for k in range(0, K):
idx0 = indices[k, 0]
idx1 = indices[k, 1]
A = tt.set_subtensor(A[idx0, idx0], coss[k])
A = tt.set_subtensor(A[idx1, idx1], coss[k])
A = tt.set_subtensor(A[idx0, idx1], sins[k])
A = tt.set_subtensor(A[idx1, idx0], -sins[k])
B = tt.dot(A, B)
The if the indices is exactly what you created and don't change, it can
become that is even smaller:
for k in range(0, K):
idx0 = K//(N-1)
idx1 = indices[k, 1]
A = tt.set_subtensor(A[idx0, idx0], coss[k])
A = tt.set_subtensor(A[idx1, idx1], coss[k])
A = tt.set_subtensor(A[idx0, idx1], sins[k])
A = tt.set_subtensor(A[idx1, idx0], -sins[k])
B = tt.dot(A, B)
You can do similar with j.
If with that, you still have the max recursion limit, try the work around
in this issue.
https://github.com/Theano/Theano/issues/3607
Fred
On Thu, Feb 16, 2017 at 7:53 AM Oussama Souihli <[email protected]>
wrote:
> Hi Fred, Adam, Kiuhnm, all
>
> Thank you for the detailed and quick replies and multiple suggestions.
> I tried bacthed_dot but still experienced both issues (maximum depth
> recursion when using a for loop and very slow epochs when using
> theano.scan),
> so now I'm thinking the problem is also possibly due to the way I
> populate the array A (using sub_tensor).
>
> Sorry for imposing on your kindness, but would you mind taking a look at
> the attached, *very short*, minimal examples reproducing what I'm trying
> to achieve ?
> The examples are self-contained and only require python with numpy and
> theano packages, nothing else.
>
> - *Issue_with_for_loop.py*:
> - Reproduces the for-loop maximum recursion issue.
> - if you run it as is, it will throw the exception on B.eval()
>
> - *Issue_with_scan.py*:
> - This one shows the alternative way of implementing the logic using
> scan.
> - Unfortunately in this case it returns a result and gives the
> impression like it works fine, but if I run it in Keras the epoch
> duration
> becomes prohibitively large (400,000 seconds per epoch !!)
>
>
> If you can think of a way that can achieve the logic implemented in either
> sheets with maximum efficiency theano-wise, I'd really appreciate it.
>
>
> With gratitude,
> Oussama
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.