Hi Fred, Adam, Kiuhnm, all Thank you for the detailed and quick replies and multiple suggestions. I tried bacthed_dot but still experienced both issues (maximum depth recursion when using a for loop and very slow epochs when using theano.scan), so now I'm thinking the problem is also possibly due to the way I populate the array A (using sub_tensor).

## Advertising

Sorry for imposing on your kindness, but would you mind taking a look at the attached, *very short*, minimal examples reproducing what I'm trying to achieve ? The examples are self-contained and only require python with numpy and theano packages, nothing else. - *Issue_with_for_loop.py*: - Reproduces the for-loop maximum recursion issue. - if you run it as is, it will throw the exception on B.eval() - *Issue_with_scan.py*: - This one shows the alternative way of implementing the logic using scan. - Unfortunately in this case it returns a result and gives the impression like it works fine, but if I run it in Keras the epoch duration becomes prohibitively large (400,000 seconds per epoch !!) If you can think of a way that can achieve the logic implemented in either sheets with maximum efficiency theano-wise, I'd really appreciate it. With gratitude, Oussama -- --- You received this message because you are subscribed to the Google Groups "theano-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.

import numpy as np import theano as t import theano.tensor as tt # Size of each matrix A N = 100 # Number of matrices to multiply, so if N =100, K = 100*99/2 = 4950 K = N * (N - 1) // 2 def get_indices(): indices = np.int64(np.zeros((K, 2))) k = 0 for i in range(0, N-1): for j in range(i + 1, N): indices[k, 0] = i indices[k, 1] = j k += 1 return indices def one_step(location, value): G = tt.eye(N) G = tt.set_subtensor(G[location[0], location[0]], tt.cos(value)) G = tt.set_subtensor(G[location[1], location[1]], tt.cos(value)) G = tt.set_subtensor(G[location[0], location[1]], tt.sin(value)) G = tt.set_subtensor(G[location[1], location[0]], -tt.sin(value)) return G if __name__ == '__main__': # Vector of K angles, each k_th angle will be used to construct the k_th matrix A theta = t.shared(np.random.uniform(low=0., high=2*np.pi, size=K)) # This will determine where the angles will be placed in each matrix, it's not very important for our problem indices = t.shared(get_indices()) A = tt.eye(N) B = tt.eye(N) A, _ = t.scan(fn=one_step, outputs_info=None, sequences=[indices, theta]) B = A[-1] # this is in case you want to test "B" # I chose B's determinant instead of the whole matrix because B is too large to display print (np.linalg.det(B.eval()))

import numpy as np import theano as t import theano.tensor as tt ''' This gives the below error when you do B.eval(): File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 92, in clone_v_get_shared_updates clone_v_get_shared_updates(i, copy_inputs_over) RuntimeError: maximum recursion depth exceeded ''' # Size of each matrix A N = 100 # Number of matrices to multiply, so if N =100, K = 100*99/2 = 4950 K = N * (N - 1) // 2 def get_indices(): indices = np.int64(np.zeros((K, 2))) k = 0 for i in range(0, N-1): for j in range(i + 1, N): indices[k, 0] = i indices[k, 1] = j k += 1 return indices if __name__ == '__main__': # Size of each matrix A N = 100 # Number of matrices to multiply, so if N =100, K = 100*99/2 = 4950 K = N * (N - 1) // 2 # Vector of K angles, each k_th angle will be used to construct the k_th matrix A theta = t.shared(np.random.uniform(low=0., high=2*np.pi, size=K)) # This will determine where the angles will be placed in each matrix, it's not very important for our problem indices = t.shared(get_indices()) A = tt.eye(N) B = tt.eye(N) for k in range(0, K): A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], tt.cos(theta[k])) A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], tt.cos(theta[k])) A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], -tt.sin(theta[k])) A = tt.set_subtensor(A[indices[k, 0], indices[k, 1]], tt.sin(theta[k])) B = tt.dot(A, B) # this is in case you want to test "B" # I chose B's determinant instead of the whole matrix because B is too large to display print (np.linalg.det(B.eval()))