Hello! I have a problem implementing an RNN in Theano. The hidden-to-hidden matrix is unitary, and I want every update to stay on the manifold of unitary matrices. I modified the standard SGD procedure so that an ordinary parameter gets the usual gradient step:

    updates[param] = param - learning_rate * grad
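For context, the whole update loop looks roughly like this (a simplified sketch in my own notation: manifold_params and unitary_update are just illustrative names for my bookkeeping, and the unitary branch is spelled out right below):

    from collections import OrderedDict

    def sgd_updates(params, grads, learning_rate, manifold_params, unitary_update):
        # plain sgd for ordinary parameters, a manifold step (exponential
        # mapping or retraction, see below) for unitary parameters
        updates = OrderedDict()
        for param, grad in zip(params, grads):
            if param in manifold_params:
                updates[param] = unitary_update(param, grad)
            else:
                updates[param] = param - learning_rate * grad
        return updates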
A unitary parameter, by contrast, is updated as follows:

    if manifold._exponential:
        param_updates = manifold.exp(pp, manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg)))
    else:
        param_updates = manifold.retr(pp, manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg)))

Here exp and retr are the exponential mapping and the retraction on the unitary-matrix manifold:
http://www.manopt.org/reference/manopt/manifolds/stiefel/stiefelcomplexfactory.html

As far as I can see, Theano currently doesn't support complex numbers, so I work with tensors of shape (2, n, n), where the first dimension indexes the real and imaginary parts. Here are the exp and retr functions:

    def retr(self, X, U):
        YR, YI = self.frac(X + U)
        Q, R = T.nlinalg.qr(YR + 1j * YI)
        Y = T.stack([Q.real, Q.imag])
        return Y

    def concat(self, arrays, axis):
        return T.concatenate(arrays, axis=axis + 1)

    def exp(self, X, U):
        # The exponential (in the sense of Lie group theory) of a tangent
        # vector U at X.
        first = self.concat([X, U], axis=1)
        XhU = self.complex_dot(self.hconj(X), U)
        second = complex_expm(self.concat(
            [self.concat([XhU, -self.complex_dot(self.hconj(U), U)], 1),
             self.concat([self.identity(), XhU], 1)], 0))
        third = self.concat([complex_expm(-XhU), self.zeros()], 0)
        exponential = self.complex_dot(self.complex_dot(first, second), third)
        return exponential

The frac method just returns the pair A[0, :, :], A[1, :, :] from A; complex_dot multiplies two such tensors as complex matrices; hconj is the Hermitian conjugate. Note that the exponential mapping involves the exponential of a complex matrix, so I wrote an Op based on tensor.slinalg.expm:
https://github.com/Theano/theano/blob/8cc53673d7aa37873a753b7f847af4a185298f4d/theano/tensor/slinalg.py#L356-L425

Initially I wrote the Op without a grad, because I didn't understand how to carry the real-case gradient over to the complex case. Then I realized that this gradient can be viewed as a directional derivative of expm in the direction gA.T (hconj(gA) in the complex case), and that a directional derivative of expm can be read off from the exponential of a 2n x 2n block matrix:

    expm([[A, E], [0, A]]) = [[expm(A), D], [0, expm(A)]]

where D is the derivative of expm at A in the direction E. For small n, at least, this method passes a finite-difference test. Here is the code (imports mirror those of theano/tensor/slinalg.py):

    import numpy as np
    import theano
    from theano.gof import Op, Apply
    from theano.tensor import as_tensor_variable

    try:
        import scipy.linalg
        imported_scipy = True
    except ImportError:
        imported_scipy = False


    class ComplexExpm(Op):
        """
        Compute the matrix exponential of a square complex matrix stored
        as a (2, n, n) tensor of real and imaginary parts.
        """

        __props__ = ()

        def make_node(self, A):
            assert imported_scipy, (
                "Scipy not available. Scipy is needed for the Expm op")
            A = as_tensor_variable(A)
            assert A.ndim == 3
            expm = theano.tensor.tensor3(dtype=A.dtype)
            return Apply(self, [A, ], [expm, ])

        def perform(self, node, inputs, outputs):
            (A,) = inputs
            (expm,) = outputs
            temp = scipy.linalg.expm(A[0, :, :] + 1j * A[1, :, :])
            expm[0] = np.stack([temp.real, temp.imag])

        def grad(self, inputs, outputs):
            (A,) = inputs
            (g_out,) = outputs
            return [ComplexExpmGrad()(A, g_out)]

        def infer_shape(self, node, shapes):
            return [shapes[0]]


    def _hconj_internal(x):
        # Hermitian conjugate in the (2, n, n) encoding: transpose and
        # negate the imaginary part.
        x_hconj = np.transpose(x, axes=(0, 2, 1)).copy()
        x_hconj[1, :, :] = -x_hconj[1, :, :]
        return x_hconj


    class ComplexExpmGrad(Op):
        """
        Gradient of the matrix exponential of a square complex matrix.
        """

        __props__ = ()

        def make_node(self, A, gw):
            assert imported_scipy, (
                "Scipy not available. Scipy is needed for the Expm op")
            A = as_tensor_variable(A)
            assert A.ndim == 3
            out = theano.tensor.tensor3(dtype=A.dtype)
            return Apply(self, [A, gw], [out, ])

        def infer_shape(self, node, shapes):
            return [shapes[0]]

        def perform(self, node, inputs, outputs):
            (A, gA) = inputs
            (out,) = outputs
            n = A.shape[1]
            Z = np.zeros_like(A)
            # 2n x 2n block matrix [[A, gA^H], [0, A]] in the (2, n, n) encoding
            res_mat = np.concatenate(
                [np.concatenate([A, _hconj_internal(gA)], axis=2),
                 np.concatenate([Z, A], axis=2)], axis=1)
            res = scipy.linalg.expm(res_mat[0, :, :] + 1j * res_mat[1, :, :])
            # the top-right block holds the directional derivative
            out_mat = np.stack([res[:n, n:].real, res[:n, n:].imag], axis=0)
            out[0] = _hconj_internal(out_mat).astype(A.dtype)


    complex_expm = ComplexExpm()
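For completeness, the finite-difference test I mentioned boils down to a standalone numpy check like the one below. The scalar loss Re<G, expm(A)> and all the names here are only for this check, not part of the model:

    import numpy as np
    import scipy.linalg

    n = 4
    rng = np.random.RandomState(0)
    A = rng.randn(n, n) + 1j * rng.randn(n, n)   # point at which we differentiate
    G = rng.randn(n, n) + 1j * rng.randn(n, n)   # incoming output gradient
    E = rng.randn(n, n) + 1j * rng.randn(n, n)   # probe direction

    # gradient via the 2n x 2n block trick, same construction as in
    # ComplexExpmGrad: the top-right block of expm([[A, G^H], [0, A]]) is the
    # directional derivative of expm at A along G^H; its hconj is the gradient
    top = np.concatenate([A, G.conj().T], axis=1)
    bot = np.concatenate([np.zeros((n, n), dtype=A.dtype), A], axis=1)
    grad = scipy.linalg.expm(np.concatenate([top, bot], axis=0))[:n, n:].conj().T

    # numeric directional derivative of loss(M) = Re<G, expm(M)> along E
    loss = lambda M: np.real(np.sum(G.conj() * scipy.linalg.expm(M)))
    eps = 1e-6
    numeric = (loss(A + eps * E) - loss(A - eps * E)) / (2 * eps)
    analytic = np.real(np.sum(grad.conj() * E))
    print(abs(numeric - analytic))  # should be close to zero, up to fd error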
Now, here is the problem. I am following a paper about complex RNNs and using their experimental setup and models; my model is just their RNN with their unitarity mechanism replaced by mine:
https://github.com/amarshah/complex_RNN

When I use the retraction, the network learns normally and the matrix stays unitary:

    Learning rate is 0.0010000000474974513
    How much of unitarity U holds?
    U norm 11.31370849898476
    Delta between U^*U and I: 1.3531515725470711e-14
    Iteration: 0 mse: 1.4083181893131802
    How much of unitarity U holds?
    U norm 11.313708498984749
    Delta between U^*U and I: 1.397582563930589e-14
    Iteration: 1 mse: 1.8754714993993804
    How much of unitarity U holds?
    U norm 11.313708498984754
    Delta between U^*U and I: 1.3569776678228608e-14
    Iteration: 2 mse: 1.4130059596425537

But if I switch to the exponential mapping, the matrix becomes zero right after the first update:

    Learning rate is 0.0010000000474974513
    How much of unitarity U holds?
    U norm 11.31370849898476
    Delta between U^*U and I: 1.3531515725470711e-14
    Iteration: 0 mse: 1.5589767423436534
    How much of unitarity U holds?
    U norm 0.0
    Delta between U^*U and I: 11.313708498984761
    Iteration: 1 mse: 1.3212516523445936
    How much of unitarity U holds?
    U norm 0.0
    Delta between U^*U and I: 11.313708498984761
    Iteration: 2 mse: 1.222838183394413

I suspect the problem is in the execution of the exponential Op, because all the other parts work well. Initially I wrote the ComplexExpm Op without a grad, so when I first saw the problem I assumed the missing grad was the cause. But even after implementing the grad, the problem is still there.

Also, as far as I remember, when using custom Ops with numpy code inside, one can debug that code at runtime (because the Op executes the same numpy code). But now, even when I set breakpoints inside ComplexExpm, the program doesn't stop at them.

Can you suggest any mechanism in Theano that could make this parameter erasing happen?
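P.S. For reference, the unitarity diagnostics printed above amount to the following check (U is stored as a (2, n, n) tensor; the printed values are consistent with the Frobenius norm and n = 128, since sqrt(128) is about 11.3137):

    import numpy as np

    def unitarity_report(U):
        # U[0] is the real part, U[1] the imaginary part
        Uc = U[0] + 1j * U[1]
        n = Uc.shape[0]
        print("U norm", np.linalg.norm(Uc))  # equals sqrt(n) for a unitary matrix
        print("Delta between U^*U and I:",
              np.linalg.norm(Uc.conj().T.dot(Uc) - np.eye(n)))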
