Hi,

Just a couple of hints:
- you can try verify_grad on ComplexExpm to check that the gradient is
consistent with a finite-difference estimate (a rough sketch follows below)
- pdb breakpoints inside the perform() methods of both ComplexExpm and
ComplexExpmGrad should be triggered during the computation; if they are
not, it means that those nodes are not in the computation graph.
- you can use PdbBreakpointOp to monitor the values of some variables
and drop into pdb only if some condition is met.
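
For example, a rough sketch (complex_expm below refers to your op instance
from the code further down, and the breakpoint condition is made up):

import numpy as np
import theano
import theano.tensor as T
from theano.tests.breakpoint import PdbBreakpoint

# 1) finite-difference check of the ComplexExpm gradient,
#    on a random (2, n, n) float64 test point
n = 4
A_val = np.random.randn(2, n, n)
theano.gradient.verify_grad(complex_expm, [A_val],
                            rng=np.random.RandomState(42))

# 2) conditional breakpoint: drop into pdb only when the monitored value
#    misbehaves (here: when the Frobenius norm of W gets close to zero)
W = T.tensor3('W')
bp = PdbBreakpoint('W collapsed')
W = bp(T.lt(T.sqrt((W ** 2).sum()), 0.5), W)
# ... then use this W in the rest of the graph so the check actually runs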

On Mon, Oct 03, 2016, const.be...@gmail.com wrote:
> Hello!
> 
> I have a problem implementing an RNN in Theano. I have a unitary hidden 
> matrix, and I manage to stay on the unitary-matrix manifold at each update. 
> I modified the standard SGD procedure so that for an ordinary parameter it 
> performs the usual gradient step
> updates[param] = param - learning_rate * grad
> 
> 
> 
> and for a unitary matrix the update is defined as follows:
> if manifold._exponential:
>     param_updates = manifold.exp(pp, manifold.lincomb(pp, -learning_rate,
>                                                       manifold.proj(pp, gg)))
> else:
>     param_updates = manifold.retr(pp, manifold.lincomb(pp, -learning_rate,
>                                                        manifold.proj(pp, gg)))
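> 
> For concreteness, roughly the whole updates dictionary is assembled like 
> this (params, grads and manifolds are placeholder names for my list of 
> parameters, their gradients, and a dict mapping each unitary parameter to 
> its manifold object):
> 
> from collections import OrderedDict
> 
> updates = OrderedDict()
> for pp, gg in zip(params, grads):
>     if pp in manifolds:
>         # unitary parameter: move along the manifold
>         manifold = manifolds[pp]
>         step = manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg))
>         if manifold._exponential:
>             updates[pp] = manifold.exp(pp, step)
>         else:
>             updates[pp] = manifold.retr(pp, step)
>     else:
>         # ordinary parameter: plain gradient step
>         updates[pp] = pp - learning_rate * gg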
> 
> Here exp and retr are the exponential mapping and the retraction on the 
> unitary-matrix manifold:
> 
> http://www.manopt.org/reference/manopt/manifolds/stiefel/stiefelcomplexfactory.html
> 
> 
> As I understand it, Theano currently doesn't support complex numbers, so I 
> work with tensors of shape (2, n, n), with the first dimension corresponding 
> to the real part and the second to the imaginary part.
> 
> Here are the exp and retr functions:
> 
>     def retr(self, X, U):
>         YR, YI = self.frac(X + U)
> 
> 
>         Q, R = T.nlinalg.qr(YR + 1j * YI)
>         Y = T.stack([Q.real, Q.imag])
>         return Y
> 
> 
>     def concat(self, arrays, axis):
>         return T.concatenate(arrays, axis=axis+1)
> 
> 
>     def exp(self, X, U):
>         # The exponential (in the sense of Lie group theory) of a tangent
>         # vector U at X.
>         first = self.concat([X, U], axis=1)
>         XhU = self.complex_dot(self.hconj(X), U)
>         second = complex_expm(self.concat(
>             [self.concat([XhU, -self.complex_dot(self.hconj(U), U)], 1),
>              self.concat([self.identity(), XhU], 1)], 0))
>         third = self.concat([complex_expm(-XhU), self.zeros()], 0)
>         exponential = self.complex_dot(self.complex_dot(first, second), third)
>         return exponential
> 
> 
> Here the frac method just returns the pair A[0, :, :] and A[1, :, :] from 
> A, complex_dot multiplies two such tensors as complex matrices, and hconj 
> computes the Hermitian conjugate (conjugate transpose).
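> 
> For reference, minimal versions of these helpers on the (2, n, n) 
> representation could look roughly like this (the real implementations may 
> differ in details):
> 
>     def frac(self, X):
>         # split into real and imaginary parts
>         return X[0, :, :], X[1, :, :]
> 
> 
>     def hconj(self, X):
>         # Hermitian conjugate: transpose and negate the imaginary part
>         XR, XI = self.frac(X)
>         return T.stack([XR.T, -XI.T])
> 
> 
>     def complex_dot(self, A, B):
>         # (AR + i AI)(BR + i BI) = (AR BR - AI BI) + i (AR BI + AI BR)
>         AR, AI = self.frac(A)
>         BR, BI = self.frac(B)
>         return T.stack([T.dot(AR, BR) - T.dot(AI, BI),
>                         T.dot(AR, BI) + T.dot(AI, BR)])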
> 
> Note that the exponential mapping involves the matrix exponential of a 
> complex matrix, so I wrote an Op based on tensor.slinalg.expm:
> 
> https://github.com/Theano/theano/blob/8cc53673d7aa37873a753b7f847af4a185298f4d/theano/tensor/slinalg.py#L356-L425
> 
> Initially I wrote the code without grad because I didn't understand how to 
> transform the real-case gradient to the complex case. But then I realized 
> that this gradient can be viewed as a directional derivative of expm in the 
> direction gA.T (or hconj(gA) in the complex case). This directional 
> derivative can be read off from the matrix exponential of a 2n x 2n matrix, 
> so at least for small n this method passes a finite-difference test.
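> 
> More precisely, this relies on the standard block-matrix identity for the 
> derivative of the matrix exponential,
> 
>     expm([[A, E], [0, A]]) = [[expm(A), L(A, E)], [0, expm(A)]],
> 
> where L(A, E) is the derivative of expm at A in the direction E, so the 
> upper-right n x n block of the 2n x 2n exponential gives the directional 
> derivative I need.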
> Here is the code:
> 
> class ComplexExpm(Op):
>     """
>     Compute the matrix exponential of a square array.
>     """
> 
> 
>     __props__ = ()
> 
> 
>     def make_node(self, A):
>         assert imported_scipy, (
>             "Scipy not available. Scipy is needed for the Expm op")
> 
> 
>         A = as_tensor_variable(A)
>         assert A.ndim == 3
>         expm = theano.tensor.tensor3(dtype=A.dtype)
>         return Apply(self, [A, ], [expm, ])
> 
> 
>     def perform(self, node, inputs, outputs):
>         (A,) = inputs
>         (expm,) = outputs
>         temp = scipy.linalg.expm(A[0, :, :] + 1j * A[1, :, :])
>         expm[0] = np.stack([temp.real, temp.imag])
> 
> 
>     def grad(self, inputs, outputs):
>         (A,) = inputs
>         (g_out,) = outputs
>         return [ComplexExpmGrad()(A, g_out)]
> 
> 
>     def infer_shape(self, node, shapes):
>         return [shapes[0]]
> 
> 
> 
> 
> def _hconj_internal(x):
>     x_hconj = np.transpose(x, axes=(0, 2, 1)).copy()
>     x_hconj[1, :, :] = -x_hconj[1, :, :]
>     return x_hconj
> 
> 
> 
> 
> class ComplexExpmGrad(Op):
>     """
>     Gradient of the matrix exponential of a square array.
>     """
> 
> 
>     __props__ = ()
> 
> 
>     def make_node(self, A, gw):
>         assert imported_scipy, (
>             "Scipy not available. Scipy is needed for the Expm op")
>         A = as_tensor_variable(A)
>         assert A.ndim == 3
>         out = theano.tensor.tensor3(dtype=A.dtype)
>         return Apply(self, [A, gw], [out, ])
> 
> 
>     def infer_shape(self, node, shapes):
>         return [shapes[0]]
> 
> 
>     def perform(self, node, inputs, outputs):
>         (A, gA) = inputs
>         (out,) = outputs
>         n = A.shape[1]
>         Z = np.zeros_like(A)
>         res_mat = np.concatenate(
>             [np.concatenate([A, _hconj_internal(gA)], axis=2),
>              np.concatenate([Z, A], axis=2)], axis=1)
>         res = scipy.linalg.expm(res_mat[0, :, :] + 1j * res_mat[1, :, :])
>         out_mat = np.stack([res[:n, n:].real, res[:n, n:].imag], axis=0)
>         out[0] = _hconj_internal(out_mat).astype(A.dtype)
> 
> 
> complex_expm = ComplexExpm()
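> 
> As a quick numerical sanity check (just a sketch, using float64 to avoid 
> casting issues), one can compare the op against scipy.linalg.expm directly:
> 
> import numpy as np
> import scipy.linalg
> import theano
> import theano.tensor as T
> 
> A_sym = T.tensor3('A', dtype='float64')
> f = theano.function([A_sym], complex_expm(A_sym))
> 
> n = 4
> A_val = np.random.randn(2, n, n)
> ref = scipy.linalg.expm(A_val[0] + 1j * A_val[1])
> out = f(A_val)
> print(np.abs(out[0] - ref.real).max(), np.abs(out[1] - ref.imag).max())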
> 
> Now here is the problem. I follow a paper about complex RNNs and use their 
> experimental setup and models. My model is just their RNN with their 
> unitarity-preserving update replaced by mine.
> 
> https://github.com/amarshah/complex_RNN
> 
> When I use the retraction, the network learns normally and my unitary 
> matrix stays unitary:
> 
> Learning rate is 0.0010000000474974513
> How much of unitarity U holds?
> U norm 11.31370849898476
> Delta between U^*U and I: 1.3531515725470711e-14
> Iteration: 0
> mse: 1.4083181893131802
> 
> 
> How much of unitarity U holds?
> U norm 11.313708498984749
> Delta between U^*U and I: 1.397582563930589e-14
> Iteration: 1
> mse: 1.8754714993993804
> 
> 
> How much of unitarity U holds?
> U norm 11.313708498984754
> Delta between U^*U and I: 1.3569776678228608e-14
> Iteration: 2
> mse: 1.4130059596425537
> 
> But if I switch to the exponential mapping, my matrix becomes zero right 
> after the first update:
> 
> Learning rate is 0.0010000000474974513
> How much of unitarity U holds?
> U norm 11.31370849898476
> Delta between U^*U and I: 1.3531515725470711e-14
> Iteration: 0
> mse: 1.5589767423436534
> 
> 
> How much of unitarity U holds?
> U norm 0.0
> Delta between U^*U and I: 11.313708498984761
> Iteration: 1
> mse: 1.3212516523445936
> 
> 
> How much of unitarity U holds?
> U norm 0.0
> Delta between U^*U and I: 11.313708498984761
> Iteration: 2
> mse: 1.222838183394413
> 
> I suspect there is a problem in the execution of the exponential Op, because 
> all the other parts work well. Initially I wrote the ComplexExpm Op without 
> grad, so when I saw the problem I assumed I needed a grad. But even after 
> writing the grad, the problem hasn't gone away.
> 
> Also, as I remember, when using custom Ops with numpy code inside, one can 
> debug that code at runtime (because the Op executes the same numpy code). 
> But now, even if I set breakpoints inside ComplexExpm, the program doesn't 
> stop at them.
> 
> Can you suggest any mechanism in Theano that could make this parameter get 
> erased like this?
> 
> 
> 
> 
> 


-- 
Pascal

