Hello!

I have a problem implementing an RNN in Theano. I have a unitary hidden matrix
and I manage to keep it on the unitary-matrix manifold at each update. I
modified the standard SGD procedure so that for an ordinary parameter it
performs the usual gradient step update
updates[param] = param - learning_rate * grad



while for a unitary-matrix parameter the update is defined as follows:

if manifold._exponential:
    param_updates = manifold.exp(pp, manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg)))
else:
    param_updates = manifold.retr(pp, manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg)))
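
For context, the full update loop is roughly the following (a sketch only; I
assume here a manifolds dict mapping each constrained parameter to its manifold
object, the other names are from my code above):

for param, grad in zip(params, grads):
    if param in manifolds:
        # unitary parameter: move along the manifold instead of a plain step
        manifold = manifolds[param]
        pp, gg = param, grad
        step = manifold.lincomb(pp, -learning_rate, manifold.proj(pp, gg))
        if manifold._exponential:
            updates[param] = manifold.exp(pp, step)
        else:
            updates[param] = manifold.retr(pp, step)
    else:
        # ordinary parameter: plain SGD step
        updates[param] = param - learning_rate * grad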

Here exp and retr are the exponential mapping and the retraction on the
unitary-matrix manifold:

http://www.manopt.org/reference/manopt/manifolds/stiefel/stiefelcomplexfactory.html


As I see it, Theano currently doesn't support complex numbers, so I work with
tensors of shape (2, n, n), where the first slice along the leading dimension
holds the real part and the second the imaginary part.

Here are the exp and retr functions:

    def retr(self, X, U):
        # QR-based retraction: re-orthonormalize X + U
        YR, YI = self.frac(X + U)
        Q, R = T.nlinalg.qr(YR + 1j * YI)
        Y = T.stack([Q.real, Q.imag])
        return Y


    def concat(self, arrays, axis):
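        # shift the axis by one to skip the leading real/imag dimension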
        return T.concatenate(arrays, axis=axis+1)


    def exp(self, X, U):
        # The exponential (in the sense of Lie group theory) of a tangent
        # vector U at X.
        first = self.concat([X, U], axis=1)
        XhU = self.complex_dot(self.hconj(X), U)
        second = complex_expm(self.concat([self.concat([XhU, -self.complex_dot(self.hconj(U), U)], 1),
                                           self.concat([self.identity(), XhU], 1)], 0))
        third = self.concat([complex_expm(-XhU), self.zeros()], 0)
        exponential = self.complex_dot(self.complex_dot(first, second), third)
        return exponential


Here the frac method just returns the pair A[0, :, :] and A[1, :, :] from A,
complex_dot multiplies two such tensors as complex matrices, and hconj is the
Hermitian conjugate (conjugate transpose).
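
For completeness, the helpers look roughly like this (a sketch of the code I
omitted above; the actual implementation may differ in details):

    def frac(self, A):
        # split the (2, n, n) packing into real and imaginary n x n parts
        return A[0, :, :], A[1, :, :]

    def hconj(self, A):
        # Hermitian conjugate: transpose each part and negate the imaginary one
        AR, AI = self.frac(A)
        return T.stack([AR.T, -AI.T])

    def complex_dot(self, A, B):
        # (AR + i*AI)(BR + i*BI) = (AR*BR - AI*BI) + i*(AR*BI + AI*BR)
        AR, AI = self.frac(A)
        BR, BI = self.frac(B)
        return T.stack([T.dot(AR, BR) - T.dot(AI, BI),
                        T.dot(AR, BI) + T.dot(AI, BR)])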

Note that the exponential mapping involves the exponential of a complex matrix,
so I wrote an Op based on tensor.slinalg.expm:

https://github.com/Theano/theano/blob/8cc53673d7aa37873a753b7f847af4a185298f4d/theano/tensor/slinalg.py#L356-L425

Initially I wrote the code without a grad because I didn't understand how to
carry the real-case gradient over to the complex case. But then I realized that
this gradient can be viewed as a directional derivative of expm in the
direction gA.T (or hconj(gA) in the complex case). That directional derivative
can be read off from the matrix exponential of a 2n x 2n block matrix, so at
least for small n the method passes a finite-difference test.
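
Concretely, the identity I rely on can be checked with plain NumPy/SciPy (a
standalone sketch, independent of the Op; variable names are only illustrative):

import numpy as np
import scipy.linalg

n = 4
rng = np.random.RandomState(0)
A = rng.randn(n, n) + 1j * rng.randn(n, n)
E = rng.randn(n, n) + 1j * rng.randn(n, n)   # an arbitrary direction

# expm of the block matrix [[A, E], [0, A]] carries the directional
# derivative of expm at A along E in its upper-right n x n block
block = np.block([[A, E], [np.zeros((n, n)), A]])
D_block = scipy.linalg.expm(block)[:n, n:]

# finite-difference approximation of the same directional derivative
eps = 1e-7
D_fd = (scipy.linalg.expm(A + eps * E) - scipy.linalg.expm(A)) / eps

print(np.max(np.abs(D_block - D_fd)))   # small for moderate n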
Here is the code:

class ComplexExpm(Op):
    """
    Compute the matrix exponential of a square array.
    """


    __props__ = ()


    def make_node(self, A):
        assert imported_scipy, (
            "Scipy not available. Scipy is needed for the Expm op")


        A = as_tensor_variable(A)
        assert A.ndim == 3
        expm = theano.tensor.tensor3(dtype=A.dtype)
        return Apply(self, [A, ], [expm, ])


    def perform(self, node, inputs, outputs):
        (A,) = inputs
        (expm,) = outputs
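        # input is packed as (2, n, n): index 0 is the real part, 1 the imaginary part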
        temp = scipy.linalg.expm(A[0, :, :] + 1j * A[1, :, :])
        expm[0] = np.stack([temp.real, temp.imag])


    def grad(self, inputs, outputs):
        (A,) = inputs
        (g_out,) = outputs
        return [ComplexExpmGrad()(A, g_out)]


    def infer_shape(self, node, shapes):
        return [shapes[0]]




def _hconj_internal(x):
    x_hconj = np.transpose(x, axes=(0, 2, 1)).copy()
    x_hconj[1, :, :] = -x_hconj[1, :, :]
    return x_hconj




class ComplexExpmGrad(Op):
    """
    Gradient of the matrix exponential of a square array.
    """


    __props__ = ()


    def make_node(self, A, gw):
        assert imported_scipy, (
            "Scipy not available. Scipy is needed for the Expm op")
        A = as_tensor_variable(A)
        assert A.ndim == 3
        out = theano.tensor.tensor3(dtype=A.dtype)
        return Apply(self, [A, gw], [out, ])


    def infer_shape(self, node, shapes):
        return [shapes[0]]


    def perform(self, node, inputs, outputs):
        (A, gA) = inputs
        (out,) = outputs
        n = A.shape[1]
        Z = np.zeros_like(A)
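        # expm([[A, hconj(gA)], [0, A]]) holds the directional derivative of
        # expm at A along hconj(gA) in its upper-right n x n block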
        res_mat = np.concatenate([np.concatenate([A, _hconj_internal(gA)], axis=2),
                                  np.concatenate([Z, A], axis=2)], axis=1)
        res = scipy.linalg.expm(res_mat[0, :, :] + 1j * res_mat[1, :, :])
        out_mat = np.stack([res[:n, n:].real, res[:n, n:].imag], axis=0)
        out[0] = _hconj_internal(out_mat).astype(A.dtype)


complex_expm = ComplexExpm()
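
For reference, the finite-difference check I mentioned is roughly the following
(a sketch, not my exact test code):

import numpy as np
import theano

rng = np.random.RandomState(42)
n = 3
A_val = rng.randn(2, n, n).astype(theano.config.floatX)
theano.gradient.verify_grad(complex_expm, [A_val], rng=rng)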

Now the problem. I am following a paper about complex RNNs and use their
experimental setup and models. My model is just their RNN with their unitarity
construction replaced by mine:

https://github.com/amarshah/complex_RNN

When I use the retraction, the network learns normally and my matrix stays
unitary:

Learning rate is 0.0010000000474974513
How much of unitarity U holds?
U norm 11.31370849898476
Delta between U^*U and I: 1.3531515725470711e-14
Iteration: 0
mse: 1.4083181893131802


How much of unitarity U holds?
U norm 11.313708498984749
Delta between U^*U and I: 1.397582563930589e-14
Iteration: 1
mse: 1.8754714993993804


How much of unitarity U holds?
U norm 11.313708498984754
Delta between U^*U and I: 1.3569776678228608e-14
Iteration: 2
mse: 1.4130059596425537

But if I switch to the exponential mapping, my matrix becomes zero right after
the first update:

Learning rate is 0.0010000000474974513
How much of unitarity U holds?
U norm 11.31370849898476
Delta between U^*U and I: 1.3531515725470711e-14
Iteration: 0
mse: 1.5589767423436534


How much of unitarity U holds?
U norm 0.0
Delta between U^*U and I: 11.313708498984761
Iteration: 1
mse: 1.3212516523445936


How much of unitarity U holds?
U norm 0.0
Delta between U^*U and I: 11.313708498984761
Iteration: 2
mse: 1.222838183394413

I suspect the problem is in the execution of the ComplexExpm Op, because all
the other parts work well. Initially I wrote the ComplexExpm Op without a grad,
so when I saw the problem I assumed that the missing grad was the cause. But
even after writing the grad, the problem hasn't gone away.

Also, as far as I remember, when using custom Ops with NumPy code inside, one
can debug that code at runtime (because the Op executes the same NumPy code).
But now, even if I set breakpoints inside ComplexExpm, the program doesn't stop
at them.
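
For example, something like this (just an illustration of what I mean by a
breakpoint) is never hit while the model trains:

    def perform(self, node, inputs, outputs):
        import pdb; pdb.set_trace()   # execution never stops here
        # ... rest of perform as above ...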

Can you suggest any mechanism in Theano that could make this parameter zeroing
happen?




