I'm assuming m and n were defined as T.vector(), and that the last line of 
the "def rnn(...)" function is actually "return x_t, r_t". Is that correct?

Do you have a non-zero gradient for Wrec?
Can you monitor what the value of theano.grad(cost, Wrec).sum() is?
Normally, the sum of the gradient wrt Wrec should be equal to the gradient 
wrt dot(u, v): since dot(u, v) is a scalar added to every entry of Wrec, 
the chain rule sums the per-entry gradients. So if the gradient wrt Wrec is 
not zero everywhere, but its sum is zero, that would explain the result.
If we backprop manually, we can see that the gradient of the cost wrt u is 
equivalent to grad(cost, Wrec).sum() * v (and the gradient wrt v should be 
equivalent to grad(cost, Wrec).sum() * u). Can you monitor those values?
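
For instance, something along these lines (a sketch only; it assumes cost, 
Wrec, m and n are the variables from your script, with m playing the role 
of u and n the role of v, and `inputs` standing for whatever inputs your 
graph actually takes, e.g. u and target):

    g_Wrec_sum = theano.grad(cost, Wrec).sum()
    g_m = theano.grad(cost, m)
    g_n = theano.grad(cost, n)

    # If the manual backprop above is right, g_m should match
    # g_Wrec_sum * n, and g_n should match g_Wrec_sum * m.
    check_m = g_Wrec_sum * n
    check_n = g_Wrec_sum * m

    # `inputs` is a placeholder for the actual inputs of your graph
    monitor = theano.function(inputs,
                              [g_Wrec_sum, g_m, check_m, g_n, check_n])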


On Wednesday, June 28, 2017 at 7:12:06 PM UTC-4, Mohamed Akrout wrote:
>
> Hi all,
>
> I am running a neuroscience simulation with a recurrent neural network 
> model in Theano:
>
>
>
> def rnn(u_t, x_tm1, r_tm1, Wrec):
>     x_t = ((1 - alpha)*x_tm1 +
>            alpha*(T.dot(r_tm1, Wrec) + brec + u_t[:, Nin:]))
>     r_t = f_hidden(x_t)
>
>
> Then I define the scan function to iterate over the time steps:
>
> [x, r], _ = theano.scan(fn=rnn,
>                         outputs_info=[x0_, f_hidden(x0_)],
>                         sequences=u,
>                         non_sequences=[Wrec])
>
> Wrec and brec are learnt by stochastic gradient descent:
> g = T.grad(cost, [Wrec, brec])
>
> where cost is the cost function: T.sum(f_loss(z, target[:,:,:Nout])), 
> with z = f_output(T.dot(r, Wout_.T) + bout)
>
> Until now, everything works well.
>
>
>
> Now I want to add two new vectors, let's call them u and v, so that the 
> initial rnn function becomes:
>
>
> def rnn(u_t, x_tm1, r_tm1, Wrec, u, v):
>     x_t = ((1 - alpha)*x_tm1 +
>            alpha*(T.dot(r_tm1, Wrec + T.dot(u, v)) + brec + u_t[:, Nin:]))
>     r_t = f_hidden(x_t)
>
> [x, r], _ = theano.scan(fn=rnn,
>                         outputs_info=[x0_, f_hidden(x0_)],
>                         sequences=u,
>                         non_sequences=[Wrec, m, n])
>
> m and n are the variables corresponding to u and v in the main function.
>
> and suddenly, the gradients T.grad(cost, m) and T.grad(cost, n) are zero.
>
> I have been blocked on this problem for two weeks now. I verified that 
> the values are not integers by using dtype=theano.config.floatX 
> everywhere in the definition of the variables.
>
> As you can see, the link between the cost and m (or n) is: the cost 
> function depends on z, z depends on r, and r is one of the outputs of 
> the rnn function, which uses m and n in its equation.
>
> Do you have any idea why this does not work?
>
> Any idea is welcome. I hope I can get past this problem soon.
> Thank you!
>
