In section 24 of the Theano tutorial at
http://nbviewer.jupyter.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
momentum is implemented as follows:

 updates = []

 for param in params:
     param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
     updates.append((param, param - learning_rate*param_update))
     updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))


The last two lines appear to be different from the classical momentum 
definition:

update = momentum * previous_update - learning_rate * gradient
W_new = W_old + update

I tested the implementation from the tutorial, and the results are very similar 
no matter what value of momentum I use. On the other hand, if I implement 
momentum as:

 updates.append((param, param + param_update))
 updates.append((param_update, momentum * param_update - learning_rate * T.grad(cost, param)))


It works as expected (faster convergence as momentum increases). 
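For reference, here is a minimal pure-Python sketch of the two rules side by side 
on a toy 1-D quadratic (cost = 0.5 * w**2, so the gradient is just w). This is only 
to illustrate what I am comparing, not my actual experiment; as far as I understand, 
Theano evaluates all expressions in an updates list from the old shared values, so 
in the tutorial rule the parameter step uses the previous param_update:

 def run(rule, momentum, learning_rate=0.1, steps=50):
     # toy problem: cost = 0.5 * w**2, gradient = w
     w, v = 5.0, 0.0
     for _ in range(steps):
         grad = w
         if rule == "tutorial":
             # both updates computed from the old v, mimicking Theano's
             # simultaneous application of the updates list
             w_new = w - learning_rate * v
             v = momentum * v + (1.0 - momentum) * grad
             w = w_new
         else:
             # classical momentum
             v = momentum * v - learning_rate * grad
             w = w + v
     return w

 for m in (0.0, 0.5, 0.9):
     print("momentum=%.1f  tutorial: %.6f  classical: %.6f"
           % (m, run("tutorial", m), run("classical", m)))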

Can anyone explain the implementation in the tutorial? Where did it come 
from? 
