Yes, you're right - I'm not sure where that definition of momentum came from, but it is a bit strange: it maintains an exponential moving average of the gradient rather than an accumulated velocity. I've updated the notebook to the more common (classical) definition. Thanks for pointing it out.
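
For reference, here is a minimal sketch of the classical formulation written as Theano updates (assuming params, cost, learning_rate, and momentum are defined as in the tutorial):

    import theano
    import theano.tensor as T

    updates = []
    for param in params:
        # One shared "velocity" per parameter, holding the previous update,
        # initialized to zero with the parameter's broadcast pattern.
        previous_update = theano.shared(param.get_value()*0.,
                                        broadcastable=param.broadcastable)
        # update = momentum * previous_update - learning_rate * gradient
        update = momentum*previous_update - learning_rate*T.grad(cost, param)
        # W_new = W_old + update; every right-hand side in the updates list
        # is evaluated with the old shared values, so both pairs agree.
        updates.append((param, param + update))
        updates.append((previous_update, update))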
On Thursday, September 8, 2016 at 10:19:16 AM UTC-7, Michael Klachko wrote:
>
> http://nbviewer.jupyter.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
> (section 24):
>
>     updates = []
>     for param in params:
>         param_update = theano.shared(param.get_value()*0.,
>                                      broadcastable=param.broadcastable)
>         updates.append((param, param - learning_rate*param_update))
>         updates.append((param_update, momentum*param_update
>                         + (1. - momentum)*T.grad(cost, param)))
>
> The last two lines appear to be different from the classical momentum
> definition:
>
>     update = momentum * previous_update - learning_rate * gradient
>     W_new = W_old + update
>
> I tested the implementation from the tutorial, and no matter what value
> of momentum I use, the results are very similar. On the other hand, if I
> implement the momentum as:
>
>     updates.append((param, param + param_update))
>     updates.append((param_update, momentum * param_update
>                     - learning_rate * T.grad(cost, param)))
>
> It works as expected (faster convergence as momentum increases).
>
> Can anyone explain the implementation in the tutorial? Where did it come
> from?
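
As for why the tutorial's version barely changes with the momentum value: it keeps an exponential moving average of the gradient, and the (1. - momentum) factor keeps that average (and hence the effective step size) at roughly the same magnitude for any momentum, whereas classical momentum can build up steps as large as learning_rate/(1 - momentum) times the gradient. A quick toy comparison on f(w) = 0.5*w**2 (plain Python, purely illustrative, not from the notebook) makes the difference visible:

    def run(momentum, classical, steps=100, lr=0.01):
        w, v = 10.0, 0.0  # start far from the minimum at w = 0
        for _ in range(steps):
            g = w  # gradient of f(w) = 0.5*w**2
            if classical:
                # classical: update = momentum*update - lr*g; w += update
                v = momentum*v - lr*g
                w = w + v
            else:
                # tutorial (simultaneous Theano updates): the parameter
                # steps with the old v, and v becomes an EMA of the gradient
                w, v = w - lr*v, momentum*v + (1. - momentum)*g
        return abs(w)

    for m in (0.0, 0.5, 0.9):
        print("momentum=%.1f: classical %.4f, tutorial %.4f"
              % (m, run(m, True), run(m, False)))

The classical runs end up much closer to the optimum as momentum grows, while the tutorial runs stay nearly identical, matching what you observed.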
