Yes, you're right - I'm not sure where that definition of momentum came 
from, and it is a bit strange: it amounts to stepping along an exponential 
moving average of the gradient, so the effective step size stays roughly 
the same regardless of the momentum value, which matches what you 
observed. I've updated the tutorial to the more common definition. Thanks 
for pointing it out.
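
For anyone following along, here is a minimal sketch of the classical 
formulation - essentially your second snippet wrapped in a helper (the 
momentum_updates function name and the surrounding scaffolding are just 
for illustration):

import theano
import theano.tensor as T

def momentum_updates(cost, params, learning_rate, momentum):
    # Classical momentum:
    #   velocity <- momentum * velocity - learning_rate * gradient
    #   param    <- param + velocity
    updates = []
    for param in params:
        # One velocity accumulator per parameter, initialized to zeros
        # with the same shape and broadcast pattern as the parameter.
        velocity = theano.shared(param.get_value() * 0.,
                                 broadcastable=param.broadcastable)
        updates.append((param, param + velocity))
        updates.append((velocity,
                        momentum * velocity
                        - learning_rate * T.grad(cost, param)))
    return updates

Note that theano.function evaluates all update expressions using the old 
values of the shared variables, so the parameter step uses the velocity 
from the previous iteration; reordering the update tuples doesn't change 
that.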

On Thursday, September 8, 2016 at 10:19:16 AM UTC-7, Michael Klachko wrote:
>
> http://nbviewer.jupyter.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
> (section 24):
>
>  updates = []
>  for param in params:
>      param_update = theano.shared(param.get_value()*0.,
>                                   broadcastable=param.broadcastable)
>      updates.append((param, param - learning_rate*param_update))
>      updates.append((param_update,
>                      momentum*param_update + (1. - momentum)*T.grad(cost, param)))
>
>
> The last two lines appear to be different from the classical momentum 
> definition:
>
> update = momentum * previous_update - learning_rate * gradient
> W_new = W_old + update
>
> I tested the implementation from the tutorial, and no matter what value of
> momentum I used, the results were very similar. On the other hand, if I
> implement the momentum as:
>
> updates.append((param, param + param_update))
> updates.append((param_update,
>                 momentum * param_update - learning_rate * T.grad(cost, param)))
>
>
> It works as expected (faster convergence as momentum increases). 
>
> Can anyone explain the implementation in the tutorial? Where did it come 
> from? 
>
>
