Scan performs the accumulation of the partial gradients at each timestep,
so it does not keep track of the individual partial gradients (although
you may have access to a cumulative sum by digging through its output
buffers).

The simplest way would be to define a dummy sequence z full of zeros,
of shape (n_steps, *W.shape). Then, everywhere you would use W in the
step function, use (W + z_t) instead. That way, the gradient of your
cost wrt z should be equal to the gradient of the cost wrt W at each
timestep.
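
For concreteness, here is a minimal sketch of that trick, assuming a toy
one-layer tanh RNN (the names W, W_in, step, and the shapes are just for
illustration, not from your code):

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
n_steps, n_in, n_hid = 5, 3, 4

# Toy recurrent and input weights.
W = theano.shared(0.1 * np.random.randn(n_hid, n_hid).astype(floatX), name='W')
W_in = theano.shared(0.1 * np.random.randn(n_in, n_hid).astype(floatX), name='W_in')

x = T.matrix('x')    # (n_steps, n_in)
z = T.tensor3('z')   # dummy sequence, (n_steps, n_hid, n_hid), fed with zeros
h0 = T.zeros((n_hid,), dtype=floatX)

def step(x_t, z_t, h_tm1):
    # Use (W + z_t) everywhere W would normally appear.
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W + z_t))

h, _ = theano.scan(step, sequences=[x, z], outputs_info=[h0])
cost = h[-1].sum()

# One gradient of the cost wrt W per timestep, stacked along axis 0.
per_step_grads = T.grad(cost, z)

f = theano.function([x, z], per_step_grads)
x_val = np.random.randn(n_steps, n_in).astype(floatX)
z_val = np.zeros((n_steps, n_hid, n_hid), dtype=floatX)
print(f(x_val, z_val))   # shape (n_steps, n_hid, n_hid)

Summing that result over its first axis should give back the usual
accumulated gradient you would get from T.grad(cost, W).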

On Fri, Oct 07, 2016, John Moore wrote:
> Somewhat equivalently, how could I take each of the gradient updates 
> instead of scan just summing all gradient updates automatically?
> 
> On Friday, October 7, 2016 at 4:16:46 PM UTC-4, John Moore wrote:
> >
> > Hi All, 
> >
> > My understanding of BPTT is to unfold the network, take the gradients 
> > through time, then average the weight updates.
> > How do I obtain the weight updates at each timestep? I know that scan 
> > automatically performs BPTT for you, so that it gives you only one weight 
> > update. 
> >
> > Any insight appreciated.
> >
> > Thanks,
> > John
> >
> 


-- 
Pascal

