Scan performs the accumulation of the partial gradients at each timestep, so it does not keep track of the individual partial gradients (though you may have access to a cumulative sum by digging through its output buffers).
The simplest way would be to define a dummy sequence z full of zeros, of shape (n_steps, *W.shape). Then, each time you use W in the step function, use (W + z_t) instead. That way, if you take the gradient of your cost with respect to z, it will be equal to the gradient of the cost with respect to W at each timestep.

On Fri, Oct 07, 2016, John Moore wrote:
> Somewhat equivalently, how could I take each of the gradient updates
> instead of scan just summing all gradient updates automatically?
>
> On Friday, October 7, 2016 at 4:16:46 PM UTC-4, John Moore wrote:
> >
> > Hi All,
> >
> > My understanding of BPTT is to unfold the network, take the gradients
> > through time, then average the weight updates.
> > How do I obtain the weight updates at each timestep? I know that scan
> > automatically performs BPTT for you, so that it gives you only one
> > weight update.
> >
> > Any insight appreciated.
> >
> > Thanks,
> > John
>
> --
> ---
> You received this message because you are subscribed to the Google Groups "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

--
Pascal
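
The zero-sequence trick above can be illustrated outside Theano with a tiny scalar recurrence in plain NumPy (the names `forward`, `w`, `z` here are illustrative, not Theano API): each step uses (w + z[t]) in place of w, the per-timestep gradients dC/dz_t are recovered individually, and their sum matches the single accumulated gradient dC/dw that scan's BPTT would return.

```python
import numpy as np

# Sketch of the trick in plain NumPy (not Theano): a scalar recurrence
#   h_t = (w + z[t]) * h_{t-1} + x[t],   cost C = h_T.
# z is a dummy sequence of zeros; dC/dz_t is the per-timestep
# contribution of w, and summing over t recovers dC/dw.

def forward(w, z, x, h0=0.0):
    h = h0
    for t in range(len(x)):
        h = (w + z[t]) * h + x[t]
    return h  # cost = final state

x = np.array([1.0, 2.0, 3.0])
w = 0.5
z = np.zeros_like(x)  # dummy zero sequence, one entry per timestep
eps = 1e-6

# Per-timestep gradients dC/dz_t, via finite differences
per_step = np.array([
    (forward(w, z + eps * np.eye(len(x))[t], x) - forward(w, z, x)) / eps
    for t in range(len(x))
])

# Total gradient dC/dw -- what scan's summed BPTT gives you directly
total = (forward(w + eps, z, x) - forward(w, z, x)) / eps

print(per_step)        # gradient of the cost wrt w at each timestep
print(per_step.sum())  # matches total, up to finite-difference error
print(total)
```

In Theano one would compute `T.grad(cost, z)` instead of the finite differences, but the equivalence shown here is the same: the rows of dC/dz are the per-timestep gradients, and their sum is the accumulated gradient wrt W.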
