Hello, I'm trying to see if I can implement BPTT(2*h*, *h*) as defined in Williams and Peng (1990; doi:10.1162/neco.1990.2.4.490) using theano.scan.
It will be used on long sequences of thousands of steps, where computing the gradient only once at the end of the sequence (which is the way BPTT is usually implemented in the existing Theano code I can find on the web) would not be feasible. Instead, BPTT(2*h*, *h*) proceeds as follows, starting at time *t*:

- Forward propagate *h* steps, to time (*t* + *h*)
- Compute the gradient by looking back 2*h* steps, using the inputs and states in the interval (*t* - *h*) to (*t* + *h*)
- Update the parameters, then repeat until the end of the sequence

My questions are:

- Is it possible to pass truncate_gradient = 2 * n_steps to a theano.scan loop, and will that produce the behavior described above? I glanced at the relevant parts of scan_op.py in the Theano source code, but my (not very well educated) impression was that it would not.
- If that cannot be done, how would one implement such a mechanism using theano.scan? My guess is that theano.scan needs to run with n_steps = 2 * *h*, and that I should roll the inputs and states back by *h* steps after each gradient computation & parameter update?
- Is there any public Theano code implementing this that I could refer to?

Thank you in advance for any input!

--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
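P.S. For concreteness, here is a plain-NumPy sketch of the update schedule I have in mind, on a toy tanh RNN. This is only an illustration of the BPTT(2*h*, *h*) windowing (forward *h* new steps per update, backpropagate errors from those steps through the last 2*h* steps starting from a checkpointed state at *t* - *h*); the names `forward`, `window_grads`, and the checkpointing scheme are my own, not Theano API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: predict the next element of a sine wave with a tiny tanh RNN.
T, h, n_hid = 64, 4, 8          # sequence length, BPTT(2h, h) half-window h, hidden units
xs = np.sin(np.arange(T + 1) * 0.3)
inputs, targets = xs[:-1], xs[1:]

W_in  = rng.normal(scale=0.1, size=(n_hid,))
W_rec = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_out = rng.normal(scale=0.1, size=(n_hid,))

def forward(state, x_seq):
    """Run the RNN over x_seq from `state`; return all states and predictions."""
    states, preds = [state], []
    for x in x_seq:
        state = np.tanh(W_in * x + W_rec @ state)
        states.append(state)
        preds.append(W_out @ state)
    return states, np.array(preds)

def window_grads(state0, x_seq, y_seq, n_new):
    """Backprop through the whole window, but only the last n_new errors
    (the h freshly generated steps) contribute to the squared-error loss."""
    states, preds = forward(state0, x_seq)
    gW_in, gW_rec, gW_out = (np.zeros_like(W_in), np.zeros_like(W_rec),
                             np.zeros_like(W_out))
    d_state = np.zeros(n_hid)
    for k in reversed(range(len(x_seq))):
        err = (preds[k] - y_seq[k]) if k >= len(x_seq) - n_new else 0.0
        d_out = err * W_out + d_state              # dL/d states[k+1]
        gW_out += err * states[k + 1]
        d_pre = d_out * (1.0 - states[k + 1] ** 2) # back through tanh
        gW_in  += d_pre * x_seq[k]
        gW_rec += np.outer(d_pre, states[k])
        d_state = W_rec.T @ d_pre                  # pass gradient to states[k]
    return gW_in, gW_rec, gW_out, states

# BPTT(2h, h): advance h steps per update, backprop over the last 2h steps.
lr = 0.05
checkpoint = np.zeros(n_hid)                       # stored state at time t - h
for t in range(0, T - h, h):
    lo, hi = max(0, t - h), t + h                  # window (t - h) .. (t + h)
    gW_in, gW_rec, gW_out, states = window_grads(
        checkpoint, inputs[lo:hi], targets[lo:hi], n_new=h)
    W_in -= lr * gW_in; W_rec -= lr * gW_rec; W_out -= lr * gW_out
    checkpoint = states[t - lo]                    # state at the new (t - h)
```

The point of the checkpoint is that each update only needs the saved state from *h* steps back plus the inputs of the 2*h* window, so memory stays O(*h*) regardless of total sequence length.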
