Hello, I'm trying to see if I can implement BPTT(2*h*, *h*) as defined in Williams and Peng (1990; doi:10.1162/neco.1990.2.4.490) using theano.scan.
It will be used on long sequences of thousands of steps, where computing the gradient only once at the end of the sequence (which is the way BPTT is usually implemented in the existing Theano code I can find on the web) would not be feasible. Instead, BPTT(2*h*, *h*) proceeds as follows, starting at time *t*:

- Forward propagate *h* steps, to time (*t* + *h*)
- Compute the gradient by looking back 2*h* steps, using the inputs and states in the interval (*t* - *h*) to (*t* + *h*)
- Update the parameters, then repeat until the end of the sequence

My questions are:

- Is it possible to pass truncate_gradient = 2 * n_steps to a theano.scan loop, and will that produce the behavior described above? I glanced at the relevant parts of scan_op.py in the Theano source code, but my (not very well educated) impression was that it would not.
- If that cannot be done, how would one implement such a mechanism using theano.scan? My guess is that theano.scan needs to run with n_steps = 2 * *h*, and that I should roll the inputs and states back by *h* steps after each gradient computation & parameter update?
- Is there any public Theano code implementing this that I could refer to?

Thank you in advance for any input!

--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
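P.S. For concreteness, here is a plain-NumPy sketch of the update schedule I have in mind, on a toy tanh RNN. This is only an illustration of the BPTT(2*h*, *h*) windowing (forward *h* new steps per update, backpropagate errors from those steps through the last 2*h* steps starting from a checkpointed state at *t* - *h*); the names `forward`, `window_grads`, and the checkpointing scheme are my own, not Theano API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: predict the next element of a sine wave with a tiny tanh RNN.
T, h, n_hid = 64, 4, 8          # sequence length, BPTT(2h, h) half-window h, hidden units
xs = np.sin(np.arange(T + 1) * 0.3)
inputs, targets = xs[:-1], xs[1:]

W_in  = rng.normal(scale=0.1, size=(n_hid,))
W_rec = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_out = rng.normal(scale=0.1, size=(n_hid,))

def forward(state, x_seq):
    """Run the RNN over x_seq from `state`; return all states and predictions."""
    states, preds = [state], []
    for x in x_seq:
        state = np.tanh(W_in * x + W_rec @ state)
        states.append(state)
        preds.append(W_out @ state)
    return states, np.array(preds)

def window_grads(state0, x_seq, y_seq, n_new):
    """Backprop through the whole window, but only the last n_new errors
    (the h freshly generated steps) contribute to the squared-error loss."""
    states, preds = forward(state0, x_seq)
    gW_in, gW_rec, gW_out = (np.zeros_like(W_in), np.zeros_like(W_rec),
                             np.zeros_like(W_out))
    d_state = np.zeros(n_hid)
    for k in reversed(range(len(x_seq))):
        err = (preds[k] - y_seq[k]) if k >= len(x_seq) - n_new else 0.0
        d_out = err * W_out + d_state              # dL/d states[k+1]
        gW_out += err * states[k + 1]
        d_pre = d_out * (1.0 - states[k + 1] ** 2) # back through tanh
        gW_in  += d_pre * x_seq[k]
        gW_rec += np.outer(d_pre, states[k])
        d_state = W_rec.T @ d_pre                  # pass gradient to states[k]
    return gW_in, gW_rec, gW_out, states

# BPTT(2h, h): advance h steps per update, backprop over the last 2h steps.
lr = 0.05
checkpoint = np.zeros(n_hid)                       # stored state at time t - h
for t in range(0, T - h, h):
    lo, hi = max(0, t - h), t + h                  # window (t - h) .. (t + h)
    gW_in, gW_rec, gW_out, states = window_grads(
        checkpoint, inputs[lo:hi], targets[lo:hi], n_new=h)
    W_in -= lr * gW_in; W_rec -= lr * gW_rec; W_out -= lr * gW_out
    checkpoint = states[t - lo]                    # state at the new (t - h)
```

The point of the checkpoint is that each update only needs the saved state from *h* steps back plus the inputs of the 2*h* window, so memory stays O(*h*) regardless of total sequence length.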
