I'm trying to implement policy gradient reinforcement learning using the
REINFORCE algorithm. I'm struggling to figure out how to compute the gradient
with Theano. The code I'm trying to replicate is quite similar to this
(which is for Torch):
https://github.com/syyeung/frameglimpses/blob/master/model/ReinforceNormal.lua
My current model is:
m = T.dot(weights, input) + b
m_rnd = m + mrg_stream.normal(0, 0.1)
cost = T.sum(T.log(1 / (2 * 0.1**2 * PI) * T.exp(-(m_rnd - m)**2 / (2 * 0.1**2)))) * reward
grads = T.grad(cost, [weights, b], known_grads={m_rnd: (m_rnd - m) / 0.1**2})
Which, if I understand known_grads correctly, will compute the gradients of
the weights and bias using (m_rnd - m) / 0.01 as the gradient of the cost with
respect to m_rnd. Is my understanding correct? I'm not asking about the
REINFORCE algorithm or my calculation; I just want to make sure I'm
understanding known_grads correctly.
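In other words, I expect known_grads to substitute my expression for
dcost/dm_rnd and then backpropagate through the rest of the graph as usual.
Here is a NumPy sketch of the manual chain rule I expect it to be equivalent
to (shapes, values, and the fixed noise draw are made up purely to spell out
my understanding; this is not the Theano machinery itself):

```python
import numpy as np

# Hypothetical shapes/values, for illustration only.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # weights
b = rng.normal(size=3)          # bias
x = rng.normal(size=4)          # input
sigma = 0.1

m = W @ x + b
noise = rng.normal(0.0, sigma, size=3)
m_rnd = m + noise               # the stochastic node

# The gradient I supply via known_grads for m_rnd:
g = (m_rnd - m) / sigma**2      # == noise / sigma**2

# Backprop by hand from that point: m_rnd = m + noise (noise treated
# as a constant), so dcost/dm = g; and m = W @ x + b gives
# dcost/dW = outer(g, x) and dcost/db = g.
grad_W = np.outer(g, x)
grad_b = g
```

If that matches what T.grad does with known_grads, then my code should be
doing what I intend.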
Thanks