I'm trying to implement policy-gradient reinforcement learning using the
REINFORCE algorithm, and I'm struggling to figure out how to compute the
gradient with Theano. The code I'm trying to replicate is quite similar to
this (which is for Torch):
https://github.com/syyeung/frameglimpses/blob/master/model/ReinforceNormal.lua

My model currently is:
import numpy as np
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

mrg_stream = MRG_RandomStreams()
sigma = 0.1
m = T.dot(weights, input) + b                           # deterministic mean
m_rnd = m + mrg_stream.normal(size=m.shape, std=sigma)  # sampled output
cost = T.sum(T.log(T.exp(-(m_rnd - m)**2 / (2 * sigma**2))
                   / T.sqrt(2 * np.pi * sigma**2))) * reward  # reward-weighted log N(m_rnd; m, sigma^2)
grads = T.grad(cost, [weights, b], known_grads={m_rnd: (m_rnd - m) / sigma**2})

If I understand known_grads correctly, this will compute the gradients of the
weights/bias using (m_rnd - m)/sigma**2, i.e. (m_rnd - m)/0.01, as the
gradient of the cost with respect to m_rnd. Is my understanding correct? I'm
not asking about the REINFORCE algorithm or my calculation; I just want to
make sure that I'm understanding known_grads correctly.
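
To make the question concrete, here is a small self-contained check of what I
think known_grads does (toy variables x and y; nothing to do with my actual
model): I expect the supplied expression to replace the gradient that would
otherwise be derived from the cost at that variable, with backpropagation
continuing through that variable's inputs from there.

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
y = 2 * x             # dy/dx = 2
cost = T.sum(y ** 2)  # analytically, d(cost)/dy = 2*y, so d(cost)/dx = 8*x

# Override d(cost)/dy with 3*y. If my reading is right, T.grad should
# backpropagate the supplied expression through y's inputs and return
# (3*y) * dy/dx = 3*(2*x)*2 = 12*x instead of the analytic 8*x.
grad_x = T.grad(cost, x, known_grads={y: 3 * y})

f = theano.function([x], grad_x)
print(f(np.asarray([1., 2.], dtype=theano.config.floatX)))

If that prints [12., 24.] rather than the analytic [8., 16.], then known_grads
behaves the way I've assumed above.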

Thanks
