Yes, your understanding of known_grads seems to be correct.
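
In case a concrete example helps, here is a minimal sketch (with made-up
variables x, w and y) of what known_grads does: the expression you supply is
used as the gradient flowing into that variable, and T.grad backpropagates it
through the rest of the graph instead of deriving it from a cost.

import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')
w = theano.shared(np.ones(3), name='w')
y = T.dot(w, x)  # scalar: y = w . x

# Pretend the gradient of some cost w.r.t. y is known to be 1.
# T.grad then returns dcost/dw = dcost/dy * dy/dw = 1 * x.
g_w = T.grad(None, w, known_grads={y: T.ones_like(y)})

f = theano.function([x], g_w)
print(f([1., 2., 3.]))  # -> [ 1.  2.  3.]

In your snippet, the same mechanism substitutes (m_rnd - m)/0.1**2 for the
gradient w.r.t. m_rnd when backpropagating into weights and b.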

On Mon, Oct 31, 2016, [email protected] wrote:
> I'm trying to implement policy gradient reinforcement learning using the 
> REINFORCE algorithm, and I'm struggling to figure out how to compute the 
> gradient with Theano. The code I'm trying to replicate is quite similar to 
> this (which is for Torch):
> https://github.com/syyeung/frameglimpses/blob/master/model/ReinforceNormal.lua
> 
> My current model is:
> m = T.dot(weights, input) + b
> m_rnd = m + mrg_stream.normal(0, 0.1)
> cost = T.sum(T.log(1/(2*0.1**2*PI)*T.exp(-(m_rnd-m)**2/(2*0.1**2)))) * reward
> grads = T.grad(cost, [weights, b], known_grads={m_rnd: (m_rnd-m)/0.1**2})
> 
> Which, if I understand known_grads correctly, will compute the gradients of 
> weights and b using (m_rnd-m)/0.01 as the gradient for m_rnd. Is that right? 
> I'm not asking about the REINFORCE algorithm or my calculation; I just want 
> to make sure I'm understanding known_grads correctly.
> 
> Thanks


-- 
Pascal
