Yes, your understanding of known_grads seems to be correct: T.grad will take (m_rnd - m)/0.1**2 as the gradient of the cost with respect to m_rnd and propagate it back to weights and b. A minimal sketch of the mechanism follows the quote below.

On Mon, Oct 31, 2016, [email protected] wrote:

> I'm trying to implement policy gradient reinforcement learning using the
> REINFORCE algorithm. I'm struggling to figure out how to compute the
> gradient with Theano. The code I'm trying to replicate is quite similar
> to this (which is for Torch):
> https://github.com/syyeung/frameglimpses/blob/master/model/ReinforceNormal.lua
>
> My current model is:
>
>     m = T.dot(weights, input) + b
>     m_rnd = m + mrg_stream.normal(0, 0.1)
>     cost = T.sum(T.log(1/(2*0.1**2*PI) * T.exp(-(m_rnd - m)**2 / (2*0.1**2)))) * reward
>     grads = T.grad(cost, [weights, b], known_grads={m_rnd: (m_rnd - m)/0.1**2})
>
> which, if I understand known_grads correctly, will compute the gradient
> of the weights/bias using (m_rnd - m)/0.01 as the gradient for m_rnd. Is
> my understanding correct? I'm not asking about the REINFORCE algorithm or
> my calculation; I just want to make sure that I'm understanding
> known_grads correctly.
>
> Thanks
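To make the override behaviour concrete, here is a minimal, self-contained sketch (the names x, w, y and g_override are invented for illustration, they are not from your code). When a variable appears in known_grads, T.grad uses the supplied expression as that variable's gradient instead of deriving one from cost, and only chains it backward through the rest of the graph:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.dvector('x')
    w = theano.shared(np.ones(3), name='w')
    y = T.dot(w, x)      # scalar intermediate, playing the role of m_rnd
    cost = y ** 2

    # Without known_grads, d(cost)/dy would be 2*y. Forcing it to 1 means
    # T.grad only has to chain it backward: d(cost)/dw = 1 * dy/dw = x.
    g_override = T.ones_like(y)
    gw = T.grad(cost, wrt=w, known_grads={y: g_override})

    f = theano.function([x], gw)
    print(f(np.array([1.0, 2.0, 3.0])))  # -> [ 1.  2.  3.], i.e. x itself

As far as I can tell, the supplied expression replaces (rather than adds to) the gradient that would otherwise have been computed from cost through that variable, which is what the REINFORCE trick needs.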
--
Pascal
