Also, if the final cost has some direct contribution from the
parameters that does not go through the output (for instance, a weight
norm penalty), then this will give different results, unless that
contribution to the global cost is passed as the cost (instead of None).
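
A minimal sketch of what I mean (hypothetical names, assuming
loss_train = data_loss + l2_penalty, where only data_loss goes through
the output, and dic holds the known gradient of data_loss wrt the
output, as in your example below):

l2_penalty = 0.01 * sum(T.sum(p ** 2) for p in params)
# Pass the direct (penalty) contribution as the cost; Theano combines
# it with the contribution coming in through known_grads.
all_grad = T.grad(cost=l2_penalty, wrt=other_param, known_grads=dic)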

On Thu, May 11, 2017, Pascal Lamblin wrote:
> On Sun, May 07, 2017, Vena Li wrote:
> > I have a question about computing gradients. In general, I can use
> > theano.gradient.grad to compute the gradient for the whole graph given
> > the cost. In my situation, I cannot compute the cost, but I know the
> > gradient of the last layer's weights. Can I still automatically compute
> > the gradients of the previous layers? My understanding is that I should
> > still use the same function, but pass None as the cost and give the
> > gradients I know as known_grads.
> 
> You would probably need the gradient wrt the output of the last layer
> (the activations, or pre-activation values), rather than the parameters.
> In general, the gradient wrt the bias will correspond to the gradient
> wrt the pre-activation value.
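> 
> For example (a hedged illustration with made-up shapes, not
> Lasagne-specific): for a dense layer out = sigmoid(T.dot(x, W) + b),
> the gradient wrt b equals the gradient wrt the pre-activation value
> summed over the batch dimension.
> 
> import numpy as np
> import theano
> import theano.tensor as T
> 
> x = T.matrix('x')
> W = theano.shared(np.zeros((5, 3)), name='W')
> b = theano.shared(np.zeros(3), name='b')
> preact = T.dot(x, W) + b
> out = T.nnet.sigmoid(preact)
> loss = T.sum(out ** 2)
> # These two expressions evaluate to the same values:
> g_b = T.grad(loss, b)
> g_preact_summed = T.grad(loss, preact).sum(axis=0)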
> 
> > 
> > I tried a small example, where
> > 
> > The original gradient is the following
> > 
> > original_all_grad = T.grad(cost=loss_train, wrt=params)
> > 
> > These are the parameters
> > params = lasagne.layers.get_all_params(model, trainable=True)
> > last_layer_params = model.get_params(trainable=True)
> > other_param = params[0:-1]
> > 
> > I computed the last-layer gradient still using the same cost, although 
> > this would change later.
> > known_grad = T.grad(loss_train, last_layer_params)
> > 
> > Then I compute the gradients of the other parameters using the known gradient:
> > output = lasagne.layers.get_output(model)
> > dic = OrderedDict([(output, known_grad[0])])
> > all_grad = T.grad(cost=None,wrt=other_param, known_grads=dic)
> > 
> > The surprising result is that all_grad and original_all_grad do not have 
> > identical values for the other params. I am not sure what I did wrong here.
> > 
> > I am really grateful for any help.
> 
> My guess is that you gave the gradient wrt the bias as a "known
> gradient" wrt the post-activation output, but it corresponds to the
> pre-activation value.
> 
> If you want to use the post-activation output, then you can do something
> like this (I'm not sure if it will work exactly like that since I'm not
> sure how Lasagne builds its graph):
> 
> known_grad = T.grad(loss_train, output)
> dic = OrderedDict([(output, known_grad)])
> all_grad = T.grad(cost=None, wrt=..., known_grads=dic)
> 
> or, if you want to use the pre-activation one, you would have to get the
> Theano variable that is the input of the activation function of the last
> layer, say we call it preact_output, and then:
> 
> all_grad = T.grad(cost=None, wrt=...,
>                   known_grads=OrderedDict([(preact_output, known_grad)]))
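> 
> One way to get preact_output (just a sketch, assuming the last layer
> of model is a DenseLayer with attributes W and b and a 2-D input; I'm
> not sure this matches your architecture):
> 
> # Rebuild the pre-activation expression from the penultimate layer:
> penultimate = lasagne.layers.get_output(model.input_layer)
> preact_output = T.dot(penultimate, model.W) + model.b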
> 
> > 
> > Vena
> > 
> > 
> 
> 
> -- 
> Pascal
> 

-- 
Pascal
