On Sun, May 07, 2017, Vena Li wrote:
> I have a question about computing gradients. In general, I can use
> theano.gradient.grad to compute the gradient for the whole graph given the
> cost. In my situation, I cannot compute the cost, but I know the gradient
> of the weights of the last layer. Can I still automatically compute the
> gradients of the previous layers? My understanding is that I should still
> use the same function, but pass the cost as None and give the gradients I
> know as known_grads.
You would probably need the gradient wrt the output of the last layer (the
activations, or the pre-activation values), rather than wrt its parameters.
In general, the gradient wrt the bias corresponds to the gradient wrt the
pre-activation values.

> I tried a small example.
>
> The original gradient is the following:
> original_all_grad = T.grad(cost=loss_train, wrt=params)
>
> These are the parameters:
> params = lasagne.layers.get_all_params(model, trainable=True)
> last_layer_params = model.get_params(trainable=True)
> other_param = params[0:-1]
>
> I computed the last-layer gradient still using the same cost, although
> later this would change:
> known_grad = T.grad(loss_train, last_layer_params)
>
> Compute the gradient with respect to the known gradient:
> output = lasagne.layers.get_output(model)
> dic = OrderedDict([(output, known_grad[0])])
> all_grad = T.grad(cost=None, wrt=other_param, known_grads=dic)
>
> The surprising result is that all_grad and original_all_grad are not
> identical for the other params. I am not sure what I did wrong here.
>
> I am really grateful for any help.

My guess is that you gave the gradient wrt the bias as a "known gradient"
wrt the post-activation output, but it actually corresponds to the
pre-activation value.

If you want to use the post-activation output, then you can do something
like this (I'm not sure if it will work exactly like that, since I'm not
sure how Lasagne builds its graph):

known_grad = T.grad(loss_train, output)
dic = OrderedDict([(output, known_grad)])
all_grad = T.grad(cost=None, wrt=..., known_grads=dic)

Or, if you want to use the pre-activation values, you would have to get the
Theano variable that is the input of the activation function of the last
layer, say we call it preact_output, and then:

dic = OrderedDict([(preact_output, known_grad)])
all_grad = T.grad(cost=None, wrt=..., known_grads=dic)

> Vena

--
Pascal
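
A small numerical check of the point above that the gradient wrt the bias
corresponds to the gradient wrt the pre-activation values (summed over the
minibatch). This is a standalone sketch; the layer sizes, cost, and variable
names are made up for illustration:

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
W = theano.shared(np.random.randn(5, 3).astype(theano.config.floatX), name='W')
b = theano.shared(np.zeros(3, dtype=theano.config.floatX), name='b')

preact = T.dot(x, W) + b           # pre-activation values, shape (batch, 3)
out = T.nnet.sigmoid(preact)       # post-activation output
loss = T.sum(out ** 2)             # arbitrary scalar cost

# Gradient wrt the bias vs. gradient wrt the pre-activation values.
g_b, g_preact = T.grad(loss, [b, preact])
f = theano.function([x], [g_b, g_preact.sum(axis=0)])

xv = np.random.randn(8, 5).astype(theano.config.floatX)
grad_b, grad_preact_summed = f(xv)
print(np.allclose(grad_b, grad_preact_summed))   # True: they match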

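For completeness, a self-contained sketch of the known_grads approach
described in the reply, checking that both the post-activation and
pre-activation variants reproduce the gradients obtained directly from the
cost. The network, loss, and data here are invented for illustration, and
splitting the last layer into a linear DenseLayer plus a NonlinearityLayer
is just one way to expose the pre-activation variable in Lasagne:

from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import lasagne

x = T.matrix('x')
y = T.matrix('y')

# Tiny MLP: 5 -> 4 -> 3.  The last layer is split into a linear DenseLayer
# followed by a NonlinearityLayer so that both the pre-activation and the
# post-activation expressions are easy to get at.
l_in = lasagne.layers.InputLayer(shape=(None, 5), input_var=x)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=4,
                                  nonlinearity=lasagne.nonlinearities.tanh)
l_pre = lasagne.layers.DenseLayer(l_hid, num_units=3, nonlinearity=None)
model = lasagne.layers.NonlinearityLayer(
    l_pre, nonlinearity=lasagne.nonlinearities.sigmoid)

# Ask for both outputs in a single get_output() call so that they share the
# same graph nodes (separate calls would rebuild the graph).
output, preact_output = lasagne.layers.get_output([model, l_pre])
loss_train = T.mean((output - y) ** 2)

params = lasagne.layers.get_all_params(model, trainable=True)
last_layer_params = l_pre.get_params(trainable=True)    # [W, b] of last layer
other_params = [p for p in params if p not in last_layer_params]

# Reference gradients, computed directly from the cost.
original_grads = T.grad(loss_train, wrt=other_params)

# 1) Post-activation variant: known gradient wrt the output expression.
g_output = T.grad(loss_train, wrt=output)
grads_from_output = T.grad(cost=None, wrt=other_params,
                           known_grads=OrderedDict([(output, g_output)]))

# 2) Pre-activation variant: known gradient wrt the pre-activation expression.
g_preact = T.grad(loss_train, wrt=preact_output)
grads_from_preact = T.grad(cost=None, wrt=other_params,
                           known_grads=OrderedDict([(preact_output, g_preact)]))

f = theano.function([x, y],
                    original_grads + grads_from_output + grads_from_preact)

rng = np.random.RandomState(0)
xv = rng.randn(8, 5).astype(theano.config.floatX)
yv = rng.randn(8, 3).astype(theano.config.floatX)
vals = f(xv, yv)
n = len(other_params)
ref, from_out, from_pre = vals[:n], vals[n:2 * n], vals[2 * n:]
for a, b, c in zip(ref, from_out, from_pre):
    assert np.allclose(a, b) and np.allclose(a, c)   # all three agree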