On Sun, May 07, 2017, Vena Li wrote:
> I have a question about computing gradients. In general, I can use
> theano.gradient.grad to compute the gradient for the whole graph given the
> cost. In my situation, I cannot compute the cost, but I know the gradient
> of the weights of the last layer. Can I still automatically compute the
> gradients of the previous layers? My understanding is that I should still
> use the same function, but pass the cost as None and give the gradients I
> know as known_grads.
You would probably need the gradient wrt the output of the last layer (the
activations, or the pre-activation values), rather than wrt its parameters.
In general, the gradient wrt the bias corresponds to the gradient wrt the
pre-activation values.

> I tried a small example.
>
> The original gradient is the following:
> original_all_grad = T.grad(cost=loss_train, wrt=params)
>
> These are the parameters:
> params = lasagne.layers.get_all_params(model, trainable=True)
> last_layer_params = model.get_params(trainable=True)
> other_param = params[0:-1]
>
> I computed the last-layer gradient still using the same cost, although
> later this would change:
> known_grad = T.grad(loss_train, last_layer_params)
>
> Compute the gradient with respect to the known gradient:
> output = lasagne.layers.get_output(model)
> dic = OrderedDict([(output, known_grad[0])])
> all_grad = T.grad(cost=None, wrt=other_param, known_grads=dic)
>
> The surprising result is that all_grad and original_all_grad are not
> identical for the other params. I am not sure what I did wrong here.
>
> I am really grateful for any help.

My guess is that you gave the gradient wrt the bias as a "known gradient"
wrt the post-activation output, but it actually corresponds to the
pre-activation value.

If you want to use the post-activation output, then you can do something
like this (I'm not sure if it will work exactly like that, since I'm not
sure how Lasagne builds its graph):

known_grad = T.grad(loss_train, output)
dic = OrderedDict([(output, known_grad)])
all_grad = T.grad(cost=None, wrt=..., known_grads=dic)

Or, if you want to use the pre-activation values, you would have to get the
Theano variable that is the input of the activation function of the last
layer, say we call it preact_output, and then:

dic = OrderedDict([(preact_output, known_grad)])
all_grad = T.grad(cost=None, wrt=..., known_grads=dic)

> Vena

--
Pascal
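
A small numerical check of the point above that the gradient wrt the bias
corresponds to the gradient wrt the pre-activation values (summed over the
minibatch). This is a standalone sketch; the layer sizes, cost, and variable
names are made up for illustration:

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
W = theano.shared(np.random.randn(5, 3).astype(theano.config.floatX), name='W')
b = theano.shared(np.zeros(3, dtype=theano.config.floatX), name='b')

preact = T.dot(x, W) + b           # pre-activation values, shape (batch, 3)
out = T.nnet.sigmoid(preact)       # post-activation output
loss = T.sum(out ** 2)             # arbitrary scalar cost

# Gradient wrt the bias vs. gradient wrt the pre-activation values.
g_b, g_preact = T.grad(loss, [b, preact])
f = theano.function([x], [g_b, g_preact.sum(axis=0)])

xv = np.random.randn(8, 5).astype(theano.config.floatX)
grad_b, grad_preact_summed = f(xv)
print(np.allclose(grad_b, grad_preact_summed))   # True: they match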

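For completeness, a self-contained sketch of the known_grads approach
described in the reply, checking that both the post-activation and
pre-activation variants reproduce the gradients obtained directly from the
cost. The network, loss, and data here are invented for illustration, and
splitting the last layer into a linear DenseLayer plus a NonlinearityLayer
is just one way to expose the pre-activation variable in Lasagne:

from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import lasagne

x = T.matrix('x')
y = T.matrix('y')

# Tiny MLP: 5 -> 4 -> 3.  The last layer is split into a linear DenseLayer
# followed by a NonlinearityLayer so that both the pre-activation and the
# post-activation expressions are easy to get at.
l_in = lasagne.layers.InputLayer(shape=(None, 5), input_var=x)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=4,
                                  nonlinearity=lasagne.nonlinearities.tanh)
l_pre = lasagne.layers.DenseLayer(l_hid, num_units=3, nonlinearity=None)
model = lasagne.layers.NonlinearityLayer(
    l_pre, nonlinearity=lasagne.nonlinearities.sigmoid)

# Ask for both outputs in a single get_output() call so that they share the
# same graph nodes (separate calls would rebuild the graph).
output, preact_output = lasagne.layers.get_output([model, l_pre])
loss_train = T.mean((output - y) ** 2)

params = lasagne.layers.get_all_params(model, trainable=True)
last_layer_params = l_pre.get_params(trainable=True)    # [W, b] of last layer
other_params = [p for p in params if p not in last_layer_params]

# Reference gradients, computed directly from the cost.
original_grads = T.grad(loss_train, wrt=other_params)

# 1) Post-activation variant: known gradient wrt the output expression.
g_output = T.grad(loss_train, wrt=output)
grads_from_output = T.grad(cost=None, wrt=other_params,
                           known_grads=OrderedDict([(output, g_output)]))

# 2) Pre-activation variant: known gradient wrt the pre-activation expression.
g_preact = T.grad(loss_train, wrt=preact_output)
grads_from_preact = T.grad(cost=None, wrt=other_params,
                           known_grads=OrderedDict([(preact_output, g_preact)]))

f = theano.function([x, y],
                    original_grads + grads_from_output + grads_from_preact)

rng = np.random.RandomState(0)
xv = rng.randn(8, 5).astype(theano.config.floatX)
yv = rng.randn(8, 3).astype(theano.config.floatX)
vals = f(xv, yv)
n = len(other_params)
ref, from_out, from_pre = vals[:n], vals[n:2 * n], vals[2 * n:]
for a, b, c in zip(ref, from_out, from_pre):
    assert np.allclose(a, b) and np.allclose(a, c)   # all three agree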