Hi Daniel,
This example is very useful to me. What I need is the gradient and Hessian
of a single node of a certain layer with respect to the input. But when I make the
following modifications to the cost 'c', I run into some issues. First, it
seems that the output of each layer of the net is not a tensor vector but a
tensor matrix, so when defining the new cost 'c' I have to use
y[0][0], not y[0]. Second, the reason I use y[0][0]+0 is to turn
the 1x1 tensor variable into a scalar so that I can use tt.grad().
This seems a bit weird to me; do you have a better method?
Thanks a lot!
import numpy

import theano
import theano.tensor as tt
import theano.gradient

input_size = 4
hidden_size = 3
output_size = 2
Wh_flat = theano.shared(numpy.random.randn(input_size * hidden_size).astype(theano.config.floatX))
bh = theano.shared(numpy.zeros(hidden_size, dtype=theano.config.floatX))
Wy_flat = theano.shared(numpy.random.randn(hidden_size * output_size).astype(theano.config.floatX))
by = theano.shared(numpy.zeros(output_size, dtype=theano.config.floatX))
parameters = [Wh_flat, bh, Wy_flat, by]
Wh = Wh_flat.reshape((input_size, hidden_size))
Wy = Wy_flat.reshape((hidden_size, output_size))
x = tt.matrix(dtype=theano.config.floatX)
z = tt.matrix(dtype=theano.config.floatX)
h = tt.nnet.sigmoid(theano.dot(x, Wh) + bh)
y = tt.nnet.softmax(theano.dot(h, Wy) + by)
# Here modified: the cost is now a single node of the output layer
c = y[0][0] + 0
gs = theano.grad(c, parameters)
hs = theano.gradient.hessian(c, parameters)
# Here modified: 'c' dropped from the outputs; 'z' no longer appears in any
# output, so Theano has to be told not to raise on the unused input
f = theano.function([x, z], [y] + gs + hs, on_unused_input='ignore')
batch_size = 5
print f(numpy.random.randn(batch_size, input_size).astype(theano.config.floatX),
        numpy.random.randn(batch_size, output_size).astype(theano.config.floatX))
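
To illustrate the shapes I am talking about, using the variables defined above
(the .ndim values below are what the graph variables report):

print y.ndim        # 2: the softmax output is a matrix, one row per example
print y[0].ndim     # 1: a single row of the output, still not a scalar
print y[0][0].ndim  # 0: a 0-d tensor, which is what I pass (plus 0) to tt.grad()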
On Monday, July 6, 2015 at 11:45:29 AM UTC-4, Daniel Renshaw wrote:
>
> This is discussed in a previous thread:
> https://groups.google.com/forum/#!msg/theano-users/ZGNuroagymI/285lxgVgb3oJ
>
> Here is a complete example of using theano.gradient.hessian with
> matrix-shaped parameters.
>
> import numpy
>
> import theano
> import theano.tensor as tt
> import theano.gradient
>
> input_size = 4
> hidden_size = 3
> output_size = 2
> Wh_flat = theano.shared(numpy.random.randn(input_size * hidden_size).astype(theano.config.floatX))
> bh = theano.shared(numpy.zeros(hidden_size, dtype=theano.config.floatX))
> Wy_flat = theano.shared(numpy.random.randn(hidden_size * output_size).astype(theano.config.floatX))
> by = theano.shared(numpy.zeros(output_size, dtype=theano.config.floatX))
> parameters = [Wh_flat, bh, Wy_flat, by]
> Wh = Wh_flat.reshape((input_size, hidden_size))
> Wy = Wy_flat.reshape((hidden_size, output_size))
> x = tt.matrix(dtype=theano.config.floatX)
> z = tt.matrix(dtype=theano.config.floatX)
> h = tt.nnet.sigmoid(theano.dot(x, Wh) + bh)
> y = tt.nnet.softmax(theano.dot(h, Wy) + by)
> c = tt.nnet.categorical_crossentropy(y, z).mean()
> gs = theano.grad(c, parameters)
> hs = theano.gradient.hessian(c, parameters)
> f = theano.function([x, z], [y, c] + gs + hs)
> batch_size = 5
> print f(numpy.random.randn(batch_size, input_size).astype(theano.config.floatX),
>         numpy.random.randn(batch_size, output_size).astype(theano.config.floatX))
>
> Daniel
>
>
> On 6 July 2015 at 15:12, frans09 <[email protected]> wrote:
>
>> Hi all,
>>
>> It is a fairly complicated and specific question, but I haven't found a
>> way to do this and I wonder if anyone has experience with this:
>> calculating the hessian of a neural network with respect to the
>> parameters.
>>
>> The main thing that makes it so difficult is that the parameters (the
>> weights) are stored in matrix-shaped shared variables and the biases in
>> vector-shaped shared variables, one for each layer. Hence, for every layer
>> I have:
>>
>> - one matrix (shared variable) with the weights of all connections
>> - one vector (shared variable) with the biases for this layer
>>
>> Thus, if I calculate the gradient with respect to any of these shared
>> variables, I just get back an array of the same shape whose elements
>> represent the gradient of the loss function with respect to the weight of a
>> single connection. To update the variable with gradient descent, I can then
>> take the shared variable and subtract the learning rate times the gradient.
>> Thus, the parameters are updated by taking the arrays as a whole and then
>> doing the update for each shared variable.
>>
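
(A minimal sketch of the per-variable gradient-descent update described above,
reusing the x, z, c and parameters names from Daniel's example earlier in the
thread; learning_rate is just an illustrative constant:)

learning_rate = 0.01
grads = theano.grad(c, parameters)
# one (shared_variable, new_value) pair per parameter array
updates = [(p, p - learning_rate * g) for p, g in zip(parameters, grads)]
train_step = theano.function([x, z], c, updates=updates)
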
>> Hence, the gradient looks different than usual, and is not a vector.
>> However, how can we now calculate the Hessian? I tried
>> theano.gradient.hessian(), but this obviously needs a vector as input,
>> i.e. a long vector that has all parameters stacked in it. Such a vector
>> could be obtained by flattening and concatenating all the matrices and
>> vectors that hold the weights and biases per layer, but when I try this and
>> calculate the Hessian, that vector is not part of the symbolic graph of the
>> loss function, since the graph uses the original matrices and vectors to
>> compute the loss via matrix multiplication.
>>
>> I realize it might be possible to alter the symbolic graph and use a long
>> vector of all parameters to calculate the loss, but this would firstly be
>> very hard to set up, and secondly would take away the benefits of the
>> matrix multiplications that make these networks fast to train, so it does
>> not seem like a good idea to me.
>>
>> Furthermore, once you have the Hessian (which would be a matrix with
>> dimensions [total_number_of_parameters, total_number_of_parameters]), how
>> could it be used to perform an update with Newton's method? Because the
>> gradient is not in the usual vector shape, one cannot simply multiply the
>> inverse of the Hessian with the gradient, as Newton's method requires. How
>> would this work?
>>
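
(For what it's worth, a rough sketch of the bookkeeping for a Newton-style
step once the gradients and a full Hessian over the flattened parameter
vector are available; here grad_values is assumed to be a list of numpy
arrays, one per shared variable, and full_hessian an already-computed (P, P)
numpy array, so this only illustrates the flatten/solve/unflatten step, not
how to obtain the cross-parameter Hessian blocks in Theano:)

import numpy

# stack all per-variable gradients into one long vector of length P
flat_grad = numpy.concatenate([g.ravel() for g in grad_values])
# Newton step: solve H * step = g rather than forming the inverse explicitly
step = numpy.linalg.solve(full_hessian, flat_grad)
# split the step back into pieces matching each shared variable's shape
offset = 0
for p in parameters:
    value = p.get_value()
    piece = step[offset:offset + value.size].reshape(value.shape)
    p.set_value(value - piece)
    offset += value.size
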
>> I wonder if anyone has experience doing this, or whether there is a
>> common (efficient) way to do this for Neural Networks?
>>
>> Thanks in advance!
>>
>
>
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.