Hi Daniel,
This example is very useful to me. What I need is the gradient and Hessian
of a single node of a certain layer with respect to the input. But when I make the
following modifications to the cost 'c', I run into some issues. First, it
seems that the output of each layer of the net is not a tensor vector but a
tensor matrix, so when defining the new cost 'c' I have to use
y[0][0], not y[0]. Second, the reason I use y[0][0]+0 is to turn
the 1x1 tensor variable into a scalar so that I can use tt.grad().
This seems a bit weird to me; do you have a better method?
Thanks a lot!
import numpy

import theano
import theano.tensor as tt
import theano.gradient

input_size = 4
hidden_size = 3
output_size = 2
Wh_flat = theano.shared(numpy.random.randn(input_size * hidden_size).astype(theano.config.floatX))
bh = theano.shared(numpy.zeros(hidden_size, dtype=theano.config.floatX))
Wy_flat = theano.shared(numpy.random.randn(hidden_size * output_size).astype(theano.config.floatX))
by = theano.shared(numpy.zeros(output_size, dtype=theano.config.floatX))
parameters = [Wh_flat, bh, Wy_flat, by]
Wh = Wh_flat.reshape((input_size, hidden_size))
Wy = Wy_flat.reshape((hidden_size, output_size))
x = tt.matrix(dtype=theano.config.floatX)
z = tt.matrix(dtype=theano.config.floatX)
h = tt.nnet.sigmoid(theano.dot(x, Wh) + bh)
y = tt.nnet.softmax(theano.dot(h, Wy) + by)
# Here modified: the cost is now a single node of the output layer
c = y[0][0] + 0
gs = theano.grad(c, parameters)
hs = theano.gradient.hessian(c, parameters)
# Here modified: 'c' dropped from the outputs; 'z' no longer appears in any
# output, so Theano has to be told not to raise on the unused input
f = theano.function([x, z], [y] + gs + hs, on_unused_input='ignore')
batch_size = 5
print f(numpy.random.randn(batch_size, input_size).astype(theano.config.floatX),
        numpy.random.randn(batch_size, output_size).astype(theano.config.floatX))
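
To illustrate the shapes I am talking about, using the variables defined above
(the .ndim values below are what the graph variables report):

print y.ndim        # 2: the softmax output is a matrix, one row per example
print y[0].ndim     # 1: a single row of the output, still not a scalar
print y[0][0].ndim  # 0: a 0-d tensor, which is what I pass (plus 0) to tt.grad()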
On Monday, July 6, 2015 at 11:45:29 AM UTC-4, Daniel Renshaw wrote:
>
> This is discussed in a previous thread:
> https://groups.google.com/forum/#!msg/theano-users/ZGNuroagymI/285lxgVgb3oJ
>
> Here is a complete example of using theano.gradient.hessian with
> matrix-shaped parameters.
>
> import numpy
>
> import theano
> import theano.tensor as tt
> import theano.gradient
>
> input_size = 4
> hidden_size = 3
> output_size = 2
> Wh_flat = theano.shared(numpy.random.randn(input_size * hidden_size).astype(theano.config.floatX))
> bh = theano.shared(numpy.zeros(hidden_size, dtype=theano.config.floatX))
> Wy_flat = theano.shared(numpy.random.randn(hidden_size * output_size).astype(theano.config.floatX))
> by = theano.shared(numpy.zeros(output_size, dtype=theano.config.floatX))
> parameters = [Wh_flat, bh, Wy_flat, by]
> Wh = Wh_flat.reshape((input_size, hidden_size))
> Wy = Wy_flat.reshape((hidden_size, output_size))
> x = tt.matrix(dtype=theano.config.floatX)
> z = tt.matrix(dtype=theano.config.floatX)
> h = tt.nnet.sigmoid(theano.dot(x, Wh) + bh)
> y = tt.nnet.softmax(theano.dot(h, Wy) + by)
> c = tt.nnet.categorical_crossentropy(y, z).mean()
> gs = theano.grad(c, parameters)
> hs = theano.gradient.hessian(c, parameters)
> f = theano.function([x, z], [y, c] + gs + hs)
> batch_size = 5
> print f(numpy.random.randn(batch_size, input_size).astype(theano.config.floatX),
>         numpy.random.randn(batch_size, output_size).astype(theano.config.floatX))
>
> Daniel
>
>
> On 6 July 2015 at 15:12, frans09 <[email protected]> wrote:
>
>> Hi all,
>>
>> It is a fairly complicated and specific question, but I haven't found a
>> way to do this and I wonder if anyone has experience with this:
>> calculating the hessian of a neural network with respect to the
>> parameters.
>>
>> The main thing that makes it so difficult is that the parameters (the
>> weights) are stored in matrix-shaped shared variables and the biases in
>> vector-shaped shared variables, one for each layer. Hence, for every layer
>> I have:
>>
>> - one matrix (shared variable) with the weights of all connections
>> - one vector (shared variable) with the biases for this layer
>>
>> Thus, if I calculate the gradient with respect to any of these shared
>> variables, I just get back an array of the same shape whose elements
>> represent the gradient of the loss function with respect to the weight of a
>> single connection. To update the variable with gradient descent, I can then
>> take the shared variable and subtract the learning rate times the gradient.
>> Thus, the parameters are updated by taking the arrays as a whole and then
>> doing the update for each shared variable.
>>
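
(A minimal sketch of the per-variable gradient-descent update described above,
reusing the x, z, c and parameters names from Daniel's example earlier in the
thread; learning_rate is just an illustrative constant:)

learning_rate = 0.01
grads = theano.grad(c, parameters)
# one (shared_variable, new_value) pair per parameter array
updates = [(p, p - learning_rate * g) for p, g in zip(parameters, grads)]
train_step = theano.function([x, z], c, updates=updates)
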
>> Hence, the gradient looks different than usual, and is not a vector.
>> However, how can we now calculate the Hessian? I tried
>> theano.gradient.hessian(), but this obviously needs a vector as input,
>> i.e. a long vector that has all parameters stacked in it. Such a vector
>> could be obtained by flattening and concatenating all the matrices and
>> vectors that hold the weights and biases per layer, but when I try this and
>> calculate the Hessian, that vector is not part of the symbolic graph of the
>> loss function, since the graph uses the original matrices and vectors to
>> compute the loss via matrix multiplication.
>>
>> I realize it might be possible to alter the symbolic graph and use a long
>> vector of all parameters to calculate the loss, but this would firstly be
>> very hard to set up, and secondly would take away the benefits of the
>> matrix multiplications that make these networks fast to train, so it does
>> not seem like a good idea to me.
>>
>> Furthermore, once you have the Hessian (which would be a matrix with
>> dimensions [total_number_of_parameters, total_number_of_parameters]), how
>> could it be used to perform an update with Newton's method? Because the
>> gradient is not in the usual vector shape, one cannot simply multiply the
>> inverse of the Hessian with the gradient, as Newton's method requires. How
>> would this work?
>>
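
(For what it's worth, a rough sketch of the bookkeeping for a Newton-style
step once the gradients and a full Hessian over the flattened parameter
vector are available; here grad_values is assumed to be a list of numpy
arrays, one per shared variable, and full_hessian an already-computed (P, P)
numpy array, so this only illustrates the flatten/solve/unflatten step, not
how to obtain the cross-parameter Hessian blocks in Theano:)

import numpy

# stack all per-variable gradients into one long vector of length P
flat_grad = numpy.concatenate([g.ravel() for g in grad_values])
# Newton step: solve H * step = g rather than forming the inverse explicitly
step = numpy.linalg.solve(full_hessian, flat_grad)
# split the step back into pieces matching each shared variable's shape
offset = 0
for p in parameters:
    value = p.get_value()
    piece = step[offset:offset + value.size].reshape(value.shape)
    p.set_value(value - piece)
    offset += value.size
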
>> I wonder if anyone has experience doing this, or whether there is a
>> common (efficient) way to do this for Neural Networks?
>>
>> Thanks in advance!
>>
>
>
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.