1) In the MLP described in the example, I realized that the dot product is 
taken between all the examples in the minibatch and the weight matrix. 
Before noticing what was being done in the example, I tried accumulating the 
gradient over each example in a given minibatch and repeating this for 
every minibatch, but I'm unable to accumulate the gradients using scan. 
The code is below:

Hidden Layer class:
import numpy as np
from theano import shared
from theano.tensor.nnet import sigmoid

class HiddenLayer:
    def __init__(self, inputs, dim, prev_dim):
        # weights: dim x prev_dim, biases: dim x 1
        self.weights = shared(np.random.randn(dim, prev_dim), borrow=True)
        self.biases = shared(np.random.randn(dim, 1), borrow=True)
        # pre-activations and activations for this layer
        self.zs = self.weights.dot(inputs) + self.biases
        self.activations = sigmoid(self.zs)

Main file:
import theano.tensor as t
from theano import function, scan, shared
import hll
import numpy as np
import random
import sys
import pickle
import mnist_loader

layers=[3,30,2]
inputs=t.dmatrix('inputs') # 784 x 1
inputs_=inputs
outputs=t.dmatrix('outputs') # 10 x 1
hlls=[]
num_layers=len(layers)
num_hidden_layers=num_layers-1

train_, validation, test=mnist_loader.load_data_wrapper()

inps_=[shared(t_[0]) for t_ in train_] # 784 x 1
outs_=[shared(t_[1]) for t_ in train_] # 10 x 1

inps=t.dtensor3() # list of 784 x 1
opts=t.dtensor3() # list of 10 x 1
bs=10 

#Building the Graph 
for ind, l in enumerate(layers[1:]):
        hh = hll.HiddenLayer(inputs, l, layers[ind])
        hlls.append(hh)
        inputs = hh.activations

params=[h.weights for h in hlls]
params+=[h.biases for h in hlls]

cost=(-0.5 * ((outputs-inputs) ** 2)).sum() #MSE
gradients=[t.grad(cost,param) for param in params]

fforward=function([inputs_], inputs, name='fforward')
costt=function([outputs, inputs_],cost) #notice inputs_ not inputs
gradientss=function([outputs,inputs_], gradients)

'''def feedforward(ip):
        return fforward(ip)

#MSE
def cost_func(op,activation):
        return cost(op,activation)


#acc_grad=[shared(np.zeros(c,p)) for c,p in zip(layers[1:], layers[:-1])]

def grad(ip,op):
        return gradientss(op,ip)'''


batch_grads, dumb = scan(lambda ip, op: gradientss, outputs_info=None,
                         sequences=[inps, opts]) #minibatch gradient

But I end up getting this error message:
ValueError: The return value of your scan lambda expression may only be 
made of lists, tuples, or dictionaries containing Theano variables (or 
`theano.scan_module.until` objects for conditions). In particular if you 
need to use constant values, you can use `tensor.constant` to turn them into 
Theano variables.

tensor.grad should return a list, in which case the error shouldn't have 
arisen. Please guide me.
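
From the error, I gather that the step function given to scan has to return 
symbolic Theano variables, whereas gradientss above is an already-compiled 
function. The closest thing I could come up with is a step that rebuilds the 
per-example cost symbolically with theano.clone and differentiates that 
(just a rough, untested sketch using the cost, params, inputs_ and outputs 
defined above):

import theano

def per_example_grads(ip, op):
    # Substitute this example's input/target into the existing cost graph.
    cost_i = theano.clone(cost, replace={inputs_: ip, outputs: op})
    # Return symbolic gradients w.r.t. the shared parameters.
    return [t.grad(cost_i, p) for p in params]

batch_grads, updates = scan(per_example_grads, outputs_info=None,
                            sequences=[inps, opts])
acc_grads = [g.sum(axis=0) for g in batch_grads]  # sum over the minibatch

Is something along these lines what scan expects?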


2) The benefit of taking a dot product between all the examples in the 
minibatch and the weights is that the minibatch gradients are calculated in 
one shot. But how does this scale as the network size increases? Say a 
network with a huge number of parameters is being used. By taking the 
entire minibatch at once, we would have on the order of minibatch_size * 
total parameters intermediate values and might actually run out of memory. 
How is this handled?
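
The only workaround I can think of is to split the minibatch into smaller 
chunks on the Python side and sum the chunk gradients, along the lines of 
the sketch below (grad_fn here is a hypothetical compiled function that 
returns one gradient array per parameter for a chunk of inputs/targets, not 
the gradientss above):

import numpy as np

def minibatch_gradients(grad_fn, x_batch, y_batch, chunk_size=2):
    # Accumulate gradients over a minibatch in small chunks to bound memory.
    acc = None
    for start in range(0, len(x_batch), chunk_size):
        grads = grad_fn(x_batch[start:start + chunk_size],
                        y_batch[start:start + chunk_size])
        if acc is None:
            acc = [np.asarray(g).copy() for g in grads]
        else:
            acc = [a + g for a, g in zip(acc, grads)]
    return acc

Is chunking like this how larger networks usually deal with it, or does 
Theano handle the memory itself?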
