2) Correction: the extra memory cost of processing a whole minibatch at once is minibatch_size * activations in each layer, not minibatch_size * total parameters.
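Only the per-layer activations get one copy per example in the minibatch; the weight and bias arrays are stored once and reused for every column of the input matrix. A rough back-of-the-envelope sketch (layer sizes and minibatch size made up for illustration, and ignoring whatever intermediates Theano keeps for the backward pass):

layers = [784, 1000, 1000, 10]   # hypothetical layer sizes
minibatch_size = 128

# Parameters are stored once, regardless of the minibatch size.
n_params = sum(n_out * n_in + n_out            # weights + biases per layer
               for n_in, n_out in zip(layers[:-1], layers[1:]))

# Activations are the only thing replicated per example.
n_activations = sum(layers[1:])                # one vector per layer
per_batch = minibatch_size * n_activations

print(n_params)    # 1796010 -- ~1.8M parameter values, stored once
print(per_batch)   # 257280  -- ~0.26M activation values for the whole batch

So the extra cost of batching grows with the total number of units, not with the total number of weights.
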
On Friday, August 5, 2016 at 9:42:25 AM UTC+5:30, [email protected] wrote:
>
> 1) In the MLP described in the example, I realize that the dot product is
> taken between all the examples in the minibatch and the weight matrix.
> Before noticing what was being done in the example, I had tried
> accumulating the gradient over each example in a given minibatch and
> repeating the same for every minibatch. But I'm unable to accumulate the
> gradients using scan. The code is below:
>
> Hidden Layer class:
>
> import theano.tensor as t
> from theano import shared
> import numpy as np
> from theano.tensor.nnet.nnet import sigmoid
>
> class HiddenLayer:
>     def __init__(self, inputs, dim, prev_dim):
>         self.weights = shared(np.random.randn(dim, prev_dim), borrow=True)
>         self.biases = shared(np.random.randn(dim, 1), borrow=True)
>         self.zs = self.weights.dot(inputs) + self.biases
>         self.activations = sigmoid(self.zs)
>
> Main file:
>
> import theano.tensor as t
> from theano import function, scan, shared
> import hll
> import numpy as np
> import random
> import sys
> import pickle
> import mnist_loader
>
> layers = [3, 30, 2]
> inputs = t.dmatrix('inputs')    # 784 x 1
> inputs_ = inputs
> outputs = t.dmatrix('outputs')  # 10 x 1
> hlls = []
> num_layers = len(layers)
> num_hidden_layers = num_layers - 1
>
> train_, validation, test = mnist_loader.load_data_wrapper()
>
> inps_ = [shared(t_[0]) for t_ in train_]  # 784 x 1
> outs_ = [shared(t_[1]) for t_ in train_]  # 10 x 1
>
> inps = t.dtensor3()  # list of 784 x 1
> opts = t.dtensor3()  # list of 10 x 1
> bs = 10
>
> # Building the graph
> for ind, l in enumerate(layers[1:]):
>     hh = hll.HiddenLayer(inputs, l, layers[ind])
>     hlls.append(hh)
>     inputs = hh.activations
>
> params = [h.weights for h in hlls]
> params += [h.biases for h in hlls]
>
> cost = (-0.5 * ((outputs - inputs) ** 2)).sum()  # MSE
> gradients = [t.grad(cost, param) for param in params]
>
> fforward = function([inputs_], inputs, name='fforward')
> costt = function([outputs, inputs_], cost)  # notice inputs_ not inputs
> gradientss = function([outputs, inputs_], gradients)
>
> '''def feedforward(ip):
>     return fforward(ip)
>
> # MSE
> def cost_func(op, activation):
>     return cost(op, activation)
>
> # acc_grad = [shared(np.zeros(c, p)) for c, p in zip(layers[1:], layers[:-1])]
>
> def grad(ip, op):
>     return gradientss(op, ip)'''
>
> batch_grads, dumb = scan(lambda ip, op: gradientss, outputs_info=None,
>                          sequences=[inps, opts])  # minibatch gradient
>
> But I end up getting this error message:
>
> ValueError: The return value of your scan lambda expression may only be
> made of lists, tuples, or dictionaries containing Theano variables (or
> `theano.scan_module.until` objects for conditions). In particular if you
> need to use constant values, you can use `tensor.constant` to turn them
> into Theano variables.
>
> tensor.grad should return a list, in which case the error shouldn't have
> arisen. Please guide me.
>
> 2) The benefit of taking a dot product between all the examples in the
> minibatch and the weights is that the minibatch gradients are calculated
> in one shot. But how will it scale as the network size increases? Say a
> network with a huge number of parameters is being used. By taking the
> entire minibatch at once, we would have minibatch_size * total parameters
> and might actually run out of memory. How is this handled?
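On point 1: the ValueError comes from the scan lambda returning the compiled function object gradientss itself rather than a Theano expression; a compiled theano.function is not a Theano variable, and it can't be called on the symbolic slices that scan passes in anyway. If the goal is just to accumulate per-example gradients, one option is to skip scan and loop in ordinary Python over the compiled gradient function. A minimal, self-contained sketch of that idea (a single made-up layer, not your hll/mnist_loader setup):

import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)

# One example at a time: x is (n_in, 1), y is (n_out, 1), as in your layout.
x = T.dmatrix('x')
y = T.dmatrix('y')

# A single hypothetical layer; sizes are made up.
W = theano.shared(rng.randn(2, 3), name='W')
b = theano.shared(rng.randn(2, 1), name='b')

activation = T.nnet.sigmoid(W.dot(x) + b)
cost = 0.5 * ((y - activation) ** 2).sum()
grads = T.grad(cost, [W, b])               # a list, one entry per parameter

grad_fn = theano.function([y, x], grads)   # returns numpy arrays

# Accumulate the per-example gradients over a (fake) minibatch in Python.
minibatch = [(rng.randn(3, 1), rng.randn(2, 1)) for _ in range(10)]
acc = [np.zeros_like(p.get_value()) for p in (W, b)]
for xi, yi in minibatch:
    for a, g in zip(acc, grad_fn(yi, xi)):
        a += g
# acc now holds the gradients summed over the 10 examples.

If you do want the accumulation inside the graph, the per-example gradient expression would have to be built symbolically inside the scan step rather than by calling a compiled function, but the plain loop above is usually the simplest way to check the accumulated values.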
