2) Correction: the extra memory cost of processing a whole minibatch at once is minibatch_size * activations in each layer, not minibatch_size * total parameters.
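Only the per-layer activations get one copy per example in the minibatch; the weight and bias arrays are stored once and reused for every column of the input matrix. A rough back-of-the-envelope sketch (layer sizes and minibatch size made up for illustration, and ignoring whatever intermediates Theano keeps for the backward pass):

layers = [784, 1000, 1000, 10]   # hypothetical layer sizes
minibatch_size = 128

# Parameters are stored once, regardless of the minibatch size.
n_params = sum(n_out * n_in + n_out            # weights + biases per layer
               for n_in, n_out in zip(layers[:-1], layers[1:]))

# Activations are the only thing replicated per example.
n_activations = sum(layers[1:])                # one vector per layer
per_batch = minibatch_size * n_activations

print(n_params)    # 1796010 -- ~1.8M parameter values, stored once
print(per_batch)   # 257280  -- ~0.26M activation values for the whole batch

So the extra cost of batching grows with the total number of units, not with the total number of weights.
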
On Friday, August 5, 2016 at 9:42:25 AM UTC+5:30, [email protected] wrote:
>
> 1) In the MLP described in the example, I realize that the dot product is
> taken between all the examples in the minibatch and the weight matrix.
> Before noticing what was being done in the example, I had tried
> accumulating the gradient over each example in a given minibatch and
> repeating the same for every minibatch. But I'm unable to accumulate the
> gradients using scan. The code is below:
>
> Hidden Layer class:
>
> import theano.tensor as t
> from theano import shared
> import numpy as np
> from theano.tensor.nnet.nnet import sigmoid
>
> class HiddenLayer:
>     def __init__(self, inputs, dim, prev_dim):
>         self.weights = shared(np.random.randn(dim, prev_dim), borrow=True)
>         self.biases = shared(np.random.randn(dim, 1), borrow=True)
>         self.zs = self.weights.dot(inputs) + self.biases
>         self.activations = sigmoid(self.zs)
>
> Main file:
>
> import theano.tensor as t
> from theano import function, scan, shared
> import hll
> import numpy as np
> import random
> import sys
> import pickle
> import mnist_loader
>
> layers = [3, 30, 2]
> inputs = t.dmatrix('inputs')    # 784 x 1
> inputs_ = inputs
> outputs = t.dmatrix('outputs')  # 10 x 1
> hlls = []
> num_layers = len(layers)
> num_hidden_layers = num_layers - 1
>
> train_, validation, test = mnist_loader.load_data_wrapper()
>
> inps_ = [shared(t_[0]) for t_ in train_]  # 784 x 1
> outs_ = [shared(t_[1]) for t_ in train_]  # 10 x 1
>
> inps = t.dtensor3()  # list of 784 x 1
> opts = t.dtensor3()  # list of 10 x 1
> bs = 10
>
> # Building the graph
> for ind, l in enumerate(layers[1:]):
>     hh = hll.HiddenLayer(inputs, l, layers[ind])
>     hlls.append(hh)
>     inputs = hh.activations
>
> params = [h.weights for h in hlls]
> params += [h.biases for h in hlls]
>
> cost = (-0.5 * ((outputs - inputs) ** 2)).sum()  # MSE
> gradients = [t.grad(cost, param) for param in params]
>
> fforward = function([inputs_], inputs, name='fforward')
> costt = function([outputs, inputs_], cost)  # notice inputs_ not inputs
> gradientss = function([outputs, inputs_], gradients)
>
> '''def feedforward(ip):
>     return fforward(ip)
>
> # MSE
> def cost_func(op, activation):
>     return cost(op, activation)
>
> # acc_grad = [shared(np.zeros(c, p)) for c, p in zip(layers[1:], layers[:-1])]
>
> def grad(ip, op):
>     return gradientss(op, ip)'''
>
> batch_grads, dumb = scan(lambda ip, op: gradientss, outputs_info=None,
>                          sequences=[inps, opts])  # minibatch gradient
>
> But I end up getting this error message:
>
> ValueError: The return value of your scan lambda expression may only be
> made of lists, tuples, or dictionaries containing Theano variables (or
> `theano.scan_module.until` objects for conditions). In particular if you
> need to use constant values, you can use `tensor.constant` to turn them
> into Theano variables.
>
> tensor.grad should return a list, in which case the error shouldn't have
> arisen. Please guide me.
>
> 2) The benefit of taking a dot product between all the examples in the
> minibatch and the weights is that the minibatch gradients are calculated
> in one shot. But how will it scale as the network size increases? Say a
> network with a huge number of parameters is being used. By taking the
> entire minibatch at once, we would have minibatch_size * total parameters
> and might actually run out of memory. How is this handled?
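On point 1: the ValueError comes from the scan lambda returning the compiled function object gradientss itself rather than a Theano expression; a compiled theano.function is not a Theano variable, and it can't be called on the symbolic slices that scan passes in anyway. If the goal is just to accumulate per-example gradients, one option is to skip scan and loop in ordinary Python over the compiled gradient function. A minimal, self-contained sketch of that idea (a single made-up layer, not your hll/mnist_loader setup):

import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)

# One example at a time: x is (n_in, 1), y is (n_out, 1), as in your layout.
x = T.dmatrix('x')
y = T.dmatrix('y')

# A single hypothetical layer; sizes are made up.
W = theano.shared(rng.randn(2, 3), name='W')
b = theano.shared(rng.randn(2, 1), name='b')

activation = T.nnet.sigmoid(W.dot(x) + b)
cost = 0.5 * ((y - activation) ** 2).sum()
grads = T.grad(cost, [W, b])               # a list, one entry per parameter

grad_fn = theano.function([y, x], grads)   # returns numpy arrays

# Accumulate the per-example gradients over a (fake) minibatch in Python.
minibatch = [(rng.randn(3, 1), rng.randn(2, 1)) for _ in range(10)]
acc = [np.zeros_like(p.get_value()) for p in (W, b)]
for xi, yi in minibatch:
    for a, g in zip(acc, grad_fn(yi, xi)):
        a += g
# acc now holds the gradients summed over the 10 examples.

If you do want the accumulation inside the graph, the per-example gradient expression would have to be built symbolically inside the scan step rather than by calling a compiled function, but the plain loop above is usually the simplest way to check the accumulated values.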
