1) In the MLP described in the example, I noticed that the dot product is taken
between all the examples in the minibatch and the weight matrix. Before noting
what was being done in the example, I had tried accumulating the gradient over
each example in a given minibatch, repeating this for every minibatch. But I am
unable to accumulate the gradients using scan. The code is below:
Hidden Layer class:
import theano.tensor as t
from theano import shared
import numpy as np
from theano.tensor.nnet.nnet import sigmoid

class HiddenLayer:
    def __init__(self, inputs, dim, prev_dim):
        # weights: dim x prev_dim, biases: dim x 1
        self.weights = shared(np.random.randn(dim, prev_dim), borrow=True)
        self.biases = shared(np.random.randn(dim, 1), borrow=True)
        # pre-activation and activation of this layer
        self.zs = self.weights.dot(inputs) + self.biases
        self.activations = sigmoid(self.zs)
Main file:
import theano.tensor as t
from theano import function, scan, shared
import hll
import numpy as np
import random
import sys
import pickle
import mnist_loader

layers = [3, 30, 2]
inputs = t.dmatrix('inputs')    # 784 x 1
inputs_ = inputs
outputs = t.dmatrix('outputs')  # 10 x 1
hlls = []
num_layers = len(layers)
num_hidden_layers = num_layers - 1
train_, validation, test = mnist_loader.load_data_wrapper()
inps_ = [shared(t_[0]) for t_ in train_]  # 784 x 1
outs_ = [shared(t_[1]) for t_ in train_]  # 10 x 1
inps = t.dtensor3()  # list of 784 x 1
opts = t.dtensor3()  # list of 10 x 1
bs = 10

# Building the graph
for ind, l in enumerate(layers[1:]):
    hh = hll.HiddenLayer(inputs, l, layers[ind])
    hlls.append(hh)
    inputs = hh.activations

params = [h.weights for h in hlls]
params += [h.biases for h in hlls]
cost = (-0.5 * ((outputs - inputs) ** 2)).sum()  # MSE
gradients = [t.grad(cost, param) for param in params]
fforward = function([inputs_], inputs, name='fforward')
costt = function([outputs, inputs_], cost)  # notice inputs_, not inputs
gradientss = function([outputs, inputs_], gradients)

'''def feedforward(ip):
    return fforward(ip)

# MSE
def cost_func(op, activation):
    return cost(op, activation)

# acc_grad = [shared(np.zeros((c, p))) for c, p in zip(layers[1:], layers[:-1])]
def grad(ip, op):
    return gradientss(op, ip)'''

# minibatch gradient
batch_grads, dumb = scan(lambda ip, op: gradientss,
                         outputs_info=None,
                         sequences=[inps, opts])
But I end up getting this error message:
ValueError: The return value of your scan lambda expression may only be
made of lists, tuples, or dictionaries containing Theano variables (or
`theano.scan_module.until` objects for conditions). In particular if you
need to use constant values, you can use `tensor.constant` to turn them into
Theano variables.
tensor.grad should return a list, in which case this error should not have
arisen. Please guide me.
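For reference, this is roughly what I was trying to express with the scan call. It is only an untested sketch (the names step, per_example_grads and batch_grads are mine): the step function rebuilds the forward pass for a single example and returns symbolic gradients built with t.grad, rather than returning the compiled gradientss function:

# Sketch only: per-example gradients inside scan, summed over the minibatch.
def step(ip, op):
    acts = ip
    for h in hlls:
        acts = t.nnet.sigmoid(h.weights.dot(acts) + h.biases)
    step_cost = (-0.5 * ((op - acts) ** 2)).sum()
    # return symbolic gradients, one per parameter
    return [t.grad(step_cost, p) for p in params]

per_example_grads, updates = scan(step, sequences=[inps, opts])
# each output has an extra leading axis (the example index); sum it away
batch_grads = [g.sum(axis=0) for g in per_example_grads]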
2) The benefit of taking a dot product between all the examples in the
minibatch and the weights is that the minibatch gradients are computed in one
shot. But how does this scale up as the network size increases? Say a network
with a huge number of parameters is being used. By taking the entire minibatch
at once, we would need on the order of minibatch_size * total parameters of
storage and might actually run out of memory. How is this handled?
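To put a rough number on that concern, here is a small back-of-envelope script; the layer sizes and minibatch size are made up purely for illustration, and it assumes one full gradient copy is kept per example:

# Illustrative memory estimate: one gradient copy per example in the minibatch.
layers = [784, 1000, 1000, 10]
minibatch_size = 256
# weights (dim x prev_dim) plus biases (dim) for each layer
n_params = sum(d * p + d for p, d in zip(layers[:-1], layers[1:]))
bytes_per_float = 8  # float64
total_bytes = minibatch_size * n_params * bytes_per_float
print(n_params, total_bytes / 1e9)  # ~1.8M parameters, ~3.7 GB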