I'm trying to implement SGD in a Neural Net for MNIST task but couldn't for
last 4 days. Any help will be valuable.
The code is as below:
HiddenLayer class:
import theano.tensor as t
from theano import shared
import numpy as np
from theano.tensor.nnet.nnet import sigmoid
class HiddenLayer:
def __init__(self,inputs,dim,prev_dim):
self.weights=shared(np.random.randn(dim,prev_dim))
self.biases=shared(np.random.randn(dim,1))
self.zs=self.weights.dot(inputs)+self.biases
self.activations=sigmoid(self.zs)
Main Program:
import theano.tensor as t
from theano import function,scan,shared
import hll
import numpy as np
import random
import sys
import numpy as np
import pickle
import mnist_loader
layers=[784,30,10] # 1 hidden layer with 30 units
inputs=t.dmatrix('inputs') # 784 x 1
inputs_=inputs # this is important
outputs=t.dmatrix('outputs') # 10 x 1
hlls=[]
num_layers=len(layers)
num_hidden_layers=num_layers-1
train_, validation, test=mnist_loader.load_data_wrapper() #loading the data
inps_=[shared(t_[0]) for t_ in train_] # 784 x 1
outs_=[shared(t_[1]) for t_ in train_] # 10 x 1
inps=t.dtensor3() # list of 784 x 1
opts=t.dtensor3() # list of 10 x 1
bs=10
#Building the Graph
for ind,l in enumerate(layers[1:]):
hh=hll.HiddenLayer(inputs,l,layers[(ind)])
hlls.append(hh)
inputs=hh.activations
#MSE
cost=(-0.5 * ((outputs-inputs) ** 2))
index=t.lscalar()
batch_cost,dumb=scan(lambda ip,op,index: some_func(ip,op), outputs_info=None
, sequences=[inps, opts], non_sequences=index) #minibatch cost
batch_cost_=batch_cost.sum() # summing cost from each example in train.
correponding inputs will not maintained by theano? Disconnected Input
error
#Params to calculate the gradients against
params=[h.weights for h in hlls]
params+=[h.biases for h in hlls]
grads=t.grad(batch_cost_,params)
train=function(inputs=[index],outputs=batch_cost,updates=[(h.weights,h.
weights-g*0.5) for h,g in zip(hlls,grads)], givens={inps:inps_[bs*index:bs*(
index+1)][0], opts:outs_[bs*index:bs*(index+1)][1]}) # how do I specify
tensor variable input and output to be taken from the scan?
I'm unable to wrap my head around to do minibatch. Initial approach was to
calculate gradient per example and accumulate over the minibatch and then
update the weights for each minibatch and couldn't implement that either.
Please help. Thanks in advance.
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.