Hi,

I am trying to optimize an objective function using SGD, but when I check the cost after each step of SGD it does not change; it stays exactly the same. I am not sure whether the gradients are being computed properly, because when I set my learning rate to zero the cost is essentially the same as when I set it to some nonzero value. The code for the model is:
import numpy as np
import theano
import theano.tensor as T

# (inside __init__)
self.n_user = n_user
self.d = d
self.h = h
self.n_item = n_item

# User embedding matrix, Glorot-style uniform initialization
self.Wu = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(n_user + d)),
                      high=np.sqrt(6.0 / float(n_user + d)),
                      size=(n_user, d)).astype(theano.config.floatX))
self.W1 = self.Wu
self.W3 = self.Wu

# Hidden-layer weights for the product and absolute-difference features
self.Wm1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(h + d)),
                      high=np.sqrt(6.0 / float(h + d)),
                      size=(h, d)).astype(theano.config.floatX))
self.Wp1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(h + d)),
                      high=np.sqrt(6.0 / float(h + d)),
                      size=(h, d)).astype(theano.config.floatX))

# Biases, broadcastable across the batch dimension
self.B11 = theano.shared(np.zeros((h, 1), dtype=theano.config.floatX),
                         broadcastable=(False, True))
self.B21 = theano.shared(np.zeros((2, 1), dtype=theano.config.floatX),
                         broadcastable=(False, True))

# Output layer (2 classes)
self.U1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(2 + h)),
                      high=np.sqrt(6.0 / float(2 + h)),
                      size=(2, h)).astype(theano.config.floatX))
def model(self, lr=0.01):
    uu = T.imatrix()   # each row holds a pair of row indices into Wu
    yu = T.ivector()   # labels for the batch

    U = self.Wu[uu[:, 0], :]
    V = self.Wu[uu[:, 1], :]
    hLm = U * V          # elementwise-product features
    hLp = abs(U - V)     # absolute-difference features
    hL = T.tanh(T.dot(self.Wm1, hLm.T) + T.dot(self.Wp1, hLp.T) + self.B11)

    # Likelihood
    l = T.nnet.softmax(T.dot(self.U1, hL) + self.B21)
    cost = -T.mean(T.log(l[:, yu]))

    grad1 = T.grad(cost, [U, V])        # gradients w.r.t. the embedded rows
    grads = T.grad(cost, self.Params1)  # self.Params1 is defined elsewhere

    # Write the updated embedding rows back into Wu
    self.W1 = T.set_subtensor(self.W1[uu[:, 0], :],
                              self.W1[uu[:, 0], :] - lr * grad1[0])
    self.W1 = T.set_subtensor(self.W1[uu[:, 1], :],
                              self.W1[uu[:, 1], :] - lr * grad1[1])
    updates11 = [(self.Wu, self.W1)]
    updates31 = [(param, param - lr * grad)
                 for (param, grad) in zip(self.Params1, grads)]
    updates1 = updates11 + updates31

    self.uu_batch = theano.function([uu, yu], cost, updates=updates1)
    # (also tried with allow_input_downcast=True and
    # mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))
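
Since I suspect the gradients, one sanity check would be to compile a second function at the end of model(), where uu, yu, grad1, and grads are still in scope, that just returns the gradient norms without applying any updates (a minimal sketch; debug_grads is a name I made up):

    # Hypothetical debug helper: report each gradient's L2 norm for a batch,
    # with no updates applied, to see whether the gradients are actually nonzero.
    self.debug_grads = theano.function(
        [uu, yu],
        [grad1[0].norm(2), grad1[1].norm(2)] + [g.norm(2) for g in grads])

If these norms come out as zeros, something is wrong in the graph itself; if they are nonzero, the problem is more likely in the updates.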
Now when I run uu_batch with training examples, the cost is the same at every step of SGD.
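
For reference, the driver loop looks roughly like this (a minimal sketch with made-up batch data; m stands for an instance of the class above, after m.model() has been called):

    rng = np.random.RandomState(0)
    for step in range(100):
        # Made-up mini-batch: rows of uu index into Wu, yu holds 0/1 labels
        uu_np = rng.randint(0, m.n_user, size=(32, 2)).astype('int32')
        yu_np = rng.randint(0, 2, size=(32,)).astype('int32')
        print(step, m.uu_batch(uu_np, yu_np))  # prints the same cost every step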
Any pointers on this would be highly appreciated. Thanks in advance!