Hi,

I am trying to optimize an objective function using SGD, but when I check the cost after each step of SGD it does not change; it stays exactly the same. I am not sure whether the gradients are being computed properly, because when I set my learning rate to zero the cost is essentially the same as when I set it to some nonzero value. The code for the model is:
import numpy as np
import theano
import theano.tensor as T

# (inside __init__)
self.n_user = n_user
self.d = d
self.h = h
self.n_item = n_item

# User embedding matrix, Glorot-style uniform initialization
self.Wu = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(n_user + d)),
                      high=np.sqrt(6.0 / float(n_user + d)),
                      size=(n_user, d)).astype(theano.config.floatX))
self.W1 = self.Wu
self.W3 = self.Wu

# Hidden-layer weights for the product and absolute-difference features
self.Wm1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(h + d)),
                      high=np.sqrt(6.0 / float(h + d)),
                      size=(h, d)).astype(theano.config.floatX))
self.Wp1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(h + d)),
                      high=np.sqrt(6.0 / float(h + d)),
                      size=(h, d)).astype(theano.config.floatX))

# Biases, broadcastable across the batch dimension
self.B11 = theano.shared(np.zeros((h, 1), dtype=theano.config.floatX),
                         broadcastable=(False, True))
self.B21 = theano.shared(np.zeros((2, 1), dtype=theano.config.floatX),
                         broadcastable=(False, True))

# Output layer (2 classes)
self.U1 = theano.shared(
    np.random.uniform(low=-np.sqrt(6.0 / float(2 + h)),
                      high=np.sqrt(6.0 / float(2 + h)),
                      size=(2, h)).astype(theano.config.floatX))
def model(self, lr=0.01):
    uu = T.imatrix()   # each row holds a pair of row indices into Wu
    yu = T.ivector()   # labels for the batch

    U = self.Wu[uu[:, 0], :]
    V = self.Wu[uu[:, 1], :]
    hLm = U * V          # elementwise-product features
    hLp = abs(U - V)     # absolute-difference features
    hL = T.tanh(T.dot(self.Wm1, hLm.T) + T.dot(self.Wp1, hLp.T) + self.B11)

    # Likelihood
    l = T.nnet.softmax(T.dot(self.U1, hL) + self.B21)
    cost = -T.mean(T.log(l[:, yu]))

    grad1 = T.grad(cost, [U, V])        # gradients w.r.t. the embedded rows
    grads = T.grad(cost, self.Params1)  # self.Params1 is defined elsewhere

    # Write the updated embedding rows back into Wu
    self.W1 = T.set_subtensor(self.W1[uu[:, 0], :],
                              self.W1[uu[:, 0], :] - lr * grad1[0])
    self.W1 = T.set_subtensor(self.W1[uu[:, 1], :],
                              self.W1[uu[:, 1], :] - lr * grad1[1])
    updates11 = [(self.Wu, self.W1)]
    updates31 = [(param, param - lr * grad)
                 for (param, grad) in zip(self.Params1, grads)]
    updates1 = updates11 + updates31

    self.uu_batch = theano.function([uu, yu], cost, updates=updates1)
    # (also tried with allow_input_downcast=True and
    # mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))
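
Since I suspect the gradients, one sanity check would be to compile a second function at the end of model(), where uu, yu, grad1, and grads are still in scope, that just returns the gradient norms without applying any updates (a minimal sketch; debug_grads is a name I made up):

    # Hypothetical debug helper: report each gradient's L2 norm for a batch,
    # with no updates applied, to see whether the gradients are actually nonzero.
    self.debug_grads = theano.function(
        [uu, yu],
        [grad1[0].norm(2), grad1[1].norm(2)] + [g.norm(2) for g in grads])

If these norms come out as zeros, something is wrong in the graph itself; if they are nonzero, the problem is more likely in the updates.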
Now when I run uu_batch with training examples, the cost is the same at every step of SGD.
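
For reference, the driver loop looks roughly like this (a minimal sketch with made-up batch data; m stands for an instance of the class above, after m.model() has been called):

    rng = np.random.RandomState(0)
    for step in range(100):
        # Made-up mini-batch: rows of uu index into Wu, yu holds 0/1 labels
        uu_np = rng.randint(0, m.n_user, size=(32, 2)).astype('int32')
        yu_np = rng.randint(0, 2, size=(32,)).astype('int32')
        print(step, m.uu_batch(uu_np, yu_np))  # prints the same cost every step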
Any pointers on this would be highly appreciated. Thanks in advance!