Hi,
I am implementing the matrix exponential gradient updates from this paper <https://papers.nips.cc/paper/2596-matrix-exponential-gradient-updates-for-on-line-learning-and-bregman-projection.pdf>, but it drastically increases the compile time (from under 2 minutes to over an hour). Here is the code:

from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import theano.tensor.nlinalg  # exposes T.nlinalg.eigh
import lasagne


def sgd_exp(loss_or_grads, params, log_init_params, learning_rate=1e-3):
    grads = lasagne.updates.get_or_compute_grads(loss_or_grads, params)
    updates = OrderedDict()

    for param, grad, init in zip(params, grads, log_init_params):
        value = param.get_value(borrow=True)
        shape = value.shape

        # Not a square matrix: plain SGD.
        if len(shape) < 2 or shape[-1] != shape[-2]:
            updates[param] = param - learning_rate * grad

        # Square matrix (or stack of square matrices): exponentiated
        # gradient, with the descent step taken in the log domain.
        else:
            accu = theano.shared(init, broadcastable=param.broadcastable)
            accu_new = accu - learning_rate * grad
            updates[accu] = accu_new

            # Stack of matrices: reshape to a 3-tensor, update each
            # square slice, then reshape back.
            if len(shape) > 2:
                new_shape = (int(np.prod(shape[:-2])),) + shape[-2:]

                accu_new = accu_new.reshape(new_shape)
                subupdate = T.zeros(new_shape)

                for i in range(new_shape[0]):
                    # exp(accu_new[i]) via eigendecomposition; subtracting
                    # T.max(w) is for numerical stability and cancels out
                    # after the division by T.sum(expo).
                    w, V = T.nlinalg.eigh(accu_new[i])
                    expo = T.exp(w - T.max(w))
                    updt = shape[-1] * T.dot(V, T.dot(T.diag(expo), V.T)) / T.sum(expo)
                    subupdate = T.set_subtensor(subupdate[i], updt)

                updates[param] = subupdate.reshape(shape)

            else:
                # Single square matrix: the same normalized matrix
                # exponential, scaled so the result has trace shape[-1].
                w, V = T.nlinalg.eigh(accu_new)
                expo = T.exp(w - T.max(w))
                updates[param] = shape[-1] * T.dot(V, T.dot(T.diag(expo), V.T)) / T.sum(expo)

    return updates

Is anything wrong in my implementation? Any help would be appreciated!
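For context, here is a minimal sketch of how I wire these updates into a training function. The network, shapes, and variable names (X, y, l_in, l_out) are placeholders rather than my actual setup, and it reuses the imports and sgd_exp defined above. I pick 64 inputs and 64 units so the dense layer's W is square and actually exercises the exponentiated-gradient branch; the bias is a vector and falls back to plain SGD.

# Illustrative usage sketch, not my real model.
X = T.matrix('X')
y = T.ivector('y')

l_in = lasagne.layers.InputLayer((None, 64), input_var=X)
l_out = lasagne.layers.DenseLayer(
    l_in, num_units=64, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()

params = lasagne.layers.get_all_params(l_out, trainable=True)
# Zero log-domain inits: since exp(0) = I, each square parameter starts
# at the identity matrix after exponentiation and trace normalization.
log_init_params = [np.zeros(p.get_value(borrow=True).shape,
                            dtype=theano.config.floatX)
                   for p in params]

updates = sgd_exp(loss, params, log_init_params, learning_rate=1e-3)
train_fn = theano.function([X, y], loss, updates=updates)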