Hi,
I am implementing the matrix exponential gradient updates from this paper <https://papers.nips.cc/paper/2596-matrix-exponential-gradient-updates-for-on-line-learning-and-bregman-projection.pdf>, but it drastically increases the compile time (from under 2 minutes to over an hour). Here is the code:

from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import theano.tensor.nlinalg  # exposes T.nlinalg.eigh
import lasagne


def sgd_exp(loss_or_grads, params, log_init_params, learning_rate=1e-3):
    grads = lasagne.updates.get_or_compute_grads(loss_or_grads, params)
    updates = OrderedDict()

    for param, grad, init in zip(params, grads, log_init_params):
        value = param.get_value(borrow=True)
        shape = value.shape

        # Not a square matrix: plain SGD.
        if len(shape) < 2 or shape[-1] != shape[-2]:
            updates[param] = param - learning_rate * grad

        # Square matrix (or stack of square matrices): exponentiated
        # gradient, with the descent step taken in the log domain.
        else:
            accu = theano.shared(init, broadcastable=param.broadcastable)
            accu_new = accu - learning_rate * grad
            updates[accu] = accu_new

            # Stack of matrices: reshape to a 3-tensor, update each
            # square slice, then reshape back.
            if len(shape) > 2:
                new_shape = (int(np.prod(shape[:-2])),) + shape[-2:]

                accu_new = accu_new.reshape(new_shape)
                subupdate = T.zeros(new_shape)

                for i in range(new_shape[0]):
                    # exp(accu_new[i]) via eigendecomposition; subtracting
                    # T.max(w) is for numerical stability and cancels out
                    # after the division by T.sum(expo).
                    w, V = T.nlinalg.eigh(accu_new[i])
                    expo = T.exp(w - T.max(w))
                    updt = shape[-1] * T.dot(V, T.dot(T.diag(expo), V.T)) / T.sum(expo)
                    subupdate = T.set_subtensor(subupdate[i], updt)

                updates[param] = subupdate.reshape(shape)

            else:
                # Single square matrix: the same normalized matrix
                # exponential, scaled so the result has trace shape[-1].
                w, V = T.nlinalg.eigh(accu_new)
                expo = T.exp(w - T.max(w))
                updates[param] = shape[-1] * T.dot(V, T.dot(T.diag(expo), V.T)) / T.sum(expo)

    return updates

Is anything wrong in my implementation? Any help would be appreciated!
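For context, here is a minimal sketch of how I wire these updates into a training function. The network, shapes, and variable names (X, y, l_in, l_out) are placeholders rather than my actual setup, and it reuses the imports and sgd_exp defined above. I pick 64 inputs and 64 units so the dense layer's W is square and actually exercises the exponentiated-gradient branch; the bias is a vector and falls back to plain SGD.

# Illustrative usage sketch, not my real model.
X = T.matrix('X')
y = T.ivector('y')

l_in = lasagne.layers.InputLayer((None, 64), input_var=X)
l_out = lasagne.layers.DenseLayer(
    l_in, num_units=64, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()

params = lasagne.layers.get_all_params(l_out, trainable=True)
# Zero log-domain inits: since exp(0) = I, each square parameter starts
# at the identity matrix after exponentiation and trace normalization.
log_init_params = [np.zeros(p.get_value(borrow=True).shape,
                            dtype=theano.config.floatX)
                   for p in params]

updates = sgd_exp(loss, params, log_init_params, learning_rate=1e-3)
train_fn = theano.function([X, y], loss, updates=updates)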