Hi Andreas,

Is there a particular reason why there is no general-purpose optimization module? Most optimizers (at least the first-order methods) are general purpose, since you only need to feed them the gradient. In some special cases you may need a problem-specific formulation for better performance. The advantage of SVRG is that it does not need to store per-sample gradients, which in general requires storage on the order of number_of_weights * number_of_samples and is the main drawback of SAG and SAGA. As a result, for most neural network models (and even many non-NN models), SAG and SAGA are infeasible on personal computers. A small sketch below illustrates the memory argument.
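To make the memory point concrete, here is a minimal SVRG sketch for a plain least-squares loss. This is illustrative pseudocode of mine, not scikit-learn code; the function name, default step size, and loop counts are made up. The point is that only the current weights, the snapshot weights, and one full gradient vector are kept, i.e. O(number_of_weights) memory, whereas SAG/SAGA maintain a table of per-sample gradients.

import numpy as np

def svrg_least_squares(X, y, step=0.01, n_outer=20, n_inner=None, seed=0):
    # Hypothetical, illustrative SVRG for the objective (1/2n) * ||Xw - y||^2.
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    n_inner = n_inner or n
    for _ in range(n_outer):
        w_snap = w.copy()                     # snapshot weights, O(d) memory
        mu = X.T @ (X @ w_snap - y) / n       # full gradient at the snapshot, O(d)
        for _ in range(n_inner):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            g_i = xi * (xi @ w - yi)          # stochastic gradient at current w
            g_snap = xi * (xi @ w_snap - yi)  # stochastic gradient at the snapshot
            w -= step * (g_i - g_snap + mu)   # variance-reduced update
    return w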
SVRG is not popular in the deep learning community, but it should be noted that SVRG is different from Adam in that it does not tune the step size. Just to clarify, SVRG can be faster than Adam because it reduces the variance of the stochastic gradient, achieving a convergence rate similar to full-batch methods while remaining computationally cheap like SGD/Adam. One can also combine the two methods to obtain an even faster algorithm.

Cheers,
Touqir

On Tue, Sep 4, 2018 at 11:46 AM Andreas Mueller <t3k...@gmail.com> wrote:

> Hi Touqir.
> We don't usually implement general purpose optimizers in scikit-learn,
> in particular because usually different optimizers apply to different
> kinds of problems.
> For linear models we have SAG and SAGA, for neural nets we have adam.
> I don't think the authors claim to be faster than SAG, so I'm not sure
> what the motivation would be for using their method.
>
> Best,
> Andy
>
>
> On 09/04/2018 12:55 PM, Touqir Sajed wrote:
>
> Hi,
>
> I have been looking for stochastic optimization algorithms in
> scikit-learn that are faster than SGD, and so far I have come across
> Adam and momentum. Are there other methods implemented in scikit-learn?
> Particularly, the variance reduction methods such as SVRG (
> https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf
> )? These variance reduction methods are the current state of the art in
> terms of convergence speed while maintaining a per-iteration runtime of
> order n -- the number of features. If they are not implemented yet, I
> think it would be really great to implement them (I am happy to do so),
> since working on large datasets (where L-BFGS may not be practical) is
> the norm nowadays, and the improvements are definitely worth it.
>
> Cheers,
> Touqir
>
> --
> Computing Science Master's student at University of Alberta, Canada,
> specializing in Machine Learning. Website :
> https://ca.linkedin.com/in/touqir-sajed-6a95b1126

--
Computing Science Master's student at University of Alberta, Canada,
specializing in Machine Learning. Website :
https://ca.linkedin.com/in/touqir-sajed-6a95b1126
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn