This is out of the scope of scikit-learn, which is a toolkit meant to make machine learning easier. Optimization is a component of machine learning, but not one that is readily usable by itself.
Gaël

On Tue, Sep 04, 2018 at 12:45:09PM -0600, Touqir Sajed wrote:
> Hi Andreas,
> Is there a particular reason why there is no general-purpose optimization
> module? Most of the optimizers (at least the first-order methods) are
> general purpose, since you just need to feed them the gradient. In some
> special cases, you probably need a problem-specific formulation for better
> performance. The advantage of SVRG is that you don't need to store the
> gradients, which costs storage of order
> number_of_weights * number_of_samples and is the main problem with SAG and
> SAGA. Thus, for most neural network models (and even non-NN models), using
> SAG and SAGA is infeasible on personal computers.
> SVRG is not popular in the deep learning community, but it should be noted
> that SVRG is different from Adam since it does not tune the step size. Just
> to clarify, SVRG can be faster than Adam since it reduces the variance to
> achieve a convergence rate similar to full-batch methods while being
> computationally cheap like SGD/Adam. However, one can combine both methods
> to obtain an even faster algorithm.
> Cheers,
> Touqir

> On Tue, Sep 4, 2018 at 11:46 AM Andreas Mueller <t3k...@gmail.com> wrote:

> > Hi Touqir.
> > We don't usually implement general-purpose optimizers in scikit-learn,
> > in particular because different optimizers usually apply to different
> > kinds of problems.
> > For linear models we have SAG and SAGA; for neural nets we have Adam.
> > I don't think the authors claim to be faster than SAG, so I'm not sure
> > what the motivation would be for using their method.
> > Best,
> > Andy

> > On 09/04/2018 12:55 PM, Touqir Sajed wrote:

> > > Hi,
> > > I have been looking for stochastic optimization algorithms in
> > > scikit-learn that are faster than SGD, and so far I have come across
> > > Adam and momentum. Are there other methods implemented in scikit-learn?
> > > Particularly, the variance-reduction methods such as SVRG
> > > (https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf)?
> > > These variance-reduction methods are the current state of the art in
> > > terms of convergence speed while keeping a per-iteration runtime
> > > complexity of order n, the number of features. If they are not
> > > implemented yet, I think it would be really great to implement them
> > > (I am happy to do so), since nowadays working on large datasets (where
> > > L-BFGS may not be practical) is the norm, and there the improvements
> > > are definitely worth it.
> > > Cheers,
> > > Touqir

--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux
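[Editor's note] For readers following the thread, here is a minimal sketch of the SVRG update Touqir describes, written for a plain least-squares objective. It is illustrative only (the function name, objective, and step size are chosen for the example, and this is not scikit-learn code); the point is that only the snapshot's full gradient is kept, so memory stays of order number_of_weights rather than the number_of_weights * number_of_samples gradient table that SAG/SAGA maintain.

```python
import numpy as np

def svrg_least_squares(X, y, lr=0.05, n_epochs=20, seed=0):
    """Illustrative SVRG for minimizing 0.5 * ||X @ w - y||^2 / n."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        # Take a snapshot and compute its full gradient (the control variate).
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n
        for _ in range(n):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            # Per-sample gradients at the current iterate and at the snapshot.
            gi = xi * (xi @ w - yi)
            gi_snap = xi * (xi @ w_snap - yi)
            # Variance-reduced stochastic gradient step.
            w -= lr * (gi - gi_snap + full_grad)
    return w
```

With a suitable fixed step size this kind of update converges much like a full-batch method on smooth, strongly convex problems while each inner step only touches one sample, which is the variance-reduction benefit being discussed above.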
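[Editor's note] For reference, the solvers Andy mentions are already exposed through the estimators' `solver` parameter; a short usage sketch (the synthetic dataset and hyperparameters are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# SAGA, a variance-reduced stochastic solver, for linear models.
linear_clf = LogisticRegression(solver="saga", max_iter=1000).fit(X, y)

# Adam (or SGD with momentum via solver="sgd") for neural networks.
nn_clf = MLPClassifier(solver="adam", max_iter=300, random_state=0).fit(X, y)
```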