> A compromise would be to just implement the Cython routine in a separate
> file, while sharing the same file for the pure Python side.

sounds reasonable.

> That said, using a separate class for Adagrad would allow to get rid of
> irrelevant hyper-parameters.

+1

> Some code from the SGD module can probably be factorized and OVR should
> ideally re-use functions from the multiclass module.

can you be more specific?

>> It could later be improved and extended with schemes that also use
>> feature specific learning rates.
>
> Do you have specific examples in mind? If not I would just call the class
> AdaGrad. Early (name) optimization is the root of evil :)

AdaDelta ? :)

> PS: SGD is not strictly speaking a descent method so optimization people
> now usually refer to it as just the stochastic gradient method

sure.

Alex
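
P.S. For concreteness, a rough NumPy sketch of the per-feature AdaGrad update
being discussed. This is purely illustrative (the names adagrad_step, grad_sq_acc,
eta0 are made up, not anything in scikit-learn); the real implementation would
live in the Cython routine:

    import numpy as np

    def adagrad_step(w, grad, grad_sq_acc, eta0=0.01, eps=1e-8):
        # Accumulate squared gradients per feature.
        grad_sq_acc += grad ** 2
        # Each coordinate gets its own effective learning rate,
        # eta0 / sqrt(accumulated squared gradient), which is what makes
        # AdaGrad a feature-specific learning rate scheme.
        w -= eta0 * grad / (np.sqrt(grad_sq_acc) + eps)
        return w, grad_sq_acc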