> A compromise would be to just implement the Cython routine in a separate
> file, while sharing the same file for the pure Python side.

Sounds reasonable.

> That said, using a separate class for AdaGrad would make it possible to
> get rid of irrelevant hyper-parameters.

+1

> Some code from the SGD module can probably be
> factored out, and OVR should ideally re-use functions from the multiclass
> module.

Can you be more specific?

>> It could later be improved and extended with schemes that also use
>> feature-specific learning rates.
>
> Do you have specific examples in mind? If not, I would just call the class
> AdaGrad. Early (name) optimization is the root of all evil :)

AdaDelta? :)
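
To make the "feature-specific learning rates" point concrete, here is a rough
NumPy sketch of a plain AdaGrad step (function name and defaults are purely
illustrative, not a proposal for the estimator API):

    import numpy as np

    def adagrad_update(w, grad, G, eta=0.01, eps=1e-8):
        # G accumulates the per-feature sum of squared gradients, so each
        # feature ends up with its own effective step size.
        G += grad ** 2
        w -= eta * grad / (np.sqrt(G) + eps)
        return w, G

    # toy usage: squared loss on a single scalar target
    rng = np.random.RandomState(0)
    w, G = np.zeros(5), np.zeros(5)
    for _ in range(100):
        x = rng.randn(5)
        grad = (np.dot(w, x) - 1.0) * x
        w, G = adagrad_update(w, grad, G)

AdaDelta follows the same per-feature pattern but keeps decaying averages of
squared gradients and squared updates instead of the full sum, so it drops the
global eta above.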

> PS: SGD is not, strictly speaking, a descent method, so optimization people
> now usually refer to it as just the stochastic gradient method.

Sure.

Alex
