> I didn't know about that one and Google didn't find it for me (even with
> maxent-related keywords). Thanks!
>
I also only found it by asking Peter ;)

>>> Like you mentioned, this batch version will not scale very well. One
>>> reason for this is the optimization algorithm used (scipy's BFGS in my
>>> case).
>>>
>>> From then on, however, it will be easy for the SGD masters to make the
>>> stochastic version: it will just require re-using the function to
>>> compute the negative-log-likelihood and its gradient and replace BFGS
>>> with SGD!
>>>
>> Making the SGD handle this case is more or less the only thing that
>> requires any real work in my opinion.
>>
> Ah! I didn't think it was actually an engineering problem. My bad ;-P
>
>> Integrating the different loss functions with the current, two-class
>> loss functions and handling 2d weights is what having multinomial
>> logistic regression is about.
>> The rest I can write down in <10 minutes ;)
>>
> Ok, got it; I didn't have the full picture. Thanks for the clarification.
>
> Btw, what approach do you consider regarding the problem of the 2D label
> array? It seems tricky to integrate "cleanly" with previous methods
> taking 1D target values. This reminds me of the problems with the
> precomputed kernel/affinity matrices... except on the "y" side this time.

Why do you need a 2d label array? I would use the common integer coding.
I'm not sure if the multinomial logistic regression loss would be easier
to write down with a 1-of-n coding. For Crammer-Singer loss, that is
definitely not the case.
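To make that concrete, here is roughly the kind of thing I have in mind,
with integer-coded labels throughout (just a sketch, not actual
scikit-learn code; the function names, the l2 penalty alpha and the toy
data are all made up). The same nll / nll_grad pair could later be reused
by a stochastic solver:

import numpy as np
from scipy.optimize import fmin_bfgs

def _log_softmax(Z):
    # row-wise log-softmax, shifted by the row max for numerical stability
    Z = Z - Z.max(axis=1)[:, np.newaxis]
    return Z - np.log(np.exp(Z).sum(axis=1))[:, np.newaxis]

def nll(w, X, y, n_classes, alpha=1e-4):
    # penalized negative log-likelihood; y holds integer labels 0..n_classes-1
    W = w.reshape(n_classes, -1)
    log_proba = _log_softmax(np.dot(X, W.T))
    loss = -log_proba[np.arange(X.shape[0]), y].sum()
    return loss + 0.5 * alpha * np.dot(w, w)

def nll_grad(w, X, y, n_classes, alpha=1e-4):
    # gradient of nll w.r.t. the flat weight vector
    W = w.reshape(n_classes, -1)
    P = np.exp(_log_softmax(np.dot(X, W.T)))   # predicted class probabilities
    P[np.arange(X.shape[0]), y] -= 1.0         # P minus the one-hot targets, in place
    return (np.dot(P.T, X) + alpha * W).ravel()

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    y = np.argmax(np.dot(X, rng.randn(5, 3)), axis=1)   # toy 3-class problem
    w = fmin_bfgs(nll, np.zeros(3 * 5), fprime=nll_grad, args=(X, y, 3))
    pred = np.argmax(np.dot(X, w.reshape(3, 5).T), axis=1)
    print("training accuracy:", np.mean(pred == y))

The optimizer only ever sees a flat weight vector, so the 2d weights are
purely an internal detail, and y stays a plain 1D integer array.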
> Anyway, IMHO, I still think it's worth having a separate module for
> batch multinomial logistic regression. It's a popular method, it
> provides an inherently multi-class classifier, some users have asked for
> it, and, apparently, you already have an implementation so it should be
> straightforward (10 minutes... joking ;-)). Bonus: with kernels (I love
> kernels)!
>
I also love kernels ;) But logistic regression and kernels don't go
together all that well, as all input vectors are "support vectors".

> Furthermore, this could be a situation similar to SVMs: hinge loss can
> be used with SGD, but you can also use liblinear/libsvm which are in
> separate modules. Therefore, users can enjoy batch MLR until the
> stochastic version is available, at which point everyone will switch to
> SGD of course ;-).
>
In general I agree that it would be nice to have an "exact" solver
instead of only an SGD one. That should still be fast, though. Not sure
if scipy can do that. We can try, though :)

Cheers,
Andy
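P.S. And for the stochastic side, just to show how little would change
once the per-sample gradient above is available (again only a sketch;
the names and the step-size schedule are made up, this is not what the
sgd module actually does):

import numpy as np

def _softmax(z):
    # stable softmax for a single score vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sgd_multinomial(X, y, n_classes, alpha=1e-4, eta0=0.1, n_iter=5, seed=0):
    # plain SGD on the same penalized multinomial loss, one sample per update
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    W = np.zeros((n_classes, n_features))
    t = 0
    for _ in range(n_iter):
        for i in rng.permutation(n_samples):
            t += 1
            eta = eta0 / (1.0 + eta0 * alpha * t)   # made-up step-size schedule
            p = _softmax(np.dot(W, X[i]))           # class probabilities for sample i
            p[y[i]] -= 1.0                          # p minus one-hot(y_i)
            W -= eta * (np.outer(p, X[i]) + alpha * W)
    return W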
