> I didn't know about that one and Google didn't find it for me (even with
> maxent-related keywords). Thanks!
>    
I also only found it by asking Peter ;)
>>      
>>> Like you mentioned, this batch version will not scale very well. One
>>> reason for this is the optimization algorithm used (scipy's BFGS in my
>>> case).
>>>
>>> From then on, however, it will be easy for the SGD masters to make the
>>> stochastic version: it will just require re-using the function to
>>> compute the negative-log-likelihood and its gradient and replace BFGS
>>> with SGD!
>>>        
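For concreteness, a rough sketch of what such a batch objective/gradient
pair handed to scipy's BFGS could look like (toy data and a small ridge
term added by me; this is just how I picture it, not the actual
implementation mentioned above):

    import numpy as np
    from scipy.optimize import fmin_bfgs

    def softmax(scores):
        # shift by the row-wise max for numerical stability
        scores = scores - scores.max(axis=1)[:, np.newaxis]
        e = np.exp(scores)
        return e / e.sum(axis=1)[:, np.newaxis]

    def nll(w, X, y, n_classes, alpha=0.1):
        # optimizer passes a flat vector; reshape to (n_classes, n_features)
        W = w.reshape(n_classes, -1)
        P = softmax(np.dot(X, W.T))
        data_term = -np.log(P[np.arange(X.shape[0]), y]).sum()
        return data_term + 0.5 * alpha * np.dot(w, w)

    def grad(w, X, y, n_classes, alpha=0.1):
        W = w.reshape(n_classes, -1)
        P = softmax(np.dot(X, W.T))
        P[np.arange(X.shape[0]), y] -= 1.0  # P - Y without building a 1-of-n Y
        return (np.dot(P.T, X) + alpha * W).ravel()

    # toy data with integer-coded labels, just to show the call
    rng = np.random.RandomState(0)
    X = rng.randn(60, 5)
    y = rng.randint(3, size=60)
    w0 = np.zeros(3 * X.shape[1])
    w_hat = fmin_bfgs(nll, w0, fprime=grad, args=(X, y, 3))

scipy also has fmin_l_bfgs_b, which keeps only a limited history and
should scale a bit better with the number of coefficients.
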
>> Making the SGD handle this case is more or less the only thing that
>> requires any real work in my opinion.
>>      
> Ah! I didn't think it was actually an engineering problem. My bad ;-P
>    
>> Integrating the new multi-class loss functions alongside the current
>> two-class loss functions, and handling 2d weights, is most of what
>> getting multinomial logistic regression is about.
>> The rest I can write down in <10 minutes ;)
>>      
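To make the 2d-weights point concrete, the per-sample update I have in
mind looks roughly like the pure-numpy sketch below (the real thing
would of course live in the existing SGD code; this is just the math):

    import numpy as np

    def softmax(scores):
        scores = scores - scores.max()  # for numerical stability
        e = np.exp(scores)
        return e / e.sum()

    def sgd_multinomial(X, y, n_classes, n_iter=5, eta=0.01, alpha=0.01):
        n_samples, n_features = X.shape
        W = np.zeros((n_classes, n_features))  # 2d weights, one row per class
        rng = np.random.RandomState(0)
        for it in range(n_iter):
            for i in rng.permutation(n_samples):
                p = softmax(np.dot(W, X[i]))   # class probabilities of one sample
                p[y[i]] -= 1.0                 # gradient of the multinomial log loss
                W -= eta * (np.outer(p, X[i]) + alpha * W)
        return W

    # usage: W = sgd_multinomial(X, y, n_classes=3)
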
> Ok, got it; I didn't have the full picture. Thanks for the clarification.
>
> Btw, what approach are you considering for the problem of the 2D label
> array? It seems tricky to integrate "cleanly" with the existing methods
> that take 1D target values. This reminds me of the problems with the
> precomputed kernel/affinity matrices... except on the "y" side this time.
>
>    
Why do you need a 2d label array? I would use the common integer
coding. I'm not sure the multinomial logistic regression loss would be
any easier to write down with a 1-of-n coding.
For Crammer-Singer loss, that is definitely not the case.
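
As far as I can tell the two codings give exactly the same loss anyway,
so nothing in the API needs a 2d y. Tiny illustration:

    import numpy as np

    P = np.array([[0.7, 0.2, 0.1],   # predicted class probabilities
                  [0.1, 0.3, 0.6]])
    y = np.array([0, 2])             # integer coding
    Y = np.zeros_like(P)
    Y[np.arange(2), y] = 1.0         # the same labels in 1-of-n coding

    loss_int = -np.log(P[np.arange(2), y]).sum()
    loss_onehot = -(Y * np.log(P)).sum()
    print(loss_int, loss_onehot)     # identical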


> Anyway, IMHO, I still think it's worth having a separate module for
> batch multinomial logistic regression. It's a popular method, it
> provides an inherently multi-class classifier, some users have asked for
> it, and, apparently, you already have an implementation so it should be
> straightforward (10 minutes... joking ;-)). Bonus: with kernels (I love
> kernels)!
>
>    
I also love kernels ;) But logistic regression and kernels don't go
together all that well, as all input vectors are "support vectors".
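The reason is that the logistic loss has no flat region: its derivative
is never exactly zero, so in a kernel expansion every training point
keeps a nonzero dual coefficient, unlike with the hinge loss. Tiny
illustration of the two (absolute) derivatives as a function of the margin:

    import numpy as np

    margins = np.linspace(-2, 5, 8)
    # logistic loss log(1 + exp(-m)): |derivative| is strictly positive
    # everywhere, so no training point ever drops out of the expansion
    logistic_dual = 1.0 / (1.0 + np.exp(margins))
    # hinge loss max(0, 1 - m): |derivative| is exactly zero past the
    # margin, which is what makes kernel SVMs sparse
    hinge_dual = (margins < 1).astype(float)
    print(logistic_dual)
    print(hinge_dual)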

> Furthermore, this could be a situation similar to SVMs: hinge loss can
> be used with SGD, but you can also use liblinear/libsvm which are in
> separate modules. Therefore, users can enjoy batch MLR until the
> stochastic version is available, at which point everyone will switch to
> SGD of course ;-).
>
>    
In general I agree that it would be nice to have an "exact"
solver instead of only an SGD one.
That should still be fast, though. Not sure if scipy can do that.

We can try, though :)

Cheers,
Andy
