On 03/06/2012 10:07 PM, Adrien wrote:
> On 06/03/2012 21:38, Andreas wrote:
>    
>>> I didn't know about that one and Google didn't find it for me (even with
>>> maxent-related keywords). Thanks!
>>>
>> I also only found it by asking Peter ;)
>>      
>>>>          
>>>>> Like you mentioned, this batch version will not scale very well. One
>>>>> reason for this is the optimization algorithm used (scipy's BFGS in my
>>>>> case).
>>>>>
>>>>> From then on, however, it will be easy for the SGD masters to write
>>>>> the stochastic version: it will just require re-using the function
>>>>> that computes the negative log-likelihood and its gradient, and
>>>>> replacing BFGS with SGD!
>>>>>
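For concreteness, here is a minimal sketch of what such a batch version
could look like: l2-regularized multinomial logistic regression, minimized
with scipy's fmin_bfgs. The names and defaults are illustrative, not the
actual implementation being discussed above.

    import numpy as np
    from scipy.optimize import fmin_bfgs

    def fit_multinomial_lr(X, y, alpha=1e-4):
        # Minimize the l2-regularized negative log-likelihood with BFGS.
        n_samples, n_features = X.shape
        classes = np.unique(y)
        n_classes = classes.shape[0]
        # 1-of-n coding of the integer labels
        Y = np.zeros((n_samples, n_classes))
        Y[np.arange(n_samples), np.searchsorted(classes, y)] = 1.0

        def nll(w):
            W = w.reshape(n_features, n_classes)
            Z = np.dot(X, W)
            # stable log-sum-exp of the scores, row by row
            Zmax = Z.max(axis=1)
            log_norm = np.log(np.exp(Z - Zmax[:, np.newaxis]).sum(axis=1)) + Zmax
            return (log_norm.sum() - (Z * Y).sum()
                    + 0.5 * alpha * (W ** 2).sum())

        def grad(w):
            W = w.reshape(n_features, n_classes)
            Z = np.dot(X, W)
            Z -= Z.max(axis=1)[:, np.newaxis]
            P = np.exp(Z)
            P /= P.sum(axis=1)[:, np.newaxis]
            # gradient of the NLL: X^T (softmax(XW) - Y) + alpha * W
            return (np.dot(X.T, P - Y) + alpha * W).ravel()

        w = fmin_bfgs(nll, np.zeros(n_features * n_classes),
                      fprime=grad, disp=False)
        return w.reshape(n_features, n_classes)

The stochastic version would keep nll/grad essentially as-is and replace
fmin_bfgs with a loop over per-sample (or mini-batch) gradient steps.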
>>>> Making the SGD handle this case is more or less the only thing that
>>>> requires any real work in my opinion.
>>>>
>>> Ah! I didn't think it was actually an engineering problem. My bad ;-P
>>>
>>>> Integrating the new multinomial loss with the current two-class
>>>> loss functions and handling 2D weights is what adding multinomial
>>>> logistic regression is about.
>>>> The rest I can write down in < 10 minutes ;)
>>>>
>>> Ok, got it; I didn't have the full picture. Thanks for the clarification.
>>>
>>> Btw, what approach are you considering for the problem of the 2D label
>>> array? It seems tricky to integrate "cleanly" with previous methods
>>> taking 1D target values. This reminds me of the problems with the
>>> precomputed kernel/affinity matrices... except on the "y" side this time.
>>>
>> Why do you need a 2D label array? I would use the common integer
>> coding. I'm not sure the multinomial logistic regression loss would
>> be easier to write down with a 1-of-n coding.
>> For the Crammer-Singer loss, that is definitely not the case.
>>      
> With a 2D label array you can also handle the case where your labels
> are probability estimates or weights, obtained by another model for
> instance. You can also handle multi-class, multi-label problems, though
> the objective is not convex anymore...
>    
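To make the two codings concrete: the 1-of-n matrix is just the indicator
expansion of the integer labels, and the same cross-entropy objective stays
well defined when its rows are soft probabilities. A small hypothetical
example, using scikit-learn's LabelBinarizer:

    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    y = np.array([0, 2, 1, 2])             # common integer coding
    Y = LabelBinarizer().fit_transform(y)  # 1-of-n coding:
    # [[1, 0, 0],
    #  [0, 0, 1],
    #  [0, 1, 0],
    #  [0, 0, 1]]

    # With a 2D label array, rows may instead be probability estimates
    # obtained from another model; sum(Y * log(P)) still makes sense:
    Y_soft = np.array([[0.9, 0.1, 0.0],
                       [0.0, 0.3, 0.7],
                       [0.2, 0.8, 0.0],
                       [0.0, 0.1, 0.9]])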
>>      
>>> Anyway, IMHO, I still think it's worth having a separate module for
>>> batch multinomial logistic regression. It's a popular method, it
>>> provides an inherently multi-class classifier, some users have asked for
>>> it, and, apparently, you already have an implementation so it should be
>>> straightforward (10 minutes... joking ;-)). Bonus: with kernels (I love
>>> kernels)!
>>>
>> I also love kernels ;) But logistic regression and kernels don't go
>> together all that well, as all input vectors are "support vectors".
>>      
> That's true... but it's often the same for non-linear SVMs.
>
> In practice, and in my limited experience, when working with challenging
> real-world datasets, I have observed that most points end up as support
> vectors with non-linear SVMs. One possible reason is that these
> datasets, especially the ones I personally have the pleasure to deal
> with, have a small number of training samples. Therefore, the best
> solutions on these datasets tend to over-fit (high C)...
>
> I, however, have no experience with kernel logistic regression.
>    
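For reference, a sketch of the binary kernelized variant, assuming an RBF
kernel (via scikit-learn's rbf_kernel) and labels in {-1, +1}; the names
are mine, not an existing estimator. By the representer theorem
f(x) = sum_i alpha_i k(x_i, x), and since the logistic loss is nowhere
flat, the fitted alpha is generically dense: every training point
contributes, hence "all input vectors are support vectors".

    import numpy as np
    from scipy.optimize import fmin_bfgs
    from sklearn.metrics.pairwise import rbf_kernel

    def fit_kernel_lr(X, y, lam=1e-2, gamma=0.5):
        # Binary kernel logistic regression; illustrative sketch only.
        K = rbf_kernel(X, X, gamma=gamma)

        def nll(a):
            f = np.dot(K, a)
            # logistic loss plus RKHS-norm penalty 0.5 * lam * a' K a
            return (np.logaddexp(0, -y * f).sum()
                    + 0.5 * lam * np.dot(a, np.dot(K, a)))

        def grad(a):
            f = np.dot(K, a)
            # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f)
            dl = -y / (1.0 + np.exp(y * f))
            return np.dot(K, dl) + lam * np.dot(K, a)

        return fmin_bfgs(nll, np.zeros(X.shape[0]), fprime=grad, disp=False)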
>>      
>>> Furthermore, this could be a situation similar to SVMs: hinge loss can
>>> be used with SGD, but you can also use liblinear/libsvm which are in
>>> separate modules. Therefore, users can enjoy batch MLR until the
>>> stochastic version is available, at which point everyone will switch to
>>> SGD of course ;-).
>>>        
>> In general I agree that it would be nice to have an "exact"
>> solver instead of only an SGD one.
>> That should still be fast, though. Not sure if scipy can do that.
>>      
> I just tried my simple implementation, relying on scipy's BFGS, and it
> took approx. 1s to train on an artificial dataset with (n_samples=10000,
> n_features=20, n_classes=10), 15s on (n_samples=10000, n_features=100,
> n_classes=10). So I think it can be OK for medium-scale problems.
>
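Those timings should be straightforward to reproduce on similarly shaped
synthetic data, e.g. with something like the snippet below (reusing the
fit_multinomial_lr sketch from earlier in this mail; the numbers will of
course depend on the machine):

    import time
    import numpy as np
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=10000, n_features=100,
                               n_informative=20, n_classes=10)
    t0 = time.time()
    W = fit_multinomial_lr(X, y)
    print("training took %.1fs" % (time.time() - t0))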

Alex, Gael, what do you think about that?
Having some base implementation that can be improved
with a better optimizer later seems like a reasonable starting point.

