Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-20 Thread Paolo Di Prodi
What about using a distance metric like this one? http://en.wikipedia.org/wiki/Normalized_Google_distance From: Joel Nothman [joel.noth...@gmail.com] Sent: 19 February 2014 22:50 To: scikit-learn-general Subject: Re: [Scikit-learn-general] Logistic
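For reference, the Normalized Google Distance linked above is defined from hit counts. Let f(x) and f(y) be the number of pages containing each term, f(x,y) the number containing both, and N the total number of pages. Then NGD(x,y) = (max(log f(x), log f(y)) - log f(x,y)) / (log N - min(log f(x), log f(y))). A minimal sketch follows. It assumes you substitute document frequencies from your own training corpus for web hit counts, and the function name is hypothetical.

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n):
    """Normalized Google Distance computed from raw document counts (hypothetical helper).

    f_x, f_y: number of documents containing term x (resp. term y)
    f_xy:     number of documents containing both terms
    n:        total number of documents (the normalizing count N)
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur, so the distance is unbounded
    log_fx, log_fy, log_fxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


print(normalized_google_distance(100, 10, 10, 1000))
```

A distance of 0 means the two terms always occur together. The value grows as the terms co-occur less often than their individual frequencies would suggest.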

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-20 Thread Lars Buitinck
2014-02-19 20:57 GMT+01:00 Pavel Soriano wrote: > I thought about using the values of the coefficients of the fitted logit > equation to get a glimpse of which words in the vocabulary, or which style > features, affect the classification decision the most. Is it correct to > assume that if the coeffici

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Tobias Günther
Sounds like you're on the right path. Looking at the misclassified documents and the feature coefficients is a common way to debug a classifier, especially if you use boolean features. If you're using a sklearn vectorizer, this might be of interest to you: http://stackoverflow.com/questions/669

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Joel Nothman
It is correct to assume that a positive coefficient contributes positively to a decision. However, because the features are interdependent, the raw strength of a feature isn't always straightforward to interpret. For example, it might give a big positive coefficient to "Tel" and a similar negative
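To illustrate Joel's point that the coefficients of interdependent features are hard to read in isolation, here is a sketch on synthetic data. It takes an exact duplicate column as the extreme case of correlation. With the default L2 penalty, the weight is split evenly between the two copies, so each copy looks weaker than the same signal would on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
x = rng.randn(200)
# Synthetic binary target driven by x plus noise
y = (x + 0.5 * rng.randn(200) > 0).astype(int)

X_single = x.reshape(-1, 1)      # the signal as a single feature
X_dup = np.column_stack([x, x])  # the same signal twice (perfectly correlated)

w_single = LogisticRegression().fit(X_single, y).coef_[0]
w_dup = LogisticRegression().fit(X_dup, y).coef_[0]

print("one copy:  ", w_single)
print("two copies:", w_dup)  # equal weights, each smaller than the single-copy weight
```

Pavel is using the L1 penalty. Under L1, any same-sign split of the total weight between the two copies costs the same penalty, so the solver may put all of it on one copy and leave the other at zero. Either way, a single coefficient understates how much that signal matters.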

[Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Pavel Soriano
Hello scikit! I need some insight into what I am doing. Currently I am building a text classifier (2 classes) using unigrams (word level) and some writing-style features. I am using a Logistic Regression model with L1 regularization. I get decent performance (around .70 f-measure) for the given

Re: [Scikit-learn-general] Logistic Regression coefficients

2012-03-21 Thread David Warde-Farley
On 2012-03-21, at 4:57 AM, Olivier Grisel wrote: > I think the docstring is wrong. Can anybody confirm? Ran into this myself last night while answering the other thread. Yeah, it appears to be. David

Re: [Scikit-learn-general] Logistic Regression coefficients

2012-03-21 Thread Mathieu Blondel
On Wed, Mar 21, 2012 at 5:57 PM, Olivier Grisel wrote: > If there are only two classes, 0 or -1 is treated as negative and 1 is > treated as positive. To complement Olivier's answer, by convention in scikit-learn, the negative label is in self.classes_[0] and the positive one is in self.classes_
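Mathieu's convention can be made concrete with a small example. For a binary problem coef_ has a single row, and its weights point toward classes_[1]. The sketch below uses made-up string labels and assumes a recent scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array(["ham", "ham", "ham", "spam", "spam", "spam"])

clf = LogisticRegression().fit(X, y)
print(clf.classes_)     # ['ham' 'spam']: labels are sorted, so "spam" is the positive class
print(clf.coef_.shape)  # (1, 1): one row of weights for a binary problem

# A positive decision_function score selects classes_[1], a negative one classes_[0]
scores = clf.decision_function(X)
manual = clf.classes_[(scores > 0).astype(int)]
print(np.array_equal(manual, clf.predict(X)))
```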

Re: [Scikit-learn-general] Logistic Regression coefficients

2012-03-21 Thread Kerui Min
Although I haven't checked the code, I guess this is the usual way to store the coefficients. To calculate P(C=i|x), we can use the formula $P(C=i \mid x) = \exp(\sum_j \mathrm{Coef}[i,j]\, x_j) / Z$, where $Z = \sum_k \exp(\sum_j \mathrm{Coef}[k,j]\, x_j)$. Sincerely, Kerui Min On Wed, Mar 21, 2012 at 4:57 PM, Olivier Grisel wrote: > On 21 March 2012
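Kerui Min's softmax formula can be checked against scikit-learn directly. The score for class i is the sum over j of coef_[i, j] * x_j plus intercept_[i]. The probability is exp of that score divided by the sum of exp over all classes. This sketch assumes a recent scikit-learn, where the default lbfgs solver fits a multinomial model when there are more than two classes. With one-vs-rest (as with liblinear), predict_proba instead normalizes per-class sigmoids, and the check would not hold. The iris dataset is used only as example data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)  # 3 classes: multinomial with lbfgs


def softmax_proba(clf, X):
    # z[n, i] = sum_j coef_[i, j] * X[n, j] + intercept_[i]
    z = X @ clf.coef_.T + clf.intercept_
    z = z - z.max(axis=1, keepdims=True)     # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)  # divide by Z = sum over classes of exp(z)


P = softmax_proba(clf, X)
print(np.allclose(P, clf.predict_proba(X)))
```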

Re: [Scikit-learn-general] Logistic Regression coefficients

2012-03-21 Thread Olivier Grisel
On 21 March 2012 07:49, Andrew Cepheus wrote: > The LogisticRegression class holds a coef_ attribute which is said to hold > the coefficients in the decision function. > High (positive) coefficients mean more correlation with the class, while low > (negative) ones mean an opposite correlation wi

[Scikit-learn-general] Logistic Regression coefficients

2012-03-21 Thread Andrew Cepheus
The LogisticRegression class holds a coef_ attribute which is said to hold the coefficients in the decision function. High (positive) coefficients mean more correlation with the class, while low (negative) ones mean an opposite correlation with the class. - Assuming that I have two classes in that ta
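On Andrew's question: in the binary case, each entry of coef_ is the change in the log-odds of classes_[1] for a unit increase in that feature, holding the other features fixed. Magnitudes are therefore only comparable across features measured on the same scale. The sketch below uses the bundled breast-cancer dataset purely as example data. It standardizes the features and checks decision_function and predict_proba against coef_ and intercept_.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one scale for all features, so weights are comparable
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The decision function is linear in the features: w . x + b
scores = X @ clf.coef_[0] + clf.intercept_[0]
print(np.allclose(scores, clf.decision_function(X)))

# Binary case: P(y = classes_[1] | x) is the logistic sigmoid of that score
p_pos = 1.0 / (1.0 + np.exp(-scores))
print(np.allclose(p_pos, clf.predict_proba(X)[:, 1]))
```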