What about using a distance metric like this one?
http://en.wikipedia.org/wiki/Normalized_Google_distance
From: Joel Nothman [joel.noth...@gmail.com]
Sent: 19 February 2014 22:50
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] Logistic
2014-02-19 20:57 GMT+01:00 Pavel Soriano :
> I thought about using the values of the coefficients of the fitted logit
> equation to get a glimpse of what words in the vocabulary, or what style
> features, affect the classification decision the most. Is it correct to
> assume that if the coeffici
Sounds like you're on the right path. Looking at the misclassified
documents and the feature coefficients is a common way to debug a
classifier, especially if you use boolean features.
If you're using a sklearn vectorizer this might be of interest to you:
http://stackoverflow.com/questions/669
It is correct to assume that a positive coefficient contributes positively
to a decision.
However, because the features are interdependent, the raw strength of a
feature isn't always straightforward to interpret. For example, it might
give a big positive coefficient to "Tel" and a similar negative
Hello scikit!
I need some insights into what I am doing.
Currently I am building a two-class text classifier using unigrams (word
level) and some writing-style features. I am using a Logistic Regression
model with L1 regularization. I get decent performance (around .70
f-measure) for the given
On 2012-03-21, at 4:57 AM, Olivier Grisel wrote:
> I think the docstring is wrong. Anybody can confirm?
Ran into this myself last night while answering the other thread. Yeah, it
appears to be.
David
On Wed, Mar 21, 2012 at 5:57 PM, Olivier Grisel
wrote:
> If there are only two classes, 0 or -1 is treated as negative and 1 is
> treated as positive.
To complement Olivier's answer, by convention in scikit-learn, the
negative label is in self.classes_[0]
and the positive one is in self.classes_[1].
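A quick sketch of that convention (the data here is invented; only the ordering of classes_ is the point):

```python
# Sketch: LogisticRegression stores the sorted class labels in classes_,
# so for a binary problem classes_[0] is the negative label and
# classes_[1] the positive one.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [0.0], [1.0]])
y = np.array([-1, 1, -1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.classes_)
```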
Although I haven't checked the code, I guess this is the usual way to store
the coefficients. To calculate P(C=i|x), we can use the formula:
P(C=i|x) = exp(sum_j Coef[i,j] x_j) / Z, where Z = sum_i exp(sum_j Coef[i,j] x_j).
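The formula above is just a softmax over per-class scores; a small sketch with made-up coefficients and input:

```python
# Sketch of P(C=i|x) = exp(coef[i] . x) / Z with Z = sum_i exp(coef[i] . x).
# The coefficient matrix and x below are arbitrary illustrative values.
import numpy as np

coef = np.array([[0.5, -1.0],
                 [-0.2, 0.3]])  # one row of coefficients per class
x = np.array([1.0, 2.0])

scores = coef @ x                         # per-class score sum_j coef[i,j]*x_j
probs = np.exp(scores) / np.exp(scores).sum()  # normalize by Z
print(probs)
```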
Sincerely,
Kerui Min
On Wed, Mar 21, 2012 at 4:57 PM, Olivier Grisel wrote:
> On 21 March 2012
On 21 March 2012 07:49, Andrew Cepheus wrote:
> The LogisticRegression class holds a coef_ attribute which is said to hold
> the coefficients in the decision function.
> High (positive) coefficients mean more correlation with the class, while low
> (negative) ones mean an opposite correlation with the class.
The LogisticRegression class holds a coef_ attribute which is said to hold
the coefficients in the decision function.
High (positive) coefficients mean more correlation with the class, while
low (negative) ones mean an opposite correlation with the class.
- Assuming that I have two classes in that ta
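A minimal sketch of the statement above about coef_ and the decision function; the data is invented, and the manual score computation is only meant to mirror what decision_function does for a binary model:

```python
# Sketch: for a fitted binary model, the decision function is
# X @ coef_.T + intercept_; positive scores predict classes_[1].
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [0.0], [1.0], [1.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

scores = X @ clf.coef_.T + clf.intercept_
print(clf.coef_[0], scores.ravel())
```

Here the single feature correlates positively with class 1, so its coefficient comes out positive.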