hi,
I am trying to build a classifier with a minimal rate of false positives
(these have a high cost for me, so I would much rather accept many false
negatives than a single false positive). I started with the LogisticRegression
classifier because its output is fairly simple for me to interpret.
Now, because I do not have much experience with machine learning in general,
I was hoping to be able to pass the full cost matrix to the classifier and let
the underlying optimization algorithm pick the best model/weights.
Instead, I think that the LogisticRegression class only accepts a
class_weight argument, which appears to be a dictionary that lets me state
which of the output classes I care about most (exactly how it is interpreted
is unclear to me).
Namely, I think that when I write this:
classifier = LogisticRegression(C=1000000)
classifier.fit(data[train], scores[train],
               class_weight={True: 0.1, False: 1})
I am effectively saying: make sure you predict True less often than you
normally would, and this does, as expected, give a lower false positive rate
than if I did not say anything. However, I really do wonder whether it is
possible to have finer-grained control over how the classifier makes its
decisions based on a complete contingency (cost) table.
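For what it's worth, one way I could imagine approximating a full cost matrix
is to keep the fitted model as-is and move the decision threshold on
predict_proba instead of (or in addition to) using class_weight. This is only
a sketch under my own assumptions: the cost values and the toy data are made
up, and I assume a 2x2 cost matrix with zero cost on correct decisions, in
which case the expected-cost-minimizing rule is to predict True only when
P(True|x) >= C_fp / (C_fp + C_fn).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical costs (assumption): a false positive costs 10x a false negative.
cost_fp = 10.0
cost_fn = 1.0

# Toy data purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = X[:, 0] + X[:, 1] > 0

clf = LogisticRegression(C=1e6)
clf.fit(X, y)

# With zero cost on correct decisions, predicting True has expected cost
# (1 - p) * cost_fp and predicting False has expected cost p * cost_fn,
# so predict True only when p >= cost_fp / (cost_fp + cost_fn).
threshold = cost_fp / (cost_fp + cost_fn)

# Column of predict_proba corresponding to the True class.
proba_true = clf.predict_proba(X)[:, list(clf.classes_).index(True)]
predictions = proba_true >= threshold
```

With these costs the threshold comes out around 0.91, so the classifier only
predicts True when it is quite sure, which is exactly the "fewer false
positives at the price of more false negatives" trade-off I am after.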
Mathieu
--
Mathieu Lacage <[email protected]>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general