2012/7/20 Andreas Müller <[email protected]>:
> Hi Sicco.

Indeed, hi, and nice to see you've picked scikit-learn :)

> This is desired behavior.

Then again, we could introduce a min_classes parameter that sets the
minimum number of classes to return. This is commonly what
you want when predicting multiple tags (think StackOverflow questions,
where at least one tag is required).
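
To make the min_classes idea concrete, here is a rough sketch in plain
Python. Note that min_classes is only the proposal above, not an existing
scikit-learn parameter, and the helper name labels_with_minimum and the
threshold argument are made up for illustration. It assumes you already
have one decision score per class for a single sample.

```python
def labels_with_minimum(scores, classes, min_classes=1, threshold=0.0):
    """Return the labels scoring above threshold, but never fewer than
    min_classes labels (hypothetical helper, not part of scikit-learn)."""
    # Class indices ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Labels the classifier would normally predict.
    chosen = [i for i in ranked if scores[i] > threshold]
    # Too few labels: take the top-scoring min_classes instead.
    if len(chosen) < min_classes:
        chosen = ranked[:min_classes]
    return [classes[i] for i in chosen]


tags = ["python", "java", "numpy"]
print(labels_with_minimum([-0.3, -1.2, -0.1], tags))
print(labels_with_minimum([0.5, -1.0, 0.2], tags))
```

With min_classes=1 this behaves like the "fall back to the best label"
approach discussed in the rest of this mail.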

> If you want to always get a label, you could have a look at the
> decision_function and just predict the label with the highest score
> if no label was predicted.

In some more detail, you can find out which class gets the highest
score for a sample vector x using

    clf.label_binarizer_.classes_[numpy.argmax([e.decision_function(x)
                                                for e in clf.estimators_])]

This is arguably a hack; the OvR estimator is a bit rough around the
edges. It doesn't play well with the Pipeline either, since you have
to vectorize the document yourself. Without a Pipeline, the training
procedure would be

    # TfidfVectorizer combines CountVectorizer and TfidfTransformer
    # (it was called Vectorizer in older versions)
    vect = TfidfVectorizer()
    clf = OneVsRestClassifier(LinearSVC())
    X = vect.fit_transform(train_txt)
    clf.fit(X, train_labels)

And prediction would become (showing the procedure for one document at
a time now)

    x = vect.transform([one_document])
    [labels] = clf.predict(x)
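    if len(labels) == 0:
        # no label predicted: fall back to the highest-scoring class
        scores = [e.decision_function(x) for e in clf.estimators_]
        labels = [clf.label_binarizer_.classes_[numpy.argmax(scores)]]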
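
If you'd rather handle many documents at once, the same fallback can be
sketched with NumPy alone. This assumes the predictions are available as
a 0/1 indicator matrix and the scores as an array of the same shape, one
row per document and one column per class (for instance from clf.predict
and a stacked decision_function, in versions that return such matrices).
The function name fill_empty_predictions is made up for illustration.

```python
import numpy as np


def fill_empty_predictions(indicator, scores):
    """Give every row of a 0/1 label matrix at least one label.

    indicator: (n_samples, n_classes) array-like of 0/1 predictions.
    scores:    (n_samples, n_classes) array-like of decision scores.
    """
    indicator = np.array(indicator, dtype=int)  # copy, input stays untouched
    scores = np.asarray(scores, dtype=float)
    empty = indicator.sum(axis=1) == 0          # rows without any label
    best = scores[empty].argmax(axis=1)         # best class for each such row
    indicator[np.flatnonzero(empty), best] = 1
    return indicator


pred = [[1, 0, 0],
        [0, 0, 0]]
scores = [[0.4, -1.0, -2.0],
          [-0.5, -0.1, -0.9]]
print(fill_empty_predictions(pred, scores))
```

Rows that already have a label are left as they are; only the empty
second row gets its highest-scoring class switched on.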

Good luck,
Lars

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
