Hi Bilal.
As far as I can see, the OneVsRestClassifier
decides whether to do multi-class or multi-label classification by
looking at the training set. This is exactly what you observe:
if you only have one label per data point in the training set,
you will only get one label back.
Looking at the OneVsRestClassifier code,
I'm not sure where this happens. Maybe someone
who is more familiar with this estimator can comment.
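
A minimal sketch of the two modes, cut down from your example. It assumes
a recent scikit-learn release, so it uses TfidfVectorizer(ngram_range=...)
and MultiLabelBinarizer, which are newer than the API in your code; the
extra "spain" sentences are made up for illustration. In such releases, as
far as I can tell, the format of y selects the mode: a 1-D label array
gives multi-class, a binary indicator matrix gives multi-label.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy data; the "spain" sentences are invented for this sketch.
X_train = [
    "new york is a hell of a town",
    "nyc is nice",
    "london is in the uk",
    "it rains a lot in london",
    "madrid is the capital of spain",
    "spain is sunny",
]
X_test = [
    "nice day in nyc",
    "welcome to london",
    "hello welcome to new york. It has theaters like london",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
features_train = vectorizer.fit_transform(X_train)
features_test = vectorizer.transform(X_test)

# Multi-class: y is a 1-D array holding exactly one class id per sample.
y_multiclass = np.array([0, 0, 1, 1, 2, 2])
ovr_multiclass = OneVsRestClassifier(LinearSVC()).fit(features_train, y_multiclass)
pred_multiclass = ovr_multiclass.predict(features_test)

# Multi-label: y is a binary indicator matrix with one column per class.
# Every row still carries a single label here; only the format differs.
label_sets = [[0], [0], [1], [1], [2], [2]]
y_multilabel = MultiLabelBinarizer(classes=[0, 1, 2]).fit_transform(label_sets)
ovr_multilabel = OneVsRestClassifier(LinearSVC()).fit(features_train, y_multilabel)
pred_multilabel = ovr_multilabel.predict(features_test)

print(ovr_multiclass.multilabel_, pred_multiclass.shape)   # one class id per sample
print(ovr_multilabel.multilabel_, pred_multilabel.shape)   # one 0/1 row per sample
```

In the multi-label case each row of the prediction may contain zero, one
or several ones. Whether the third test sentence actually gets both labels
still depends on what the per-class classifiers learned from the data.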
Cheers,
Andy
On 05/16/2012 03:52 AM, Bilal Allawala wrote:
> Hi Andy
>
> Thanks for your email.
> I'm not sure how to post back to the main mailing list.
>
> I tried using the OneVsRestClassifier that you mentioned.
> Unless I manually train every combination, I only get one label per
> text. Is there anything I'm missing? (My code is below.)
> Thanks again for all your help
>
> Code:
>
> import numpy as np
> from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
> from sklearn.multiclass import OneVsRestClassifier
> from sklearn.pipeline import Pipeline
> from sklearn.svm import LinearSVC
>
> target_names = ['new york', 'london', 'spain']
> X_train = np.array(["new york is a hell of a town",
>                     "new york was originally dutch",
>                     "the big apple is great",
>                     "new york is also called the big apple",
>                     "nyc is nice",
>                     "people abbreviate new york city as nyc",
>                     "the capital of great britain is london",
>                     "london is in the uk",
>                     "london is in england",
>                     "london is in great britain",
>                     "it rains a lot in london",
>                     "london hosts the british museum",
>                     # I get multiple labels only if this example, which
>                     # includes both london and new york, is in the training set
>                     "london and new york are both cities"
>                     ])
> # for multilabel I also need the [0,1] entry
> y_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1]]
> X_test = np.array(['nice day in nyc',
>                    'welcome to london',
>                    'hello welcome to new york. It has theaters like london'])
>
>
> classifier = Pipeline([
>     ('vectorizer', CountVectorizer(min_n=1, max_n=2)),
>     ('tfidf', TfidfTransformer()),
>     ('clf', OneVsRestClassifier(LinearSVC()))])
>
> classifier.fit(X_train, y_train)
> predicted = classifier.predict(X_test)
>
> for item, labels in zip(X_test, predicted):
>     print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))
> returns:
> > nice day in nyc => new york
> > welcome to london => london
> > hello welcome to new york. It has theaters like london => new york, london
> But if I take out the "london and new york are both cities" example and
> its [0,1] label from the training set, the classifier returns:
> > nice day in nyc => new york
> > welcome to london => london
> > hello welcome to new york. It has theaters like london => new york
>
> > test string.
>
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general