Hi,
I updated my code to use the LabelBinarizer, but I'm not sure where to go from
there.
I am still only getting one category per text.
Below is my updated code:
y_train = ('New York','London')
train_set = ("new york nyc big apple", "london uk great britain")
vocab = {'new york': 0, 'nyc': 1, 'big apple': 2,
         'london': 3, 'uk': 4, 'great britain': 5}
count = CountVectorizer(analyzer=WordNGramAnalyzer(min_n=1, max_n=2),
                        vocabulary=vocab)
test_set = ('nice day in nyc',
            'london town',
            'hello welcome to the big apple. enjoy it here and london too')
X_vectorized = count.transform(train_set).todense()
smatrix2 = count.transform(test_set).todense()
Y_indicator = LabelBinarizer().fit(y_train).transform(y_train)
base_clf = MultinomialNB(alpha=1)
clf = OneVsRestClassifier(base_clf).fit(X_vectorized, Y_indicator)
Y_pred = clf.predict(smatrix2)
print Y_pred
Thanks for your time and help,
Bilal
On Fri, May 11, 2012 at 5:30 AM, Olivier Grisel <[email protected]> wrote:
> Hi,
>
> To make OneVsRest work as a multilabel classifier instead of a
> multiclass classifier you need to "binarize" the label representation
> using LabelBinarizer as demonstrated in this example:
>
>
> http://scikit-learn.org/dev/auto_examples/plot_multilabel.html#example-plot-multilabel-py
>
> Also beware that most classifiers in scikit-learn expect integer
> labels instead of strings (you need to define a mapping from one
> representation to another) although string labels might work for some
> of them. A new LabelEncoder class will soon be merged in master to
> help with that kind of representation switch.
>
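The mapping between string labels and integers mentioned in the paragraph above could look roughly like this. It is a hand-written, pure-Python sketch and not the LabelEncoder class referred to there; the names `label_to_id` and `id_to_label` and the sample label list are assumptions made for illustration.

```python
# Illustrative string labels, one per sample (made-up data)
labels = ["New York", "London", "London", "New York"]

# Assign each distinct label an integer id, in sorted order
classes = sorted(set(labels))
label_to_id = {label: i for i, label in enumerate(classes)}
id_to_label = {i: label for label, i in label_to_id.items()}

# Encode to integers for the classifier, then decode predictions back
y = [label_to_id[label] for label in labels]
decoded = [id_to_label[i] for i in y]

print(y)
print(decoded)
```

The integer list `y` is what would be passed to a classifier; `id_to_label` turns predicted integers back into readable names.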
> If you have further questions please ask them on the project mailing
> list so that other scikit-learn users can benefit from your experience.
>
> Best,
>
> --
> Olivier
>
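To make the "binarize" step from the quoted reply concrete, here is a minimal pure-Python sketch of a label indicator matrix, with one column per class and one row per training text. The helper `binarize_labels` is hypothetical and written by hand for illustration; it is not scikit-learn's LabelBinarizer. The third training sample, which carries two labels, is also made up. The sketch assumes each training text is given a collection of labels rather than a single string.

```python
def binarize_labels(label_sets):
    """Hypothetical helper: map per-sample label collections to a 0/1 matrix."""
    # Collect every distinct label and fix a column order
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: j for j, label in enumerate(classes)}

    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark each label the sample carries
        matrix.append(row)
    return classes, matrix


# One collection of labels per training text; the last sample is illustrative
y_train = [("New York",), ("London",), ("New York", "London")]
classes, Y_indicator = binarize_labels(y_train)

print(classes)
print(Y_indicator)
```

In this representation a row may contain more than one 1, so a one-vs-rest setup with one binary classifier per column has the room to report several categories for the same text.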