Hi Bilal.
For multi-label classification, the easiest approach
is to train one classifier per label.
There is an easy way to do this with sklearn using
the OneVsRestClassifier, as described in the user guide:
http://scikit-learn.org/dev/modules/multiclass.html
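
As a rough illustration of the one-classifier-per-label idea (a minimal sketch, not part of the original exchange), the following assumes a recent scikit-learn release. Several things here are assumptions rather than the poster's setup: the label lists are converted to a binary indicator matrix with MultiLabelBinarizer, ngram_range=(1, 2) stands in for the older min_n/max_n arguments, only the two labels that actually occur in the training data are used, and no training example carries both labels. Whether the last test string receives both labels depends on the data and the model, so no output is claimed.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Only the two labels present in the training data (assumption for this sketch).
target_names = ["new york", "london"]

X_train = [
    "new york is a hell of a town",
    "new york was originally dutch",
    "the big apple is great",
    "new york is also called the big apple",
    "nyc is nice",
    "people abbreviate new york city as nyc",
    "the capital of great britain is london",
    "london is in the uk",
    "london is in england",
    "london is in great britain",
    "it rains a lot in london",
    "london hosts the british museum",
]
# One label per training text; no combined [0, 1] example.
y_train = [[0]] * 6 + [[1]] * 6

# Turn the label lists into a (n_samples, n_labels) 0/1 indicator matrix.
mlb = MultiLabelBinarizer(classes=[0, 1])
Y_train = mlb.fit_transform(y_train)

# OneVsRestClassifier fits one independent LinearSVC per label column.
classifier = Pipeline([
    ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, Y_train)

X_test = [
    "nice day in nyc",
    "welcome to london",
    "hello welcome to new york. It has theaters like london",
]
Y_pred = classifier.predict(X_test)        # indicator matrix, one row per text
predicted = mlb.inverse_transform(Y_pred)  # tuples of label indices per text

for item, labels in zip(X_test, predicted):
    print("%s => %s" % (item, ", ".join(target_names[x] for x in labels)))
```

Because each label has its own binary classifier, a text can in principle be assigned any subset of labels, including none, without every combination appearing in the training set.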
Cheers,
Andy

On 05/15/2012 04:46 AM, Bilal Allawala wrote:
Hi

I am trying to classify text by place. A piece of text can belong to one or more places.

My code (attached below) returns:
nice day in nyc => new york
welcome to london => london
hello welcome to new york. It has theaters like london => new york, london

But if I take out "london and new york are both cities" and [0,1] from the training set, the classifier returns only "new york" for the last test string. Is it possible to classify into multiple labels without training on all combinations of the labels? I have a list of about a thousand places, and creating training sets with all possible label combinations would be extremely hard.

Thanks for your help :)

Code:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

target_names = ['new york','london','spain']
X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    # I get multiple labels only if this example, which
                    # mentions both london and new york, is in the training set
                    "london and new york are both cities"
                   ])
# for multilabel output I also need the [0,1] entry
y_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1]]
X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'hello welcome to new york. It has theaters like london'])


classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_n=1,max_n=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# print each test string with the names of its predicted labels
for item, labels in zip(X_test, predicted):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))




------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
