Hi,

I am using the code below to vectorize a corpus.

self.vectorizer = CountVectorizer(
    tokenizer=self.custom_tokenizer, lowercase=self.lowercase,
    binary=self.is_binary)
self.X = self.vectorizer.fit_transform(self.corpus)

The output is a sparse matrix in CSR format. Great!


Anyway, I have multiclass data and need to implement one-vs-rest (OvR)
using the libsvm flavor (SVC), not liblinear (LinearSVC) and not
one-vs-one (OvO).

Per the following passage from
http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC

 It is possible to implement one vs the rest with SVC by using the
sklearn.multiclass.OneVsRestClassifier wrapper. Finally SVC can fit
dense data without memory copy if the input is C-contiguous. Sparse
data will still incur memory copy though.


I am confused about how to do OvR with the libsvm-based SVC.

1) Do my feature vectors have to be in dense format, i.e. X.todense()
applied to X = self.vectorizer.fit_transform(self.corpus)? Or is
sparse OK, i.e. keeping X as a CSR matrix (X.tocsr())?

2) How do I fit?

import sklearn.multiclass
from sklearn import svm
clf = sklearn.multiclass.OneVsRestClassifier(svm.SVC())
clf.fit(X, Y)

Will this use the libsvm flavor of OvR?
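
In case it helps to reproduce, here is a minimal self-contained sketch of what I am trying. The toy corpus and labels are made up for illustration, and I use CountVectorizer's default tokenizer rather than my custom one. I have not confirmed that this is the recommended way to do it; it only shows the sparse CSR matrix being passed straight to the wrapper, with one binary SVC fitted per class.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy corpus and labels, made up purely for illustration.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply today",
    "the market rallied",
    "the team won the match",
    "a late goal decided the game",
]
y = ["pets", "pets", "finance", "finance", "sports", "sports"]

# Default tokenizer here; my real code passes a custom one.
vectorizer = CountVectorizer(lowercase=True, binary=False)
X = vectorizer.fit_transform(corpus)  # scipy.sparse matrix in CSR format

# Wrap the libsvm-based SVC so that one binary classifier is fit per class.
clf = OneVsRestClassifier(SVC())
clf.fit(X, y)  # X is passed as-is, without calling todense()

print(issparse(X), X.format)
print(len(clf.estimators_))  # number of per-class binary SVCs
print(clf.predict(vectorizer.transform(["the cat chased the dog"])))
```

If this is right, clf.estimators_ should hold one fitted SVC per distinct label in y.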

Thanks

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
