Hi, I am using the code below to vectorize a corpus.

    self.vectorizer = CountVectorizer(tokenizer=self.custom_tokenizer,
                                      lowercase=self.lowercase,
                                      binary=self.is_binary)
    self.X = self.vectorizer.fit_transform(self.corpus)

The output is a sparse matrix in CSR format. Great!

I have multiclass data and need to implement OvR using the libsvm flavor, not liblinear and not OvO. The documentation at http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC says:

    It is possible to implement one vs the rest with SVC by using the
    sklearn.multiclass.OneVsRestClassifier wrapper.

    Finally SVC can fit dense data without memory copy if the input is
    C-contiguous. Sparse data will still incur memory copy though.

I am confused about how to use OvR with the libsvm SVC.

1) Do I have to have my feature vectors in dense format, i.e. take X = self.vectorizer.fit_transform(self.corpus) and call X.todense()? Or is sparse OK, i.e. X.tocsr()?

2) How do I fit? Like this?

    clf = sklearn.multiclass.OneVsRestClassifier(svm.SVC())
    clf.fit(X, Y)

Will this use the libsvm flavor of OvR?

Thanks
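
To make question 2 concrete, here is a minimal, self-contained sketch of the setup I have in mind. The toy corpus, the labels, and the linear kernel are made up for illustration, and the default tokenizer stands in for my custom one. With a recent scikit-learn it appears to accept the sparse matrix as-is and to fit one binary SVC per class, but I am not sure this is the recommended way.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical toy data: three classes, two documents each.
corpus = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "the dog chased the ball",
    "dogs bark at strangers",
    "the bird sang in the tree",
    "birds fly south in winter",
]
labels = ["cat", "cat", "dog", "dog", "bird", "bird"]

# Same vectorizer arguments as in my code, minus the custom tokenizer.
vectorizer = CountVectorizer(lowercase=True, binary=True)
X = vectorizer.fit_transform(corpus)  # scipy sparse matrix, not densified

# Wrap the libsvm-based SVC so that one binary classifier is trained per class.
# kernel="linear" is an assumption for this sketch, not a requirement.
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X, labels)

predictions = clf.predict(X)
print(issparse(X), len(clf.estimators_), list(clf.classes_))
```

If this is right, clf.estimators_ should hold one fitted SVC per class, which is what I would expect from OvR rather than the OvO scheme SVC uses on its own.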
