What would be the best approach to classifying a large dataset with sparse
features into multiple categories? I referred to the multiclass page in the
sklearn documentation, but was not sure which estimator to use to get
multiclass probabilities (the top n probabilities per sample would be nice).
I tried several classifiers, but ran into an issue with each:
SGDClassifier: gives good results, but raises a "Not Implemented" error when I
call predict_proba.
LinearSVC: has no method to get probabilities.
LDA: raises the exception "A sparse matrix was passed, but dense data is
required. Use X.todense() to convert to dense." Converting to dense as
suggested does not work well.
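To make clear what I mean by "top n probabilities", here is a minimal sketch in plain Python. It assumes one raw decision score per class is available for a sample; the score values, the class labels, and the helper name top_n_probabilities are made up for illustration. It normalizes the scores with a softmax, so the results are pseudo-probabilities, not calibrated ones.

```python
import heapq
import math


def top_n_probabilities(scores, labels, n=3):
    """Return the n (label, pseudo-probability) pairs with the largest values."""
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # nlargest returns the pairs sorted from highest to lowest probability.
    return heapq.nlargest(n, zip(labels, probs), key=lambda pair: pair[1])


# Hypothetical decision scores for one sample over four classes.
scores = [2.0, -1.0, 0.5, 1.0]
labels = ["a", "b", "c", "d"]
top = top_n_probabilities(scores, labels, n=2)
for label, prob in top:
    print(label, round(prob, 3))
```

Something that returns this kind of ranked output per sample, without densifying the feature matrix, is what I am after.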
At present I am limited to 64 GB of memory and 100 GB of disk space.
Any suggestions?
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general