What would be the best approach to classifying a large dataset with sparse 
features into multiple categories? I looked at the multiclass page in the 
sklearn documentation, but was not sure which estimator to use to get 
multiclass probabilities (the top n probabilities would be nice).
I tried several classifiers but ran into issues:
SGDClassifier: gives good results, but raises a "Not Implemented" error when
               I call predict_proba
LinearSVC: has no method to get probabilities
LDA: raises the exception "A sparse matrix was passed, but dense data is 
     required. Use X.todense() to convert to dense."; converting to dense 
     does not work well.
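
To make "top n probabilities" concrete, here is a minimal sketch of the output
I have in mind. It assumes a classifier that returns one row of per-class
probabilities per sample (as a predict_proba-style method would). The helper
name top_n_classes, the class labels, and the numbers are all made up for
illustration and do not come from any real model.

```python
import heapq


def top_n_classes(proba_rows, classes, n=3):
    """Return, per sample, the n most probable (class, probability) pairs."""
    results = []
    for row in proba_rows:
        # Pair each class label with its probability, then keep the n largest.
        pairs = zip(classes, row)
        best = heapq.nlargest(n, pairs, key=lambda pair: pair[1])
        results.append(best)
    return results


# Hypothetical probabilities for two samples over four made-up classes.
classes = ["a", "b", "c", "d"]
proba = [
    [0.10, 0.60, 0.25, 0.05],
    [0.40, 0.05, 0.05, 0.50],
]

top2 = top_n_classes(proba, classes, n=2)
print(top2)
```

The helper only iterates over rows, so it should work equally on a list of
lists or on a dense probability array, one row per sample.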

At present I am limited to 64 GB of memory and 100 GB of disk space.
Any suggestions?




_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
