1.6e6 * 66e3 * 8 == 844.8e9 bytes, i.e. roughly 845 GB for a dense
array of double precision floats. That won't fit in RAM on a 64GB
machine, even after switching to single precision floats (still about
422 GB). Furthermore, processing so many zero values would make it
intractable to fit a model on a single machine anyway.
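
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch. I am assuming that 1.6e6 is the number of rows and 66e3 the number
of columns, and the figure of 100 non-zero entries per row is purely
hypothetical (it is not taken from your data). The CSR estimate also
assumes 32-bit indices:

```python
def dense_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize, index_itemsize=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    return nnz * (itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


n_rows = 1_600_000          # assumed: number of samples
n_cols = 66_000             # assumed: number of features
ram_bytes = 64 * 10**9      # 64GB machine

dense64 = dense_bytes(n_rows, n_cols, 8)   # double precision
dense32 = dense_bytes(n_rows, n_cols, 4)   # single precision

# Hypothetical sparsity: 100 non-zero entries per row on average.
nnz = n_rows * 100
sparse64 = csr_bytes(n_rows, nnz, 8)

print("dense float64: %.1f GB" % (dense64 / 1e9))
print("dense float32: %.1f GB" % (dense32 / 1e9))
print("CSR float64:   %.1f GB" % (sparse64 / 1e9))
print("dense float32 fits in RAM:", dense32 <= ram_bytes)
```

Under those assumptions the sparse representation is a couple of GB,
while both dense variants exceed the available RAM several times over.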

Work is under way [1] to add native support for sparse input data to
decision trees.

That being said, on bag-of-words data linear models will in general
reach the same level of accuracy and are much faster to train. Suitable
classifiers for text can be found in the text classification example
[2].
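
As a minimal sketch of what training a linear model directly on sparse
bag-of-words features could look like: the toy documents and labels
below are made up for illustration, and the choice of TfidfVectorizer
and SGDClassifier is just one possible combination, not necessarily the
best one for your data:

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy corpus; replace with your own documents and labels.
docs = [
    "the gpu renders the graphics quickly",
    "new graphics card with fast memory",
    "the team won the hockey game",
    "a great hockey season for the team",
]
labels = ["hardware", "hardware", "sport", "sport"]

# The vectorizer returns a scipy.sparse matrix; it is never densified.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Linear classifier trained with SGD; it accepts sparse input directly.
clf = SGDClassifier(random_state=0)
clf.fit(X, labels)

# New documents must go through the same fitted vectorizer.
X_new = vectorizer.transform(["hockey game tonight"])
predictions = clf.predict(X_new)
print(predictions)
```

The important point is that the feature matrix stays in sparse format
from vectorization through fitting and prediction.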

[1] https://github.com/scikit-learn/scikit-learn/pull/2984
[2] http://scikit-learn.org/stable/auto_examples/document_classification_20newsgroups.html


-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
