1.6e6 * 66e3 * 8 == 844.8e9 bytes. That won't fit in RAM on a 64GB machine, even after switching to single precision floats (which would still need about 422e9 bytes). Furthermore, processing so many zero values would make it intractable to fit a model on a single machine anyway.
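For reference, the arithmetic above can be reproduced with a few lines of plain Python. This is only a rough sketch: the sample and feature counts come from the figures above, but the helper names (`dense_bytes`, `csr_bytes`) and the figure of 100 non-zero terms per document are assumptions made for illustration, not measurements of the actual dataset. The CSR estimate also assumes 8-byte values and 4-byte indices.

```python
# Back-of-the-envelope memory estimate for a bag-of-words matrix.
n_samples = 1_600_000   # 1.6e6 documents (figure from the message)
n_features = 66_000     # 66e3 vocabulary entries (figure from the message)


def dense_bytes(n_rows, n_cols, itemsize):
    """Bytes needed to store every entry, zeros included."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_itemsize=4):
    """Rough size of a CSR matrix: one value and one column index per
    non-zero entry, plus n_rows + 1 row pointers."""
    return nnz * (itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


dense64 = dense_bytes(n_samples, n_features, 8)  # double precision
dense32 = dense_bytes(n_samples, n_features, 4)  # single precision

# ASSUMPTION (hypothetical): about 100 non-zero terms per document.
assumed_nnz_per_doc = 100
sparse64 = csr_bytes(n_samples, n_samples * assumed_nnz_per_doc)

for name, size in [("dense float64", dense64),
                   ("dense float32", dense32),
                   ("CSR float64 (assumed density)", sparse64)]:
    print(f"{name}: {size / 1e9:.1f} GB")
```

Under that assumed density the sparse representation needs on the order of 2 GB, while both dense variants stay far above 64 GB. The real number depends entirely on how many non-zero entries the data has.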
Work is under way [1] to add native support for sparse input data to decision trees. That being said, linear models will generally reach the same level of accuracy and are much faster to train on bag-of-words data. Suitable classifiers for text can be found in the text classification example [2].

[1] https://github.com/scikit-learn/scikit-learn/pull/2984
[2] http://scikit-learn.org/stable/auto_examples/document_classification_20newsgroups.html

-- 
Olivier

Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
