2012/5/11 Rafael Calsaverini <rafael.calsaver...@gmail.com>:
> Can any of the algorithms implemented in scikit-learn be trained
> incrementally?
>
> Three things in particular interest me: classifying texts,
> unsupervised cluster analysis of texts and hierarchical cluster
> analysis of texts. But my set of texts is too big to load into memory
> all at once, even with a sparse representation. I can't train the
> classifier or apply the clustering methods without a MemoryError
> being raised, even when working with a fraction of the texts (I
> tried Multinomial Naive Bayes, the linear SVM and some of the
> clustering algorithms).
>
> Does anybody have any tips on what I can do before going all the
> way to using things like Hadoop? Can any of the algorithms be
> trained incrementally?
Some algorithms implement the "partial_fit" method for out-of-core incremental learning. Off the top of my head: Perceptron and SGDClassifier / SGDRegressor for supervised learning, MiniBatchKMeans for clustering, and MiniBatchDictionaryLearning for decomposition (for instance, to find a basis of components suitable for sparse coding).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
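A minimal sketch of the "partial_fit" pattern for the text use case in the question: stream the corpus in mini-batches and update SGDClassifier (hinge loss, i.e. a linear SVM) and MiniBatchKMeans one batch at a time. It assumes a recent scikit-learn: HashingVectorizer and its alternate_sign parameter were added after this 2012 thread, and it is used here (an assumption, not part of the reply) because it is stateless, so no vocabulary has to be held in memory. The toy vocabularies and the iter_minibatches generator are hypothetical stand-ins for reading real documents from disk.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy vocabularies standing in for a real corpus.
VOCAB = {
    0: ["football", "goal", "match", "team", "score"],
    1: ["stock", "market", "price", "trade", "bank"],
}


def iter_minibatches(n_batches, batch_size, seed=0):
    """Hypothetical generator yielding (texts, labels) one mini-batch at a
    time, so the full corpus never has to sit in memory."""
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        labels = rng.randint(0, 2, size=batch_size)
        texts = [" ".join(rng.choice(VOCAB[y], size=4)) for y in labels]
        yield texts, labels


# HashingVectorizer is stateless (nothing to fit), so every batch maps to a
# sparse matrix with the same fixed number of columns.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)

clf = SGDClassifier(random_state=0)  # default hinge loss: a linear SVM
km = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)
all_classes = np.array([0, 1])  # partial_fit must know every class up front

for texts, labels in iter_minibatches(n_batches=5, batch_size=20):
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=all_classes)  # supervised update
    km.partial_fit(X)  # unsupervised update; labels are not used

# Evaluate on a fresh batch drawn with a different seed.
test_texts, y_test = next(iter_minibatches(n_batches=1, batch_size=50, seed=1))
X_test = vectorizer.transform(test_texts)
pred = clf.predict(X_test)
clusters = km.predict(X_test)
print("accuracy:", (pred == y_test).mean())
print("cluster sizes:", np.bincount(clusters, minlength=2))
```

Only one mini-batch is materialized at a time, so peak memory depends on the batch size rather than the corpus size. Hierarchical clustering has no "partial_fit" in this list, so it would still need a different strategy.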