2012/5/11 Rafael Calsaverini <rafael.calsaver...@gmail.com>:
> Can any of the algorithms implemented in scikit-learn be trained
> incrementally?
>
> Three particular things are interesting to me: classifying texts,
> unsupervised clustering analysis of texts and hierarchical clustering
> analysis of texts. But my set of texts is just too big to load in memory all
> at once even with a sparse representation. I can't train the classifier or
> apply the clustering methods without having a MemoryError exception thrown,
> even when working with a fraction of the texts (I tried the Multinomial
> Naive Bayes, the Linear SVM and some of the clustering algorithms).
>
> Does anybody have any tips on what I can do before going all the way to
> using things like Hadoop? Can any of the algorithms be trained
> incrementally?

Some algorithms implement the "partial_fit" method for out-of-core
incremental learning. Off the top of my head:

Perceptron and SGDClassifier / SGDRegressor for supervised learning,
MiniBatchKMeans for clustering, and MiniBatchDictionaryLearning for
decomposition (to find a basis of components suitable for sparse
coding, for instance).
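
As a rough sketch of what a partial_fit loop could look like for text
classification: this assumes a scikit-learn version that ships
HashingVectorizer (it is stateless, so it never needs a pass over the
whole corpus). The toy documents, the labels and the iter_minibatches
helper are made up for illustration; in practice the batches would be
read from disk one chunk at a time.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_minibatches(docs, labels, batch_size):
    """Yield (texts, labels) chunks; a stand-in for streaming from disk."""
    for start in range(0, len(docs), batch_size):
        stop = start + batch_size
        yield docs[start:stop], labels[start:stop]


# Hypothetical toy corpus: 0 = sports, 1 = cooking.
docs = [
    "the team won the football match",
    "simmer the sauce and add garlic",
    "the striker scored a late goal",
    "bake the bread in a hot oven",
]
labels = [0, 1, 0, 1]

# Hashing needs no fitted vocabulary, so each batch is vectorized on its own.
vectorizer = HashingVectorizer(n_features=2 ** 18)
clf = SGDClassifier(random_state=0)

# partial_fit must be told every possible class on the first call,
# because a single batch may not contain all of them.
all_classes = [0, 1]

for _ in range(5):  # a few passes over the (streamed) data
    for batch_docs, batch_labels in iter_minibatches(docs, labels, 2):
        X_batch = vectorizer.transform(batch_docs)
        clf.partial_fit(X_batch, batch_labels, classes=all_classes)

predictions = clf.predict(vectorizer.transform(["the match ended in a draw"]))
```

Only one batch is held in memory at a time, so memory use is bounded by
the batch size and the number of hashed features, not by the corpus size.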

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

