2012/5/9 Olivier Grisel <[email protected]>:
>
> In scikit-learn there are some model that should scale OK-ish with
> large number of samples (like SGD-based classification and regression
> models and minibatch kmeans (with random init). In both case you will
> need. Those models are by now way battle tested on large data as
> vowpal wabbit is. Furthermore in scikit-learn you would have to use
> the `partial_fit` method to incrementally update the model to fit from
> chunks of data read from the disk or your database, for instance 1000
> documents at a time so as to control memory consumption.

Hum, let me rewrite this paragraph; I wrote it too quickly and was not
careful enough:

In scikit-learn there are some models that should scale OK-ish with a
large number of samples, for instance SGD-based classification and
regression models, and minibatch k-means with random init for clustering
problems with a limited number of clusters (e.g. a couple of hundred at
most). Those models are by no means as battle-tested on large datasets
as vowpal wabbit is. Furthermore, in scikit-learn you would have to use
the `partial_fit` method to incrementally update the model, fitting it
on chunks of data read from the disk or your database, for instance
1000 documents at a time, so as to control memory consumption.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
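
The chunked `partial_fit` workflow described in the message could look roughly like the sketch below. It is only an illustration under stated assumptions: the `iter_chunks` generator is a hypothetical stand-in for reading from disk or a database, the data is synthetic and numeric rather than real documents, and the chunk size of 1000 is taken from the example in the text. `SGDClassifier.partial_fit` is the actual scikit-learn method; the full set of class labels has to be passed on the first call because no single chunk is guaranteed to contain every class.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

CHUNK_SIZE = 1000  # samples per chunk, mirroring "1000 documents at a time"
N_FEATURES = 20    # arbitrary size chosen for this synthetic example


def iter_chunks(n_chunks, seed=0):
    """Hypothetical data source standing in for disk or database reads.

    Yields (X, y) pairs of CHUNK_SIZE samples each, so only one chunk
    has to be held in memory at a time.
    """
    # Fixed hidden linear rule used to label the synthetic samples.
    w = np.random.RandomState(42).randn(N_FEATURES)
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        X = rng.randn(CHUNK_SIZE, N_FEATURES)
        y = (np.dot(X, w) > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # must be known up front for partial_fit

n_seen = 0
for X_chunk, y_chunk in iter_chunks(n_chunks=10):
    # Each call updates the existing model instead of refitting from scratch.
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)
    n_seen += X_chunk.shape[0]

# Evaluate on a fresh chunk drawn with a different seed.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=1))
accuracy = clf.score(X_test, y_test)
print("samples seen: %d, held-out accuracy: %.3f" % (n_seen, accuracy))
```

`MiniBatchKMeans` exposes a `partial_fit` method as well, so the same loop shape should carry over to the clustering case, with the chunk passed as `X` only.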
