2012/5/9 Olivier Grisel:
>
> In scikit-learn there are some model that should scale OK-ish with
> large number of samples (like SGD-based classification and regression
> models and minibatch kmeans (with random init). In both case you will
> need. Those models are by now way battle tested on large data as
> vowpal wabbit is. Furthermore in scikit-learn you would have to use
> the `partial_fit` method to incrementally update the model to fit from
> chunks of data read from the disk or your database, for instance 1000
> documents at a time so as to control memory consumption.

Hum, let me rewrite this paragraph; I wrote it too quickly and was not
careful enough:

In scikit-learn there are some models that should scale OK-ish with a
large number of samples, for instance SGD-based classification and
regression models, and minibatch k-means with random init for
clustering problems with a limited number of clusters (e.g. a couple of
hundred at most).
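As a rough illustration of the clustering case, here is a minimal sketch that feeds `MiniBatchKMeans` (random init, a small number of clusters) one chunk at a time through `partial_fit`. It assumes a recent scikit-learn and NumPy; `iter_chunks` is a hypothetical stand-in for your own disk or database reader and simply generates synthetic blobs.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def iter_chunks(n_chunks=20, chunk_size=1000, n_features=50, n_blobs=5, seed=0):
    """Hypothetical stand-in for a reader yielding chunks from disk or a database.

    Generates Gaussian blobs around a few fixed centers so the example is
    self-contained.
    """
    rng = np.random.RandomState(seed)
    centers = rng.uniform(-10, 10, size=(n_blobs, n_features))
    for _ in range(n_chunks):
        labels = rng.randint(0, n_blobs, size=chunk_size)
        yield centers[labels] + rng.normal(size=(chunk_size, n_features))


# Random init and a modest number of clusters, as discussed above.
km = MiniBatchKMeans(n_clusters=5, init="random", n_init=3, random_state=0)

# Only one chunk is held in memory at a time; each call updates the centroids.
for chunk in iter_chunks():
    km.partial_fit(chunk)

print(km.cluster_centers_.shape)
```

The number of clusters is kept small on purpose: the per-chunk cost grows with `n_clusters`, which is why this approach is best suited to a few hundred clusters at most.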

Those models are by no means as battle-tested on large datasets as
Vowpal Wabbit is. Furthermore, in scikit-learn you would have to use
the `partial_fit` method to incrementally update the model with
chunks of data read from disk or from your database, for instance 1000
documents at a time, so as to control memory consumption.
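A minimal sketch of that out-of-core loop for text classification, assuming a recent scikit-learn: documents are streamed 1000 at a time, turned into features with `HashingVectorizer` (stateless, so no vocabulary has to be fitted up front; this choice is an assumption, not something prescribed above), and passed to `SGDClassifier.partial_fit`. The toy generator `iter_documents` is hypothetical and stands in for a cursor over your own storage.

```python
from itertools import islice

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_documents(n_docs=5000):
    """Hypothetical stand-in for a cursor over (text, label) pairs on disk or in a DB."""
    for i in range(n_docs):
        if i % 2 == 0:
            yield "cheap pills buy now limited offer", 1
        else:
            yield "meeting notes for the project review", 0


def iter_minibatches(docs, chunk_size=1000):
    """Yield lists of at most chunk_size (text, label) pairs."""
    docs = iter(docs)
    while True:
        chunk = list(islice(docs, chunk_size))
        if not chunk:
            return
        yield chunk


# Stateless vectorizer: each chunk can be transformed independently.
vectorizer = HashingVectorizer(n_features=2 ** 18)
clf = SGDClassifier(random_state=0)
all_classes = [0, 1]  # partial_fit needs every possible label on the first call

for chunk in iter_minibatches(iter_documents(), chunk_size=1000):
    texts, labels = zip(*chunk)
    X = vectorizer.transform(texts)  # sparse matrix for this chunk only
    clf.partial_fit(X, list(labels), classes=all_classes)

print(clf.predict(vectorizer.transform(["cheap pills buy now limited offer"])))
```

Memory use is bounded by the chunk size and `n_features`, not by the total number of documents, which is the point of reading the data in chunks.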


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
