Good day,

Can anyone give me an idea of how large a dataset scikit-learn
algorithms can typically handle?

I have about 4 TB of structured data, which I might be able to normalize
down to roughly 1 TB if necessary. The tasks would typically be logistic
regression, Naive Bayes, k-Means and possibly others.

Will scikit-learn algorithms be able to handle this on a fairly powerful
hardware setup?

At what point does it become necessary to look at distributed ML platforms
such as Mahout instead?

Best regards,
Helge
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general