2012/1/26 Oliver Mitevski <[email protected]>:
> Hi Olivier,
>
> A very good substitute for Hadoop to consider would be discoproject
> http://discoproject.org/ Its core is implemented in Erlang, but the jobs
> are written in Python.
> It's much easier to configure than Hadoop, and it would be relatively easy to
> parallelize the sklearn algorithms. With this sklearn could easily beat
> mahout on large scale machine learning.

The Disco distributed file system is very interesting:

  http://discoproject.org/doc/howto/ddfs.html

However, the MapReduce part is not that interesting for scaling
iterative machine learning, as I already stated earlier. It is better to
benefit from the flexibility of interprocess communication with
arbitrary topologies and synchronization / barriers, as provided by
IPython.parallel + 0MQ.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
