2012/1/26 Oliver Mitevski <[email protected]>:
> Hi Olivier,
>
> A very good substitute for Hadoop to consider would be discoproject
> (http://discoproject.org/). Its core is implemented in Erlang, but the jobs
> are written in Python.
> It's much easier to configure than Hadoop, and it would be relatively easy
> to parallelize the sklearn algorithms with it. With this, sklearn could
> easily beat Mahout on large-scale machine learning.

The disco distributed file system is very interesting:
http://discoproject.org/doc/howto/ddfs.html

However, as I already stated earlier, the MapReduce part is not that
interesting for scaling iterative machine learning. It is better to
take advantage of the flexibility of interprocess communication with
arbitrary topologies and synchronization / barriers, as provided by
IPython.parallel + 0MQ.
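
To make the point about barriers concrete, here is a minimal sketch of a
synchronous iterative fit in which workers stay alive across iterations
and only meet at barriers. It deliberately does not use IPython.parallel
or 0MQ: threads and threading.Barrier from the standard library stand in
for engines and messaging, so treat it as an illustration of the
synchronization pattern only. The function name fit_with_barriers, the
toy model y ~ w * x and the sharding are all assumptions made for this
example.

```python
import threading


def fit_with_barriers(shards, n_iter=100, lr=0.5):
    """Synchronous gradient descent for y ~ w * x over data shards.

    Hypothetical helper for illustration: each shard is a list of
    (x, y) pairs owned by one long-lived worker thread.
    """
    n_workers = len(shards)
    barrier = threading.Barrier(n_workers)
    grads = [0.0] * n_workers  # one slot per worker, so no lock is needed
    state = {"w": 0.0}         # shared model parameter

    def worker(rank, shard):
        for _ in range(n_iter):
            w = state["w"]
            # Local gradient of the mean squared error on this shard.
            grads[rank] = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
            # Barrier 1: wait until every worker has published its gradient.
            if barrier.wait() == 0:
                # Exactly one thread receives index 0 and applies the update.
                state["w"] = w - lr * sum(grads) / n_workers
            # Barrier 2: nobody starts the next iteration before the update.
            barrier.wait()

    threads = [
        threading.Thread(target=worker, args=(rank, shard))
        for rank, shard in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"]


if __name__ == "__main__":
    # Toy data generated from y = 3 * x, split into four equal shards.
    data = [(i / 8.0, 3.0 * i / 8.0) for i in range(1, 9)]
    shards = [data[i:i + 2] for i in range(0, 8, 2)]
    print(fit_with_barriers(shards))
```

The workers keep their shard and loop state in memory between
iterations, whereas a plain MapReduce formulation would typically launch
a new job for every iteration.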

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
