On Thu, Jan 26, 2012 at 17:21, Olivier Grisel <[email protected]> wrote:
> 2012/1/26 Oliver Mitevski <[email protected]>:
>> Hi Olivier,
>>
>> A very good substitute for Hadoop to consider would be discoproject
>> http://discoproject.org/ Its core is implemented in Erlang, but the jobs
>> are written in Python.
>> It's much easier to configure than Hadoop, and would be relatively easy to
>> parallelize the sklearn algorithms. With this sklearn could easily beat
>> mahout on large scale machine learning.
>
> The disco distributed file system is very interesting:
> http://discoproject.org/doc/howto/ddfs.html
>
> However the MapReduce part is not that interesting for scaling
> iterative machine learning as I already stated earlier. Better to benefit
> from the flexibility of interprocess communication with arbitrary
> topologies and synchronization / barriers as provided by
> IPython.parallel + 0MQ.

There is also GraphLab, which provides an abstraction geared for iterative
machine learning:

  http://graphlab.org/

Unfortunately, its Python support is, rather ludicrously, going through their
Java bindings (it's natively a C++ project) to use Jython. This should be
fixable by writing a C++->Python embedding/wrapping.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
