2012/1/26 Robert Kern <[email protected]>:
> On Thu, Jan 26, 2012 at 17:21, Olivier Grisel <[email protected]>
> wrote:
>> 2012/1/26 Oliver Mitevski <[email protected]>:
>>> Hi Olivier,
>>>
>>> A very good substitute for Hadoop to consider would be discoproject
>>> http://discoproject.org/ Its core is implemented in erlang, but the jobs
>>> are written in python.
>>> It's much easier to configure than Hadoop, and would be relatively easy to
>>> parallelize the sklearn algorithms. With this sklearn could easily beat
>>> mahout on large scale machine learning.
>>
>> The disco distributed file system is very interesting:
>> http://discoproject.org/doc/howto/ddfs.html
>>
>> However the MapReduce part is not that interesting for scaling
>> iterative machine learning as I already stated earlier. Better to benefit
>> from the flexibility of interprocess communication with arbitrary
>> topologies and synchronization / barriers as provided by
>> IPython.parallel + 0MQ.
>
> There is also GraphLab, which provides an abstraction geared for
> iterative machine learning:
>
> http://graphlab.org/
>
> Unfortunately, its Python support is, rather ludicrously, going
> through their Java bindings (it's natively a C++ project) to use
> Jython. This should be fixable by writing a C++->Python
> embedding/wrapping.

Yes, I have the GraphLab papers on my Kindle, waiting near the top of the
list. I plan to read them before the sprint to be sure not to overlook
state-of-the-art architectural design for distributed machine learning.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
