2012/1/26 Robert Kern <[email protected]>:
> On Thu, Jan 26, 2012 at 17:21, Olivier Grisel <[email protected]> wrote:
>> 2012/1/26 Oliver Mitevski <[email protected]>:
>>> Hi Olivier,
>>>
>>> A very good substitute for Hadoop to consider would be Disco:
>>> http://discoproject.org/  Its core is implemented in Erlang, but the jobs
>>> are written in Python.
>>> It's much easier to configure than Hadoop, and it would be relatively easy to
>>> parallelize the sklearn algorithms with it. With this, sklearn could easily beat
>>> Mahout on large-scale machine learning.
>>
>> The disco distributed file system is very interesting:
>> http://discoproject.org/doc/howto/ddfs.html
>>
>> However, the MapReduce part is not that interesting for scaling
>> iterative machine learning, as I already stated earlier. It is better to
>> benefit from the flexibility of interprocess communication with arbitrary
>> topologies and synchronization / barriers, as provided by
>> IPython.parallel + 0MQ.
>
> There is also GraphLab, which provides an abstraction geared for
> iterative machine learning:
>
>  http://graphlab.org/
>
> Unfortunately, its Python support is, rather ludicrously, going
> through their Java bindings (it's natively a C++ project) to use
> Jython. This should be fixable by writing a C++->Python
> embedding/wrapping.

Yes, I have the GraphLab papers on my Kindle, near the top of my
reading list. I plan to read them before the sprint to make sure I
don't overlook the state-of-the-art architectural designs for
distributed machine learning.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
