On Thu, Jan 26, 2012 at 17:21, Olivier Grisel <[email protected]> wrote:
> 2012/1/26 Oliver Mitevski <[email protected]>:
>> Hi Olivier,
>>
>> A very good substitute for Hadoop to consider would be discoproject:
>> http://discoproject.org/  Its core is implemented in Erlang, but the jobs
>> are written in Python.
>> It's much easier to configure than Hadoop, and it would be relatively easy
>> to parallelize the sklearn algorithms with it. With this, sklearn could
>> easily beat Mahout on large-scale machine learning.
>
> The disco distributed file system is very interesting:
> http://discoproject.org/doc/howto/ddfs.html
>
> However, the MapReduce part is not that interesting for scaling
> iterative machine learning, as I already stated earlier. It is better to
> benefit from the flexibility of interprocess communication with arbitrary
> topologies and synchronization / barriers, as provided by
> IPython.parallel + 0MQ.
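
To make the barrier point concrete, here is a minimal sketch of a
synchronous iterative fit. It is only an illustration: it assumes
plain stdlib threads standing in for IPython.parallel engines, and the
names (fit_mean, shards, reduce_step) are hypothetical, not part of
any library mentioned in this thread. Each worker keeps its shard in
memory across iterations and only meets the others at a barrier,
which is the part a chain of independent MapReduce jobs does not give
you cheaply.

```python
import threading


def fit_mean(shards, n_iter=50, lr=0.5):
    """Estimate the global mean of all shards by synchronous gradient descent."""
    n_workers = len(shards)
    n_total = sum(len(s) for s in shards)
    state = {"w": 0.0}           # shared model parameter
    grads = [0.0] * n_workers    # one slot per worker, no locking needed

    def reduce_step():
        # Called by exactly one thread once every worker has reached the
        # barrier, and before any of them is released again.
        state["w"] -= lr * sum(grads) / n_total

    barrier = threading.Barrier(n_workers, action=reduce_step)

    def worker(rank):
        for _ in range(n_iter):
            w = state["w"]
            # Local gradient of 0.5 * sum((w - x) ** 2) over this shard.
            grads[rank] = sum(w - x for x in shards[rank])
            barrier.wait()       # synchronize, then reuse the resident shard

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"]


if __name__ == "__main__":
    print(fit_mean([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

With real engines the shared dict would be replaced by message
passing, but the shape of the loop (compute locally, synchronize,
update, repeat) stays the same.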

There is also GraphLab, which provides an abstraction geared for
iterative machine learning:

  http://graphlab.org/

Unfortunately, its Python support is, rather ludicrously, going
through their Java bindings (it's natively a C++ project) to use
Jython. This should be fixable by writing a C++->Python
embedding/wrapping.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general