On Fri, Jan 27, 2012 at 03:44:31PM +0100, Andreas wrote:
> as it could be. So I was wondering whether there would be a
> non-intrusive way to make sklearn parallelize over the cluster.

This is a very legitimate question. Basically, it boils down to: how can
we extend the parallelism model in scikit-learn?

The way I see it, we would need to define a basic API for the parallel
computing that we need. We could start from what we already have, namely
parallel maps.

I believe that this mechanism should not live in scikit-learn, because it
is general-purpose, and not specific to our needs. We could put it in
joblib. Right now joblib doesn't really do much for parallelism: it is a
layer on top of multiprocessing that gives syntactic sugar for a specific
pattern of parallelism.

We could offer to use IPython as a backend in joblib, rather than
multiprocessing. I have actually been thinking of doing this for quite a
while. Of course, we would want as little code as possible to live in
joblib, only what's needed to give a homogeneous API. Any improvement
should go into IPython (and I think that the PyCon sprint will help in
this regard).

That way, scikit-learn gets IPython parallelism for free, and can use
multiprocessing as a fallback.
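
To make the idea a bit more concrete, here is a rough sketch of what a
pluggable parallel map with a fallback could look like. It is only an
illustration and uses nothing but the standard library. Every name in it
(SerialBackend, PoolBackend, ClusterBackend, select_backend,
parallel_map) is hypothetical: none of this is joblib's or IPython's
actual API, and ClusterBackend is just a placeholder marking where an
IPython-based backend might plug in.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool  # thread-based pool, same interface


class SerialBackend:
    """Always available: runs every call in the calling process."""

    def available(self):
        return True

    def map(self, func, items):
        return [func(item) for item in items]


class PoolBackend:
    """Wraps any pool factory that behaves like multiprocessing.Pool."""

    def __init__(self, n_jobs=2, pool_factory=multiprocessing.Pool):
        self.n_jobs = n_jobs
        self.pool_factory = pool_factory

    def available(self):
        # With a single job there is nothing to parallelize.
        return self.n_jobs > 1

    def map(self, func, items):
        pool = self.pool_factory(self.n_jobs)
        try:
            return pool.map(func, items)
        finally:
            pool.close()
            pool.join()


class ClusterBackend:
    """Placeholder for a cluster backend (hypothetical, not implemented).

    No real cluster API is called here; it always reports itself as
    unavailable so that the selection falls through to the next backend.
    """

    def available(self):
        return False

    def map(self, func, items):
        raise NotImplementedError("cluster backend not implemented")


def select_backend(backends):
    """Return the first backend, in order of preference, that is available."""
    for backend in backends:
        if backend.available():
            return backend
    raise RuntimeError("no parallel backend available")


def parallel_map(func, items, backends):
    """Apply func to each item using the preferred available backend."""
    return select_backend(backends).map(func, list(items))
```

Calling code would then pass an ordered list such as
[ClusterBackend(), PoolBackend(n_jobs=2), SerialBackend()] and never
needs to know which backend actually ran the work. With the default
process-based pool, func has to be picklable, and on some platforms the
call needs to sit under an `if __name__ == "__main__":` guard.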

That's my vision. I lack the manpower to develop it. If people are
interested, we can discuss the technical details of how to implement it
a bit more.

Any takers? There's probably a fair amount of work.

Gael

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
