On Fri, Jan 27, 2012 at 03:44:31PM +0100, Andreas wrote:
> as it could be. So I was wondering whether there would be a
> non-intrusive way to make sklearn parallelize over the cluster.

This is a very legitimate question. Basically, it boils down to: how can
we extend the parallelism model in scikit-learn?

The way I see it, we would need to define a basic API for the parallel
computing that we need. We could start from what we have, that is,
parallel maps. I believe that this mechanism should not live in
scikit-learn, because it is general-purpose and not specific to our
needs. We could put it in joblib: right now joblib doesn't really do much
for parallelism; it is a layer on top of multiprocessing that gives
syntactic sugar for a specific pattern of parallelism.

We could offer to use IPython as a backend in joblib, rather than
multiprocessing. I have actually been thinking of doing this for quite a
while. Of course, we would want as little code as possible to live in
joblib: only what's needed to give a homogeneous API. Any improvement
should go into IPython (and I think that the PyCon sprint will help in
this regard). That way, scikit-learn gets IPython parallelism for free,
and can use multiprocessing as a fallback.

That's my vision. I lack the manpower to develop it. If people are
interested, we can discuss the technical details of how to implement it
in more depth. Any takers? There's probably a fair amount of work.

Gael
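
P.S. To make the "parallel map with pluggable backends" idea a bit more
concrete, here is a rough sketch using only the standard library. This is
not joblib's actual API: parallel_map, register_backend and the backend
names are hypothetical, made up for illustration. An IPython-based backend
would presumably register itself the same way as the two shown here.

```python
import multiprocessing

# Hypothetical backend registry: the names and signatures below are
# illustrative assumptions, not joblib's or IPython's real API.
_BACKENDS = {}


def register_backend(name, map_func):
    """Register a callable map_func(func, items, n_jobs) returning a list."""
    _BACKENDS[name] = map_func


def _serial_map(func, items, n_jobs=1):
    # Fallback that runs everything in the current process.
    return [func(item) for item in items]


def _multiprocessing_map(func, items, n_jobs=1):
    # func and items must be picklable to cross process boundaries.
    with multiprocessing.Pool(processes=n_jobs) as pool:
        return pool.map(func, items)


register_backend("serial", _serial_map)
register_backend("multiprocessing", _multiprocessing_map)


def parallel_map(func, items, n_jobs=1, backend="multiprocessing"):
    """Apply func to each item with the chosen backend, keeping input order."""
    try:
        map_func = _BACKENDS[backend]
    except KeyError:
        raise ValueError("unknown backend: %r" % (backend,))
    return map_func(func, list(items), n_jobs)


def square(x):
    # Defined at module level so that worker processes can import it.
    return x * x


if __name__ == "__main__":
    print(parallel_map(square, range(5), n_jobs=2))
```

The point of the registry is that calling code such as scikit-learn would
only ever see parallel_map; which backend does the work becomes a matter
of configuration, with multiprocessing as the default.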
