On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
> I started working with IPython.parallel for training the trees using joblib.
> It works in principle, but it is SLOW.
> The time between starting and the jobs arriving at the engines is really
> long.
> I'm sending around 20.000x2000 float64 matrices, but this is gigabit
> ethernet and I wouldn't
> expect it to take like 10-20 seconds (haven't measured exactly).

IPython uses pickling, which is really slow. We could use a modified
version of joblib's numpy pickler to pickle in memory and shortcut any
other pickler. Basically, it means that on
https://github.com/amueller/joblib/blob/ipython_refactoring/joblib/parallel.py#L514
we need to do something more clever than
"lambda x: x[0](*x[1], **x[2]), iterable". For instance, I would envisage
using a function wrapper that works on joblib-pickled arguments, and
giving it our pickles.

Maybe it would be better to fix this in IPython by porting part of the
cleverness of joblib's pickler to IPython. That's something to be
discussed with the IPython team.

Actually, my gut feeling is also that I would want to use a
'SafeFunction' here, rather than a lambda.

There are many other remarks that come to mind. For instance, the fact
that you are casting the iterable to a list (line 513) will blow the
memory. You will need a dispatch mechanism to avoid that, but that's a
lot more work. In other words, there is still quite some work to make
this scale.

A quick fix (for your deadline :$) would be to use a transformer that
transforms URIs (for instance filenames) into datasets by loading them
from a data store. That way you are doing the GridSearchCV on a fairly
small volume of data, simply the URIs, and the heavy loading of the data
would be delayed to the workers.

Gaël

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
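
For illustration, the quick fix suggested above (pass URIs around and
load the data on the workers) could look roughly like the sketch below.
This is only a sketch under assumptions: the class name URIToDataset and
its loader parameter are hypothetical and not part of joblib, IPython or
scikit-learn, and one file per sample is assumed. It only follows the
fit/transform convention. To plug it into a Pipeline or GridSearchCV it
would also need get_params/set_params, for instance by subclassing
sklearn.base.BaseEstimator.

```python
class URIToDataset:
    """Hypothetical transformer mapping URIs (e.g. filenames) to loaded data.

    Only the URIs travel between the controller and the workers; the heavy
    loading happens inside transform(), i.e. wherever the job actually runs.
    """

    def __init__(self, loader=None):
        # `loader` is any callable taking one URI and returning the loaded
        # object. If None, numpy.load is used (assumes .npy files on a
        # storage location that every worker can read).
        self.loader = loader

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the URIs.
        return self

    def transform(self, X):
        loader = self.loader
        if loader is None:
            # Imported lazily so that numpy is only required when the
            # default loader is actually used.
            import numpy as np
            loader = np.load
        # One loaded object per URI, in the same order as the input.
        return [loader(uri) for uri in X]
```

With such an object as the first step of the estimator, X would be the
list of filenames rather than the arrays themselves, so what gets pickled
and sent to the engines is a handful of short strings.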
