On 02/01/2012 04:03 PM, Gael Varoquaux wrote:
> On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
>
>> I started working with IPython.parallel for training the trees using joblib.
>> It works in principle, but it is SLOW.
>> The time between starting and the jobs arriving at the engines is really
>> long.
>> I'm sending around 20000x2000 float64 matrices, but this is gigabit
>> ethernet and I wouldn't
>> expect it to take like 10-20 seconds (haven't measured exactly).
>>
> IPython uses pickling, which is really slow.
>
Really? I thought it would handle NumPy arrays explicitly.
That is how I understood
http://ipython.org/ipython-doc/stable/parallel/parallel_details.html#caveats
(section "What is sendable?")
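As a rough sanity check on the numbers above, here is a back-of-the-envelope estimate of the raw payload and the best-case wire time for one 20000x2000 float64 matrix on a 1 Gbit/s link. This is only a sketch: the engine count (`n_engines = 4`) is a made-up figure, since the thread does not say how many engines were used, and the estimate ignores serialization cost, latency, and protocol overhead.

```python
FLOAT64_BYTES = 8  # one float64 value occupies 8 bytes


def payload_bytes(rows, cols, itemsize=FLOAT64_BYTES):
    """Raw size in bytes of a dense rows x cols array."""
    return rows * cols * itemsize


def min_transfer_seconds(n_bytes, link_bits_per_second=1e9):
    """Lower bound on transfer time: no latency, no protocol overhead."""
    return n_bytes * 8 / link_bits_per_second


size = payload_bytes(20000, 2000)       # bytes for one matrix
per_copy = min_transfer_seconds(size)   # seconds to ship one copy

# ASSUMPTION: 4 engines, each receiving its own copy one after another.
n_engines = 4
total = n_engines * per_copy

print(f"{size / 1e6:.0f} MB per matrix")
print(f"{per_copy:.2f} s per copy, {total:.2f} s for {n_engines} copies")
```

So one matrix is 320 MB and needs at least about 2.5 seconds per copy even before any pickling. If each engine gets its own copy, a delay in the 10-20 second range may be mostly wire time rather than serialization overhead.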
> Actually, my gut feeling is also that I would want to use a
> 'SafeFunction' here, rather than a lambda.
>
> There are many other remarks that come to mind; for instance, casting
> the iterable to a list (line 513) will blow the memory. You will need
> a dispatch mechanism to avoid blowing the memory, but that's a lot
> more work.
>
Definitely. That is one of the many hacks I did.

> In other words, there is still quite some work to make this scale.
>
I totally agree. That's why it is not a pull request ;)

> A quick fix (for your deadline :$) would be to use a transformer that
> transforms URIs (for instance filenames) to datasets by loading them from
> a data store. That way you are doing the GridSearchCV on a fairly small
> volume of data, simply the URIs, and the heavy loading of the data would
> be delayed to the workers.
>
That is a good idea in general, but doesn't apply to the trees.
Thanks anyway :)

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
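For reference, the URI-to-dataset transformer idea quoted above could look roughly like the sketch below. The names `URILoader` and `load_csv_rows` are hypothetical and do not come from scikit-learn or joblib. The class only mimics the fit/transform convention by duck typing, and CSV files stand in for whatever data store would really be used.

```python
import csv
import os
import tempfile


def load_csv_rows(path):
    """Hypothetical loader: read a CSV file into a list of float rows."""
    with open(path, newline="") as f:
        return [[float(value) for value in row] for row in csv.reader(f)]


class URILoader:
    """Hypothetical transformer mapping URIs (here, file paths) to data.

    Only the small URIs travel to the workers; the heavy loading happens
    when transform() is called on the worker side.
    """

    def __init__(self, loader=load_csv_rows):
        self.loader = loader  # callable: URI -> dataset

    def fit(self, uris, y=None):
        # Nothing to learn; loading is deferred to transform().
        return self

    def transform(self, uris):
        return [self.loader(uri) for uri in uris]


# Usage: write a tiny dataset to disk, then pass only its path around.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("1,2\n3,4\n")

loaded = URILoader().fit([path]).transform([path])
os.remove(path)
print(loaded)
```

To use something like this inside a real Pipeline with GridSearchCV, it would probably need to inherit from `sklearn.base.BaseEstimator` and `TransformerMixin` so that parameter cloning works. That is left out here to keep the sketch dependency-free.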
