On 02/01/2012 04:03 PM, Gael Varoquaux wrote:
> On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
>    
>> I started working with IPython.parallel for training the trees using joblib.
>> It works in principle, but it is SLOW.
>> The time between starting and the jobs arriving at the engines is really
>> long.
>> I'm sending around 20,000 x 2,000 float64 matrices, but this is gigabit
>> Ethernet and I wouldn't expect it to take like 10-20 seconds (haven't
>> measured exactly).
>>      
> IPython uses pickling, which is really slow.
>
>    
Really? I thought it would handle NumPy arrays explicitly.
That is how I understood 
http://ipython.org/ipython-doc/stable/parallel/parallel_details.html#caveats
(section "What is sendable?")
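
For a rough sense of scale, here is a back-of-the-envelope sketch of the numbers above. It assumes a dense 20,000 x 2,000 float64 matrix and an ideal gigabit link with no protocol overhead. The helper names are made up for illustration, and the pickle timing uses a plain bytes buffer as a stand-in rather than a real NumPy array, so it only shows how one might measure, not what IPython actually does.

```python
import pickle
import time


def payload_bytes(rows, cols, itemsize=8):
    """Raw size of a dense matrix; float64 uses 8 bytes per element."""
    return rows * cols * itemsize


def min_transfer_seconds(nbytes, link_bits_per_second=1e9):
    """Lower bound on wire time, ignoring protocol overhead and latency."""
    return nbytes * 8 / link_bits_per_second


def time_pickle(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return (seconds, serialized size) for one pickle.dumps call."""
    start = time.perf_counter()
    blob = pickle.dumps(obj, protocol=protocol)
    return time.perf_counter() - start, len(blob)


size = payload_bytes(20000, 2000)
print(size / 1e6, "MB per matrix")
print(min_transfer_seconds(size), "s on an ideal gigabit link")

# Stand-in buffer (1 MB of zeros), not a real NumPy array.
seconds, nbytes = time_pickle(bytes(10**6))
print(nbytes, "bytes pickled in", seconds, "s")
```

So each matrix is about 320 MB and needs at least roughly 2.5 seconds on the wire per copy, before any serialization cost. Sending it to several engines separately would multiply that.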

> Actually, my gut feeling is also that I would want to use a
> 'SafeFunction' here, rather than a lambda.
>
>    
> There are many other remarks that come to mind, for instance the fact
> that you are casting the iterable to a list (line 513) will blow the
> memory. You will need a dispatch mechanism to avoid blowing the memory,
> but that's a lot more work.
>
>    
Definitely. That is one of the many hacks I did.
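
To illustrate the memory point, a minimal sketch of lazy dispatch, assuming tasks can be handed out in fixed-size batches. The names `iter_batches` and `tasks` are hypothetical and not part of joblib or IPython; a real dispatcher would also have to track which engines are free.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of at most batch_size items without consuming the whole iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def tasks(n, log):
    """Toy task generator that records when each item is actually produced."""
    for i in range(n):
        log.append(i)
        yield i


produced = []
batches = iter_batches(tasks(7, produced), batch_size=3)
first = next(batches)
# Only the items of the first batch have been materialized at this point.
print(first, produced)
```

Compared with calling list() on the iterable up front, only one batch of tasks lives in memory at a time.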
> In other words, there is still quite some work to make this scale.
>
>    
I totally agree. That's why it is not a pull request ;)
> A quick fix (for your deadline :$) would be to use a transformer that
> transforms URI (for instance filenames) to datasets by loading them from
> a data store. That way you are doing the GridSearchCV on a fairly small
> volume of data, simply the URIs, and the heavy loading of the data would
> be delayed to the workers.
>
>    
That is a good idea in general, but doesn't apply to the trees.
Thanks anyway :)
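
For reference, the URI idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `URILoader` and `load_csv` are hypothetical names, the data is assumed to sit in CSV files with one sample per row, and the class only follows the fit/transform convention by duck typing. To use it in a real scikit-learn pipeline it would probably need to inherit from `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
import csv
import os
import tempfile


class URILoader:
    """Hypothetical transformer: maps file paths to the data stored there."""

    def __init__(self, loader):
        self.loader = loader  # callable: path -> dataset

    def fit(self, X, y=None):
        # Nothing to learn; present only to follow the fit/transform convention.
        return self

    def transform(self, X):
        # The expensive loading happens here, on whichever process calls transform.
        return [self.loader(uri) for uri in X]


def load_csv(path):
    """Assumed storage format for this sketch: one sample per CSV row."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f)]


# Usage: only the short path string would need to travel to the workers.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "fold0.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([[1.0, 2.0], [3.0, 4.0]])

datasets = URILoader(load_csv).fit([path]).transform([path])
print(datasets[0])
```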

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general