Andreas: you should run some timing tests for data transfer using only
the plain numpy + IPython.parallel API (without scikit-learn or joblib)
to check that you can broadcast your data efficiently, without a
memory copy.
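
As a rough sketch of such a timing test (not code from your setup), the
helper below times a blocking push of a numpy array to all engines.
The names time_push and main, the array size and the repeat count are
made up for illustration. main() assumes a cluster is already running
(e.g. started with `ipcluster start -n 4`), and the fallback import of
ipyparallel is an assumption for newer installs where IPython.parallel
lives in a separate package.

```python
import time

import numpy as np


def time_push(view, arr, name="X", repeats=3):
    """Return the best wall-clock time (seconds) to push `arr` to the engines.

    `view` is expected to behave like an IPython.parallel DirectView,
    i.e. to provide push(namespace_dict, block=True).
    """
    timings = []
    for _ in range(repeats):
        tic = time.perf_counter()
        view.push({name: arr}, block=True)  # wait until all engines have it
        timings.append(time.perf_counter() - tic)
    return min(timings)


def main():
    # Assumption: a cluster is already running and reachable with the
    # default profile. Not called automatically.
    try:
        from IPython.parallel import Client  # import path used in this thread
    except ImportError:
        from ipyparallel import Client  # assumption: newer standalone package

    client = Client()
    view = client[:]  # DirectView on all engines

    X = np.random.rand(1000, 1000)  # hypothetical test array, about 8 MB
    seconds = time_push(view, X)
    print("best push time: %.4fs for %.1f MB" % (seconds, X.nbytes / 1e6))
```

Call main() by hand once the engines are up. Comparing the result
against the time of a plain X.copy() on one machine gives a first idea
of how far the transfer is from the cost of a single memory copy.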

Once you have an optimal time, build the application back up in
reverse: introduce joblib.Parallel and check that you can reproduce
the same timings on dummy examples (e.g. with a function that does
nothing). If that works, reintroduce scikit-learn's GridSearchCV API.
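
For the joblib step, a dummy example could look roughly like the
sketch below. It only measures how long joblib.Parallel takes to hand
an array to a function that does nothing. The names do_nothing and
time_noop_dispatch and the default task and job counts are
hypothetical, and how joblib is wired to IPython.parallel in your
setup is not shown here.

```python
import time

import numpy as np
from joblib import Parallel, delayed


def do_nothing(X):
    """Dummy task: receives the data and returns immediately."""
    return None


def time_noop_dispatch(X, n_tasks=4, n_jobs=2):
    """Time n_tasks no-op calls that each receive X through joblib.Parallel.

    Since the task does no work, the elapsed time is dominated by
    dispatch and data transfer overhead.
    """
    tic = time.perf_counter()
    results = Parallel(n_jobs=n_jobs)(
        delayed(do_nothing)(X) for _ in range(n_tasks)
    )
    elapsed = time.perf_counter() - tic
    return elapsed, results
```

If time_noop_dispatch(X) is much slower than the raw push timing for
the same array, the overhead was introduced at the joblib level rather
than by scikit-learn.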

I suspect that at some level or another an abstraction is wrapping the
arrays into more complex objects, which prevents IPython.parallel from
sending the data efficiently. By starting from the ground up you should
be able to pinpoint the culprit abstraction.

Also, if you use the scipy.sparse data structures you will need some
boilerplate code to unwrap the underlying arrays.
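
A minimal sketch of that boilerplate, assuming the CSR format (other
sparse formats store different arrays): the matrix is split into its
three plain numpy arrays plus its shape, which can be sent as ordinary
arrays and reassembled on the other side. The helper names unwrap_csr
and rewrap_csr are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp


def unwrap_csr(X):
    """Split a sparse matrix into the plain numpy arrays of its CSR form."""
    X = X.tocsr()  # converts other sparse formats to CSR
    return {
        "data": X.data,        # non-zero values
        "indices": X.indices,  # column index of each non-zero value
        "indptr": X.indptr,    # row boundaries into data / indices
        "shape": X.shape,
    }


def rewrap_csr(parts):
    """Rebuild the CSR matrix from the arrays produced by unwrap_csr."""
    return sp.csr_matrix(
        (parts["data"], parts["indices"], parts["indptr"]),
        shape=parts["shape"],
    )
```

The three arrays can then be pushed to the engines like any other
numpy array, and rewrap_csr can be called on the receiving side.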

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
