2012/1/27 Andreas:

I would advise you to start by experimenting with your own version of GridSearchCV, derived from the one in sklearn. Pass a LoadBalancedView instance as an argument to the constructor and use it in the fit method instead of calling joblib.
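Here is a stdlib-only sketch of that idea. It makes several assumptions. `MapGridSearch`, `ThresholdClassifier` and `_fit_and_score` are hypothetical names and are not part of the sklearn API. Rather than subclassing sklearn's GridSearchCV, the sketch reimplements a tiny grid search so that the only parallel step, dispatching every (candidate, fold) task, is easy to see. The backend is injected through the constructor as any `map`-like callable. That can be the builtin `map`, `ThreadPoolExecutor.map`, or the `map_sync` method of an IPython.parallel LoadBalancedView. The LoadBalancedView case is assumed here and was not tested.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ThresholdClassifier:
    """Hypothetical toy estimator: predicts 1 when x > threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn; the threshold is the only parameter

    def predict(self, X):
        return [1 if x > self.threshold else 0 for x in X]

    def score(self, X, y):
        # Mean accuracy on (X, y)
        pred = self.predict(X)
        return sum(p == t for p, t in zip(pred, y)) / len(y)


def _fit_and_score(task):
    """Fit one candidate on one fold; return (candidate index, test score)."""
    idx, estimator_cls, params, X_tr, y_tr, X_te, y_te = task
    est = estimator_cls(**params).fit(X_tr, y_tr)
    return idx, est.score(X_te, y_te)


class MapGridSearch:
    """Hypothetical grid search whose parallel backend is injected.

    `map_func` is any callable behaving like the builtin `map`
    (assumption: e.g. LoadBalancedView.map_sync on a cluster).
    """

    def __init__(self, estimator_cls, param_grid, map_func=map, n_folds=3):
        self.estimator_cls = estimator_cls
        self.param_grid = param_grid
        self.map_func = map_func
        self.n_folds = n_folds

    def fit(self, X, y):
        names = sorted(self.param_grid)
        candidates = [dict(zip(names, values)) for values in
                      itertools.product(*(self.param_grid[n] for n in names))]
        # Interleaved k-fold split: fold k holds samples k, k + n_folds, ...
        folds = [sorted(range(k, len(X), self.n_folds))
                 for k in range(self.n_folds)]
        tasks = []
        for idx, params in enumerate(candidates):
            for test in folds:
                train = [i for i in range(len(X)) if i not in test]
                tasks.append((idx, self.estimator_cls, params,
                              [X[i] for i in train], [y[i] for i in train],
                              [X[i] for i in test], [y[i] for i in test]))
        # The only parallel step: this call replaces the joblib dispatch
        scores = [[] for _ in candidates]
        for idx, score in self.map_func(_fit_and_score, tasks):
            scores[idx].append(score)
        self.mean_scores_ = [sum(s) / len(s) for s in scores]
        best = max(range(len(candidates)), key=self.mean_scores_.__getitem__)
        self.best_params_ = candidates[best]
        self.best_score_ = self.mean_scores_[best]
        return self


X = list(range(10))
y = [1 if x >= 5 else 0 for x in X]
grid = {"threshold": [2.5, 4.5, 6.5]}

serial = MapGridSearch(ThresholdClassifier, grid).fit(X, y)
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = MapGridSearch(ThresholdClassifier, grid,
                             map_func=pool.map).fit(X, y)
print(serial.best_params_, serial.best_score_)  # {'threshold': 4.5} 1.0

# On a cluster (assumption, not run here; IPython.parallel now lives in
# the ipyparallel package):
# from IPython.parallel import Client
# lview = Client().load_balanced_view()
# MapGridSearch(ThresholdClassifier, grid, map_func=lview.map_sync).fit(X, y)
```

Passing a bound `map` method instead of the view itself keeps the class independent of IPython.parallel. On real engines the tasks are pickled, so the estimator class and `_fit_and_score` must be importable there. For large datasets it is probably better to push X and y to the engines once than to ship slices with every task.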
The same approach could be followed for the ensemble meta-estimators. If you can get something working efficiently on your cluster, put it in a gist and send an email to the mailing list, and we will discuss how best to factor this out. There are several options. We could extend joblib so that it can deal with distributed infrastructure. We could refactor the existing classes at the sklearn level to make them more pluggable with IPython.parallel. We could also start a new github repo, for instance scikit-learn-cluster, to collect utilities for training and evaluating sklearn models on an HPC or cloud cluster.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
