2013/11/29 Michal Romaniuk <michal.romaniu...@imperial.ac.uk>:
> I was wondering what would work better for distributing cross-validation
> jobs: IPython parallel or Spark? I tried with IPython parallel in the
> past but remember having some issues with jobs crashing etc.
Spark is higher level than IPython.parallel and has more fault-tolerant
primitives. However, I have never tried to implement cross-validation and
grid search on top of Spark. It might also not be trivial to implement for
batch models that do not provide a partial_fit method. It's possible to
buffer copies of the streamed dataset on each worker, but then making it
memory efficient is probably harder than with IPython.parallel, where it's
possible to memory-map the input data.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
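P.S. For what it's worth, here is a rough, untested sketch of what I mean by
memory mapping with IPython.parallel. It assumes the arrays have been dumped
with joblib to a path visible from every engine (e.g. a shared filesystem);
the fit_and_score helper and the '/shared/cv_data.pkl' path are just
illustrative names, not anything from scikit-learn itself:

from IPython.parallel import Client
from sklearn.cross_validation import KFold
from sklearn.datasets import make_classification
from sklearn.externals import joblib

# Build a toy dataset and dump it once to a location every engine can see
# (assumption: /shared is a filesystem mounted on all worker machines).
X, y = make_classification(n_samples=10000, n_features=20)
joblib.dump((X, y), '/shared/cv_data.pkl')

def fit_and_score(args):
    # Runs on an engine: memory-map the shared arrays instead of shipping
    # a copy of the data over the network with each task.
    from sklearn.externals import joblib
    from sklearn.svm import SVC
    data_path, train, test, C = args
    X, y = joblib.load(data_path, mmap_mode='r')
    model = SVC(C=C).fit(X[train], y[train])
    return C, model.score(X[test], y[test])

# One task per (parameter value, CV fold) combination.
tasks = [('/shared/cv_data.pkl', train, test, C)
         for C in (0.1, 1.0, 10.0)
         for train, test in KFold(len(y), n_folds=5)]

# Connect to a running cluster (started with `ipcluster start`) and
# spread the tasks across the engines.
rc = Client()
lview = rc.load_balanced_view()
results = lview.map(fit_and_score, tasks, block=True)

Only the fold index arrays and a float score travel over the wire; each
engine reads the rows it needs from the memory-mapped file.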