2013/11/29 Michal Romaniuk <michal.romaniu...@imperial.ac.uk>:
> I was wondering what would work better for distributing cross-validation
> jobs: IPython parallel or Spark? I tried with IPython parallel in the
> past but remember having some issues with jobs crashing etc.
Spark is higher level than IPython.parallel and has more fault-tolerant
primitives. However, I have never tried to implement cross-validation and
grid search on top of Spark. It might also not be trivial to implement for
batch models that do not provide a partial_fit method. It's possible to
buffer copies of the streamed dataset on each worker, but then making it
memory efficient is probably harder than with IPython.parallel, where it's
possible to memory-map the input data.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
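P.S. For what it's worth, here is a rough, untested sketch of what I mean by
memory mapping with IPython.parallel. It assumes the arrays have been dumped
with joblib to a path visible from every engine (e.g. a shared filesystem);
the fit_and_score helper and the '/shared/cv_data.pkl' path are just
illustrative names, not anything from scikit-learn itself:

from IPython.parallel import Client
from sklearn.cross_validation import KFold
from sklearn.datasets import make_classification
from sklearn.externals import joblib

# Build a toy dataset and dump it once to a location every engine can see
# (assumption: /shared is a filesystem mounted on all worker machines).
X, y = make_classification(n_samples=10000, n_features=20)
joblib.dump((X, y), '/shared/cv_data.pkl')

def fit_and_score(args):
    # Runs on an engine: memory-map the shared arrays instead of shipping
    # a copy of the data over the network with each task.
    from sklearn.externals import joblib
    from sklearn.svm import SVC
    data_path, train, test, C = args
    X, y = joblib.load(data_path, mmap_mode='r')
    model = SVC(C=C).fit(X[train], y[train])
    return C, model.score(X[test], y[test])

# One task per (parameter value, CV fold) combination.
tasks = [('/shared/cv_data.pkl', train, test, C)
         for C in (0.1, 1.0, 10.0)
         for train, test in KFold(len(y), n_folds=5)]

# Connect to a running cluster (started with `ipcluster start`) and
# spread the tasks across the engines.
rc = Client()
lview = rc.load_balanced_view()
results = lview.map(fit_and_score, tasks, block=True)

Only the fold index arrays and a float score travel over the wire; each
engine reads the rows it needs from the memory-mapped file.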