Could you please elaborate on how drivers can be restarted automatically? Thanks,
On Mon, Apr 14, 2014 at 10:30 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Master and slave are somewhat overloaded terms in the Spark ecosystem (see
> the glossary:
> http://spark.apache.org/docs/latest/cluster-overview.html#glossary). Are
> you actually asking about the Spark "driver" and "executors", or the
> standalone cluster "master" and "workers"?
>
> To briefly answer for either possibility:
> (1) Drivers are not fault tolerant but can be restarted automatically.
> Executors may be removed at any point without failing the job (though
> losing an Executor may slow the job significantly), and Executors may be
> added at any point and will be immediately used.
> (2) Standalone cluster Masters are fault tolerant: a Master failure will
> only temporarily stall new jobs from starting or getting new resources,
> and does not affect currently running jobs. Workers can fail, which simply
> causes jobs to lose their current Executors. New Workers can be added at
> any point.
>
>
> On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira <ianferre...@hotmail.com> wrote:
>
>> Folks,
>>
>> I was wondering what the failure support modes were for Spark while
>> running jobs:
>>
>> 1. What happens when a master fails?
>> 2. What happens when a slave fails?
>> 3. Can you add and remove slaves mid-job?
>>
>> Regarding the install on Mesos, if I understand correctly the Spark master
>> is behind a ZooKeeper quorum, which isolates the slaves from a master
>> failure, but what about the masters behind the quorum?
>>
>> Cheers,
>> - Ian
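
For reference, the automatic driver restart Aaron mentions comes from the
standalone cluster manager: submit the driver in "cluster" deploy mode with
the --supervise flag, and the Master will relaunch it if it exits with a
non-zero status. A minimal sketch (the master URL, main class, and jar path
below are placeholders):

    # Sketch: launch the driver on the cluster under supervision, so the
    # standalone Master restarts it if it dies with a non-zero exit code.
    # Master URL, class name, and jar path are placeholders.
    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --supervise \
      --class com.example.MyApp \
      /path/to/my-app.jar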
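
On Ian's question about the Masters behind the quorum: standalone-mode
Master fault tolerance relies on ZooKeeper-based leader election. A sketch
of the relevant settings, assuming conf/spark-env.sh on each Master node
(the ZooKeeper hosts and directory are placeholders):

    # Sketch: enable ZooKeeper recovery so a standby Master can take over
    # leadership if the active Master fails. ZooKeeper hosts and the
    # coordination directory are placeholders.
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

Workers and applications can then be pointed at the full Master list, e.g.
spark://host1:7077,host2:7077, so they can re-register with whichever
Master becomes the leader.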
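
And as for adding Workers mid-job, one way (again a sketch; the host name
is a placeholder) is simply to start a Worker process on the new machine
pointed at the cluster:

    # Sketch: bring up a Worker on a new machine; it registers with the
    # Master and its cores/memory become schedulable immediately.
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077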