Could you please elaborate on how drivers can be restarted automatically? Thanks,
On Mon, Apr 14, 2014 at 10:30 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Master and slave are somewhat overloaded terms in the Spark ecosystem (see
> the glossary:
> http://spark.apache.org/docs/latest/cluster-overview.html#glossary). Are
> you actually asking about the Spark "driver" and "executors", or the
> standalone cluster "master" and "workers"?
>
> To briefly answer for either possibility:
> (1) Drivers are not fault tolerant but can be restarted automatically.
> Executors may be removed at any point without failing the job (though
> losing an Executor may slow the job significantly), and Executors may be
> added at any point and will be immediately used.
> (2) Standalone cluster Masters are fault tolerant: a Master failure will
> only temporarily stall new jobs from starting or getting new resources,
> and does not affect currently running jobs. Workers can fail, which simply
> causes jobs to lose their current Executors. New Workers can be added at
> any point.
>
>
> On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira <ianferre...@hotmail.com> wrote:
>
>> Folks,
>>
>> I was wondering what the failure support modes were for Spark while
>> running jobs:
>>
>> 1. What happens when a master fails?
>> 2. What happens when a slave fails?
>> 3. Can you add and remove slaves mid-job?
>>
>> Regarding the install on Mesos, if I understand correctly the Spark master
>> is behind a ZooKeeper quorum, which isolates the slaves from a master
>> failure, but what about the masters behind the quorum?
>>
>> Cheers,
>> - Ian
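
For reference, the automatic driver restart Aaron mentions comes from the
standalone cluster manager: submit the driver in "cluster" deploy mode with
the --supervise flag, and the Master will relaunch it if it exits with a
non-zero status. A minimal sketch (the master URL, main class, and jar path
below are placeholders):

    # Sketch: launch the driver on the cluster under supervision, so the
    # standalone Master restarts it if it dies with a non-zero exit code.
    # Master URL, class name, and jar path are placeholders.
    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --supervise \
      --class com.example.MyApp \
      /path/to/my-app.jar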
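
On Ian's question about the Masters behind the quorum: standalone-mode
Master fault tolerance relies on ZooKeeper-based leader election. A sketch
of the relevant settings, assuming conf/spark-env.sh on each Master node
(the ZooKeeper hosts and directory are placeholders):

    # Sketch: enable ZooKeeper recovery so a standby Master can take over
    # leadership if the active Master fails. ZooKeeper hosts and the
    # coordination directory are placeholders.
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

Workers and applications can then be pointed at the full Master list, e.g.
spark://host1:7077,host2:7077, so they can re-register with whichever
Master becomes the leader.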
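
And as for adding Workers mid-job, one way (again a sketch; the host name
is a placeholder) is simply to start a Worker process on the new machine
pointed at the cluster:

    # Sketch: bring up a Worker on a new machine; it registers with the
    # Master and its cores/memory become schedulable immediately.
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077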