Hello mailing list,

We are thinking of using Spark in one of our projects on a Hadoop cluster. During our evaluation several questions came up, which are stated below.
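To make the cases below concrete, here is a minimal sketch (in Scala) of the kind of workflow we have in mind. The HDFS paths, the application name, and the use of RDD checkpointing are placeholders for illustration, not our actual job:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._

  object WorkflowSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("workflow-sketch")
      val sc = new SparkContext(conf)

      // Checkpoint directory on HDFS (hypothetical path), relevant to case 4.
      sc.setCheckpointDir("hdfs:///user/mkricke/checkpoints")

      // The pre-configured input resource mentioned in case 2 (hypothetical path).
      val lines = sc.textFile("hdfs:///user/mkricke/input/data.txt")

      val counts = lines
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)

      // Truncate the lineage so that, in principle, a restart could resume here.
      counts.checkpoint()
      counts.count() // force materialization, which writes the checkpoint

      counts.saveAsTextFile("hdfs:///user/mkricke/output/wordcounts")
      sc.stop()
    }
  }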
Preconditions

Let's assume Apache Spark is deployed on a Hadoop cluster using YARN, and a Spark job is currently running. How does Spark handle the situations listed below?

Cases & Questions

1. One node of the Hadoop cluster fails due to a disk error. However, replication is high enough and no data is lost.
   * What happens to the tasks that were running on that node?

2. One node of the Hadoop cluster fails due to a disk error. Replication was not high enough and data was lost. Put simply, Spark can no longer find a file that was pre-configured as a resource for the workflow.
   * How does it handle this situation?

3. During execution, the primary NameNode fails over.
   * Does Spark automatically switch to the standby NameNode?
   * What happens if the standby NameNode fails as well?

4. For some reason, the cluster is shut down completely in the middle of a workflow.
   * Will Spark restart automatically with the cluster?
   * Will it resume from the last "save" point in the workflow?

Thanks in advance. :)

Best regards,
Matthias Kricke