Hello mailing list,

We are thinking of using Spark in one of our projects on a Hadoop cluster. During our evaluation several questions came up, which are stated below.
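To make the cases below concrete, here is a minimal sketch (in Scala) of the kind of workflow we have in mind. The HDFS paths, the application name, and the use of RDD checkpointing are placeholders for illustration, not our actual job:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._

  object WorkflowSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("workflow-sketch")
      val sc = new SparkContext(conf)

      // Checkpoint directory on HDFS (hypothetical path), relevant to case 4.
      sc.setCheckpointDir("hdfs:///user/mkricke/checkpoints")

      // The pre-configured input resource mentioned in case 2 (hypothetical path).
      val lines = sc.textFile("hdfs:///user/mkricke/input/data.txt")

      val counts = lines
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)

      // Truncate the lineage so that, in principle, a restart could resume here.
      counts.checkpoint()
      counts.count() // force materialization, which writes the checkpoint

      counts.saveAsTextFile("hdfs:///user/mkricke/output/wordcounts")
      sc.stop()
    }
  }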
Preconditions

Let's assume Apache Spark is deployed on a Hadoop cluster using YARN, and a Spark job is currently running. How does Spark handle the situations listed below?

Cases & Questions

1. One node of the Hadoop cluster fails due to a disk error. However, replication is high enough and no data is lost.
   * What happens to the tasks that were running on that node?

2. One node of the Hadoop cluster fails due to a disk error. Replication was not high enough and data was lost. Put simply, Spark can no longer find a file that was pre-configured as a resource for the workflow.
   * How does it handle this situation?

3. During execution, the primary NameNode fails over.
   * Does Spark automatically switch to the standby NameNode?
   * What happens if the standby NameNode fails as well?

4. For some reason, the cluster is shut down completely in the middle of a workflow.
   * Will Spark restart automatically with the cluster?
   * Will it resume from the last "save" point in the workflow?

Thanks in advance. :)

Best regards,
Matthias Kricke