Hi,

As I understand it, when Spark runs a job it builds a DAG and then splits the DAG into stages. Each stage has a task set; the TaskScheduler sends the task set to the workers and waits until the stage completes before starting the next stage.

If that is correct, I want to know what happens when a worker fails. The TaskScheduler can resubmit the task, together with its lineage information, to another worker. If the input RDD partitions are not on the failed worker node, this works because the new worker can fetch them from the nodes that do hold them. However, if the input RDD partitions were on the failed worker node, how does the fault-tolerance mechanism work? Does Spark use checkpointing to store the input RDD to a file system such as HDFS?

Thanks
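To make the checkpoint part of the question concrete, here is a minimal Scala sketch of how an RDD can be explicitly checkpointed to a reliable file system; the HDFS paths and the word-count pipeline are just hypothetical placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))

    // Checkpointing writes the RDD's partitions to a reliable store and
    // truncates its lineage, so a lost partition can be reloaded from the
    // checkpoint instead of being recomputed from the original input.
    sc.setCheckpointDir("hdfs://namenode:8020/spark-checkpoints") // hypothetical path

    val data = sc.textFile("hdfs://namenode:8020/input/logs")     // hypothetical input
    val counts = data.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)

    counts.checkpoint() // mark for checkpointing; materialized on the next action
    counts.count()      // the action triggers computation and writes the checkpoint

Without an explicit checkpoint, my understanding is that Spark would instead recompute the lost partitions from lineage, going back to the original input source if necessary.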
