hi all, I have a job that runs for about 15 minutes; at some point I get this error on both nodes (all executors):
14/10/02 23:14:38 WARN TaskSetManager: Lost task 80.0 in stage 3.0 (TID 253, backend-tes): ExecutorLostFailure (executor lost)

In the end the job seems to recover and complete. What is the best way to find out why these tasks failed (I couldn't find anything in the logs), and how can I avoid this in the future?

thanks,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-debug-ExecutorLostFailure-tp15646.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
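A common cause of ExecutorLostFailure is the executor JVM being killed from outside (e.g. by the YARN node manager for exceeding its memory limit), in which case the reason shows up in the node manager / aggregated container logs rather than in the driver output. A minimal sketch of where to look and what to try, assuming a YARN deployment (the application ID and memory values below are placeholders, not taken from the original post):

```shell
# Pull the aggregated container logs for the finished application;
# look for messages like "Container killed by YARN for exceeding memory limits".
# <application_id> is a placeholder for your app's ID (e.g. from the YARN RM UI).
yarn logs -applicationId <application_id> | grep -i -A 2 "killed"

# If executors are being killed for memory overuse, try giving each executor
# more headroom. The values here are illustrative, not recommendations:
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your-job.jar
```

If you are running standalone rather than on YARN, the equivalent information is in the worker logs under the executor's work directory on each node.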