Our Spark cluster is configured to write application history event logs to a directory on HDFS. This all works fine (I've tested it with the Spark shell).
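
For reference, the relevant settings look roughly like this (the HDFS path is a placeholder, not our actual directory):

    # spark-defaults.conf -- event logging for application history
    spark.eventLog.enabled   true
    spark.eventLog.dir       hdfs:///shared/spark-event-logs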

However, on a large, long-running job that we ran tonight, one of our machines at the cloud provider had issues and had to be terminated and replaced in the middle of the job.

The job completed correctly, and it shows as FINISHED in the "Completed Applications" section of the Spark web UI. However, when I try to look at the application's history, the UI says "Application history not found" and "Application ... is still in progress".

The cause appears to be the terminated machine: when I click on the executor list for that job, Spark still shows the executor from that machine in state RUNNING.
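
One thing I plan to check, assuming Spark 1.3 writes each application's event log as a single file that keeps an ".inprogress" suffix until the application-end event is recorded (the path and application ID below are placeholders):

    // Run from spark-shell: list the event-log files for the affected application.
    import org.apache.hadoop.fs.{FileSystem, Path}
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val names = fs.listStatus(new Path("hdfs:///shared/spark-event-logs")).map(_.getPath.getName)
    names.filter(_.contains("app-20150422")).foreach(println)   // hypothetical application ID
    // If the file still ends in ".inprogress", my guess is that the Master never
    // saw the application-end event, which would match the "still in progress"
    // message. Renaming the file to drop the suffix might be a workaround, but
    // I haven't tried it:
    // fs.rename(new Path(".../app-20150422.inprogress"), new Path(".../app-20150422"))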

Is there a solution or workaround for this? BTW, I'm running Spark v1.3.0.

Thanks,

DR
