Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2311#discussion_r17951431

    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -91,7 +94,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
               if (sc != null) {
                 logInfo("Invoking sc stop from shutdown hook")
                 sc.stop()
    -            finish(FinalApplicationStatus.SUCCEEDED)
    +          }
    +
    +          // Shuts down the AM.
    +          if (!finished) {
    --- End diff --

    Yeah, unless we want to add more logic to figure out what shouldn't be retried, the easy thing is to just retry on any failure. Obviously if the machine dies this code won't be running; it's more that something weird happens that causes it to crash or exit badly.

    There are actually some potential issues with rerunning the AM, though. One is what we refer to as split brain: one AM loses its connection to the RM but keeps running, so the RM starts a second AM, and both write to the same output dir and cause issues with the output data. I filed a JIRA for this to try to handle it in the Spark AM.

    The second occurs if the first run had already committed its output and we rerun it when we shouldn't. We don't want that to happen because it can corrupt data. Often in MR one job starts as soon as another's output is committed, so if that output were changed out from under it by a rerun of the AM, it could lose data. I'm not sure that same kind of check is as easy with Spark.

    MR handles both of those cases. Obviously if your MR job is writing to some other service, using custom file output, or has some other side effects, it's up to the user to guarantee that it can be rerun. I'm assuming it's the user's responsibility with Spark, since Spark can rerun tasks/stages on failure. Any input on that, Matei?
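To make the second case concrete, here is a minimal sketch (not code from this PR) of a guard that refuses to retry once output has been committed. It assumes the job marks a committed output directory with a `_SUCCESS` file, as Hadoop's FileOutputCommitter does by default. `RerunGuard`, `decide`, and the attempt parameters are hypothetical names, and it checks the local filesystem through `java.nio` rather than the Hadoop FileSystem API only to stay self-contained.

```scala
import java.nio.file.{Files, Path}

object RerunGuard {
  // Assumption: a committed output directory contains this marker file.
  val CommitMarker = "_SUCCESS"

  sealed trait Decision
  case object Retry extends Decision
  case object DoNotRetry extends Decision

  /** Decide whether a failed AM attempt may be retried.
    *
    * @param outputDir   directory the application writes its final output to
    * @param attempt     1-based number of the attempt that just failed
    * @param maxAttempts maximum number of attempts allowed
    */
  def decide(outputDir: Path, attempt: Int, maxAttempts: Int): Decision =
    if (attempt >= maxAttempts) DoNotRetry
    // Output already committed: a rerun could change data that a
    // downstream job has started reading.
    else if (Files.exists(outputDir.resolve(CommitMarker))) DoNotRetry
    else Retry
}
```

This does nothing for the split-brain case, where both AMs are alive at the same time; that needs separate handling.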