Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/186#discussion_r11101486
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -116,21 +119,30 @@ class DAGScheduler(
private val metadataCleaner =
new MetadataCleaner(MetadataCleanerType.DAG_SCHEDULER, this.cleanup, env.conf)
- taskScheduler.setDAGScheduler(this)
-
/**
- * Starts the event processing actor. The actor has two responsibilities:
- *
- * 1. Waits for events like job submission, task finished, task failure etc., and calls
- * [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
- * 2. Schedules a periodical task to resubmit failed stages.
- *
- * NOTE: the actor cannot be started in the constructor, because the periodical task references
- * some internal states of the enclosing [[org.apache.spark.scheduler.DAGScheduler]] object, thus
- * cannot be scheduled until the [[org.apache.spark.scheduler.DAGScheduler]] is fully constructed.
+ * Starts the event processing actor within the supervisor. The eventProcessingActor
+ * waits for events like job submission, task finished, task failure etc., and calls
+ * [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
*/
- def start() {
- eventProcessActor = env.actorSystem.actorOf(Props(new Actor {
+ env.actorSystem.actorOf(Props(new Actor {
+
+ override val supervisorStrategy =
+ OneForOneStrategy() {
+ case x: Exception => {
+ logError("eventProcesserActor failed due to the error %s; shutting down SparkContext"
+ .format(x.getMessage))
+ doCancelAllJobs()
+ sc.stop()
+ Stop
--- End diff --
Right, which may be enough as long as all we are trying to accomplish is a
clean shutdown of the whole system, not restarting all or part of it while
retaining state and partial results from running jobs. TaskManager's messages
won't go anywhere but the /deadLetters synthetic actor, but I think that's fine
as long as we avoid throwing uncaught exceptions etc. while trying to shut down.
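
For reference, here is a minimal standalone sketch of the behavior described above: a supervisor whose strategy answers any child Exception with Stop, after which messages sent to the stopped child end up at /deadLetters. It is not the DAGScheduler code. `Worker`, `Supervisor`, `DeadLetterLogger`, and `Demo` are hypothetical names, the `println` stands in for `doCancelAllJobs()` and `sc.stop()`, and it assumes the Akka classic actor API on the classpath (version 2.4 or later for `system.terminate()`; older versions use `system.shutdown()` instead).

```scala
import akka.actor.{Actor, ActorSystem, DeadLetter, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Stop

// Hypothetical child actor: fails when it receives "boom".
class Worker extends Actor {
  def receive = {
    case "boom" => throw new RuntimeException("simulated failure")
    case _      => // ignore everything else
  }
}

// Hypothetical supervisor: any Exception in the child triggers cleanup and Stop.
class Supervisor extends Actor {
  override val supervisorStrategy = OneForOneStrategy() {
    case e: Exception =>
      // Stand-in for doCancelAllJobs() and sc.stop() in the diff above.
      println("worker failed due to the error %s; shutting down".format(e.getMessage))
      Stop // the child is stopped permanently, not restarted
  }

  private val worker = context.actorOf(Props(classOf[Worker]), "worker")

  def receive = {
    case msg => worker forward msg
  }
}

// Prints whatever gets routed to /deadLetters.
class DeadLetterLogger extends Actor {
  def receive = {
    case d: DeadLetter => println("dead letter: " + d.message)
  }
}

object Demo extends App {
  val system = ActorSystem("demo")
  val logger = system.actorOf(Props(classOf[DeadLetterLogger]), "deadLetterLogger")
  system.eventStream.subscribe(logger, classOf[DeadLetter])

  val supervisor = system.actorOf(Props(classOf[Supervisor]), "supervisor")
  supervisor ! "boom" // child throws, supervisor decides Stop
  Thread.sleep(500)   // crude wait so the child is stopped before the next send
  supervisor ! "late" // forwarded to the stopped child, so it becomes a dead letter
  Thread.sleep(500)
  system.terminate()  // assumption: Akka 2.4+; use system.shutdown() on older versions
}
```

With Stop as the directive, nothing is retained or restarted, which matches the "clean shutdown only" goal. Late messages are dropped to dead letters rather than raising errors in the sender.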