Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/186#discussion_r11101232
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -116,21 +119,30 @@ class DAGScheduler(
private val metadataCleaner =
new MetadataCleaner(MetadataCleanerType.DAG_SCHEDULER, this.cleanup, env.conf)
- taskScheduler.setDAGScheduler(this)
-
/**
- * Starts the event processing actor. The actor has two responsibilities:
- *
- * 1. Waits for events like job submission, task finished, task failure etc., and calls
- * [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
- * 2. Schedules a periodical task to resubmit failed stages.
- *
- * NOTE: the actor cannot be started in the constructor, because the periodical task references
- * some internal states of the enclosing [[org.apache.spark.scheduler.DAGScheduler]] object, thus
- * cannot be scheduled until the [[org.apache.spark.scheduler.DAGScheduler]] is fully constructed.
+ * Starts the event processing actor within the supervisor. The eventProcessingActor
+ * waits for events like job submission, task finished, task failure etc., and calls
+ * [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
*/
- def start() {
- eventProcessActor = env.actorSystem.actorOf(Props(new Actor {
+ env.actorSystem.actorOf(Props(new Actor {
+
+ override val supervisorStrategy =
+ OneForOneStrategy() {
+ case x: Exception => {
+ logError("eventProcesserActor failed due to the error %s; shutting down SparkContext"
+ .format(x.getMessage))
+ doCancelAllJobs()
+ sc.stop()
+ Stop
--- End diff --
After the stages are cancelled, the TaskSetManager makes a number of calls
back to the DAGScheduler (because in the normal case, we tell the DAGScheduler
when all tasks end), so you're right that this seems potentially problematic.
@CodingCat, what happens in this case (i.e., when the DAGScheduler loop
receives more events after it has been stopped)?
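
To make the concern concrete, here is a minimal sketch of one way an event
loop could make late events harmless: it records and drops anything posted
after it has been stopped. This is not the code in this PR or in Spark;
`GuardedEventLoop`, `SchedulerEvent`, `TaskEnded`, and `post` are hypothetical
names used only for illustration, and the sketch uses plain Scala rather than
an Akka actor.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for scheduler events; not Spark's real event classes.
sealed trait SchedulerEvent
case class TaskEnded(taskId: Int) extends SchedulerEvent

// Hypothetical event loop that refuses to process anything posted after stop().
class GuardedEventLoop {
  private val stopped = new AtomicBoolean(false)

  // Events handled while the loop was running.
  val processed = ArrayBuffer.empty[SchedulerEvent]
  // Events that arrived after stop() and were dropped.
  val dropped = ArrayBuffer.empty[SchedulerEvent]

  def stop(): Unit = stopped.set(true)

  // Returns true if the event was handled, false if it arrived too late.
  def post(event: SchedulerEvent): Boolean =
    if (stopped.get()) {
      dropped += event
      false
    } else {
      processed += event
      true
    }
}

object GuardedEventLoopDemo {
  def main(args: Array[String]): Unit = {
    val loop = new GuardedEventLoop
    loop.post(TaskEnded(1)) // handled: the loop is still running
    loop.stop()             // e.g. after a fatal error in the event handler
    loop.post(TaskEnded(2)) // dropped: a late callback after shutdown
    println(s"processed=${loop.processed.size}, dropped=${loop.dropped.size}")
  }
}
```

Whether dropping late events is acceptable depends on what the callers in the
TaskSetManager expect, which is the open question above.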