Mark Hamstra created SPARK-10666:
------------------------------------

             Summary: Use properties from ActiveJob associated with a Stage
                 Key: SPARK-10666
                 URL: https://issues.apache.org/jira/browse/SPARK-10666
             Project: Spark
          Issue Type: Bug
          Components: Scheduler, Spark Core
    Affects Versions: 1.5.0, 1.4.1
            Reporter: Mark Hamstra
            Assignee: Mark Hamstra


This issue was addressed in #5494, but the fix in that PR, while safe in the 
sense that it will prevent the SparkContext from shutting down, misses the 
actual bug. The intent of submitMissingTasks should be understood as "submit 
the Tasks that are missing for the Stage, and run them as part of the ActiveJob 
identified by jobId". Because of a long-standing bug, the jobId parameter was 
never being used. Instead, we were trying to use the jobId with which the Stage 
was created -- which may no longer exist as an ActiveJob, hence the crash 
reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId 
parameter, which is guaranteed to exist at the call sites of submitMissingTasks.
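
The difference between the two lookups can be illustrated with a toy model. 
This is a minimal sketch and not Spark's actual DAGScheduler code: the names 
ToyScheduler, first_job_id, properties_buggy and properties_fixed are 
hypothetical, and the "buggy" variant models the behavior after #5494, where a 
missing ActiveJob yields no properties rather than a crash.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ActiveJob:
    job_id: int
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Stage:
    stage_id: int
    first_job_id: int  # hypothetical field: id of the job that created the stage


class ToyScheduler:
    def __init__(self) -> None:
        # Only jobs that are still active appear in this map.
        self.job_id_to_active_job: Dict[int, ActiveJob] = {}

    def properties_buggy(self, stage: Stage, job_id: int) -> Optional[Dict[str, str]]:
        # Ignores job_id and looks up the job that created the stage,
        # which may already have finished.
        job = self.job_id_to_active_job.get(stage.first_job_id)
        return job.properties if job is not None else None

    def properties_fixed(self, stage: Stage, job_id: int) -> Dict[str, str]:
        # Uses the job the caller asked for; assumed to be active at the call site.
        return self.job_id_to_active_job[job_id].properties


sched = ToyScheduler()
stage = Stage(stage_id=0, first_job_id=1)  # created by job 1, which has finished
sched.job_id_to_active_job[2] = ActiveJob(2, {"spark.scheduler.pool": "production"})

print(sched.properties_buggy(stage, 2))  # no properties: job 1 is gone
print(sched.properties_fixed(stage, 2))  # properties of job 2
```

In this model the stage is re-run on behalf of job 2, so only the second lookup 
carries job 2's scheduling pool through to the resubmitted Tasks.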

This fix should be applied to all maintenance branches, since the bug has 
existed since 1.0.

Tasks for a Stage that was previously part of a Job that is no longer active 
would be re-submitted as though they were part of the prior Job and with no 
properties set. Properties are what select an other-than-default scheduling 
pool, so this would affect FAIR scheduler usage, but it would also affect 
anything else that depends on the property settings. At this point that means 
just user code, since Spark itself uses the properties only for Job Group and 
Description, which end up in the WebUI, can be used to kill by JobGroup, etc. 
Even the default FIFO scheduling would be affected, however, since 
resubmitting the Tasks under the earlier jobId would effectively give them a 
higher priority/greater urgency than the ActiveJob that now actually needs 
them. In any event, the Tasks would generate correct results.
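
The FIFO effect can be sketched with a toy ordering. This is a simplified 
model, not Spark's scheduler: it assumes task sets are ordered by the job id 
they were submitted under and then by stage id, lowest first, and the TaskSet 
type, labels and ids below are made up for illustration.

```python
from typing import List, NamedTuple


class TaskSet(NamedTuple):
    priority: int  # job id under which the tasks were submitted
    stage_id: int
    label: str


def fifo_order(task_sets: List[TaskSet]) -> List[str]:
    # Assumed FIFO rule: lower job id first, ties broken by lower stage id.
    ordered = sorted(task_sets, key=lambda ts: (ts.priority, ts.stage_id))
    return [ts.label for ts in ordered]


# Job 1 created stage 0 and has finished. Job 2 is running stage 5.
# Job 3, submitted later, needs stage 0 to be re-run.
job2_work = TaskSet(priority=2, stage_id=5, label="job 2, stage 5")
rerun_buggy = TaskSet(priority=1, stage_id=0, label="stage 0 rerun")  # old job id
rerun_fixed = TaskSet(priority=3, stage_id=0, label="stage 0 rerun")  # job 3's id

buggy_order = fifo_order([job2_work, rerun_buggy])
fixed_order = fifo_order([job2_work, rerun_fixed])
print(buggy_order)
print(fixed_order)
```

Under the earlier jobId the re-run sorts ahead of job 2's work. Under the id of 
the job that actually needs it, the re-run takes its place behind job 2.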


