Github user squito commented on the issue:
https://github.com/apache/spark/pull/19194
Sorry if I was unclear earlier on the issue w/ the active job ID. I agree
that if a user actually gets into this situation, where they've got two
different jobs for the same stage with different max concurrent tasks, it's
mostly a toss-up which one they'll get, as the users' jobs are probably racing
to get to that stage. Still, I think it's important that it pulls the max
concurrent tasks from the active job, so that users can understand what is
going on, and for consistency and debuggability. The TaskSetManager gets the
property from the active job, which actually submitted the stage, so the
ExecutorAllocationManager should do the same.
I think the best way to ensure that is to add activeJobId to
SparkListenerStageSubmitted. Then you'd go back to just keeping a
jobIdToMaxConcurrentTasks map when handling onJobStart, and in
onStageSubmitted you'd figure out the max number of tasks for that stage,
given the job which actually submitted it.
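For concreteness, here's a minimal sketch of the shape I have in mind. It
assumes SparkListenerStageSubmitted gains the proposed activeJobId field;
the listener class name, the property key, and the executor-sizing hook are
all illustrative, not the actual patch:

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd,
  SparkListenerJobStart, SparkListenerStageSubmitted}

// Sketch only: `activeJobId` on SparkListenerStageSubmitted is the field
// proposed above; "spark.job.maxConcurrentTasks" is a placeholder key.
private[spark] class MaxConcurrentTasksListener extends SparkListener {

  // Populated in onJobStart from the submitting job's local properties.
  private val jobIdToMaxConcurrentTasks = new mutable.HashMap[Int, Int]

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    Option(jobStart.properties)
      .flatMap(p => Option(p.getProperty("spark.job.maxConcurrentTasks")))
      .foreach(v => jobIdToMaxConcurrentTasks(jobStart.jobId) = v.toInt)
  }

  override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = {
    // Hypothetical: resolve the limit via the job that actually submitted
    // the stage, matching what the TaskSetManager sees.
    val limit = jobIdToMaxConcurrentTasks.get(stageSubmitted.activeJobId)
    // ... feed `limit` into the executor-count calculation for this stage ...
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    // Drop the entry once the job finishes so the map doesn't grow unbounded.
    jobIdToMaxConcurrentTasks.remove(jobEnd.jobId)
  }
}
```

That keeps the bookkeeping per-job, and the stage-to-limit resolution happens
exactly once, at the point the stage is actually submitted.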
@tgravescs what do you think?