Github user dhruve commented on the issue: https://github.com/apache/spark/pull/19194

@squito Thanks for pointing that out. What you mentioned makes sense, and I dug some more into the `DAGScheduler` and `activeJobForStage` to gather more context. We could take the properties of the active job into account when the stage is submitted, but that behavior is nondeterministic. If we have two jobs from two different job groups with different task-concurrency thresholds, the one submitted first wins, because the stage won't be recomputed for the second job. There is no control over which job gets submitted before the other (unless the user explicitly serializes them). The problem is aggravated when the difference between the two jobs' task-concurrency thresholds is large: in such a case, ending up with the wrong value can completely take down your remote service.

For deterministic behavior, I believe the best way to tackle this would be to handle it in the stage properties, as was originally suggested. However, since that involves an API change, I didn't go that route, as the scope for that could be much broader. If we have more fundamental use cases that require adding something like this at the stage level, we should continue in that direction, provided the community is open to and welcomes an API change.
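To make the nondeterminism concrete, here is a minimal sketch of the "first job wins" behavior. This is hypothetical illustration code, not Spark's actual scheduler internals: `submitStage` and the threshold map stand in for the way a shared stage would be submitted once and then reused, so a second job's threshold would never take effect.

```scala
import scala.collection.mutable

// Hypothetical sketch: a shared stage records the task-concurrency
// threshold of whichever job submits it first; later jobs that reuse
// the stage do not overwrite it (mirroring that the stage is not
// recomputed for the second job).
object SharedStageSketch {
  // stageId -> threshold captured at first submission
  private val stageThresholds = mutable.Map[Int, Int]()

  // Returns the threshold the stage actually runs with.
  def submitStage(stageId: Int, jobThreshold: Int): Int =
    stageThresholds.getOrElseUpdate(stageId, jobThreshold)

  def main(args: Array[String]): Unit = {
    // Job A (threshold 10) and job B (threshold 1000) share stage 0.
    val a = submitStage(0, jobThreshold = 10)   // job A happens to run first
    val b = submitStage(0, jobThreshold = 1000) // job B reuses the stage
    println(s"jobA sees $a, jobB sees $b")
  }
}
```

If job B had been submitted first, both jobs would instead see 1000, which is exactly the order-dependence described above.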