ajithme opened a new pull request #27234: [SPARK-23626][CORE] DAGScheduler blocked due to JobSubmitted event
URL: https://github.com/apache/spark/pull/27234

### What changes were proposed in this pull request?
Force partition evaluation in the `callsite` thread before sending the `org.apache.spark.scheduler.JobSubmitted` event to `org.apache.spark.scheduler.DAGScheduler#eventProcessLoop`. This mitigates job submission events blocking the `DAGScheduler` thread.

### Why are the changes needed?
`DAGScheduler` becomes a bottleneck in the cluster when multiple `JobSubmitted` events have to be processed, because `DAGSchedulerEventProcessLoop` is single-threaded and a slow event blocks the other events in the queue, such as `TaskCompletion`. Handling a `JobSubmitted` event can be time-consuming depending on the nature of the job (for example, calculating parent stage dependencies, shuffle dependencies, and partitions), and while it runs no other event is processed.

In my cluster, partition calculation for some jobs is time-consuming (similar to the stack at SPARK-2647). This slows down `DAGSchedulerEventProcessLoop`, which in turn slows down user jobs even when their tasks finish within seconds, because `TaskCompletion` events are processed at a slower rate.

Refer: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Scheduler-Spark-DAGScheduler-scheduling-performance-hindered-on-JobSubmitted-Event-td23562.html

Multiple JIRA issues refer to this behavior:
https://issues.apache.org/jira/browse/SPARK-2647
https://issues.apache.org/jira/browse/SPARK-4961

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a UT to reproduce the issue and evaluate the fix.
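
The blocking described under "Why are the changes needed?" can be illustrated with a toy model. This is not Spark code: `LazyRDD`, `run`, the tick-based clocks, and the cost of 100 ticks are all hypothetical, chosen only to show why paying the partition cost in the submitting thread keeps a single-threaded event loop free for later events such as `TaskCompletion`.

```python
from collections import deque


class LazyRDD:
    """Hypothetical stand-in for an RDD whose partitions are computed lazily, then cached."""

    def __init__(self, partition_cost):
        self.partition_cost = partition_cost
        self._partitions = None

    def partitions(self, clock):
        # The first caller pays the full cost on its own clock; later calls hit the cache.
        if self._partitions is None:
            clock["ticks"] += self.partition_cost
            self._partitions = list(range(4))
        return self._partitions


def run(force_in_caller):
    """Return (loop tick at which TaskCompletion is handled, ticks spent by the caller)."""
    caller = {"ticks": 0}  # virtual clock of the job-submitting thread
    loop = {"ticks": 0}    # virtual clock of the single-threaded event loop
    queue = deque()
    rdd = LazyRDD(partition_cost=100)  # arbitrary cost, an assumption for illustration

    if force_in_caller:
        rdd.partitions(caller)  # proposed behavior: evaluate before posting the event

    queue.append(("JobSubmitted", rdd))
    queue.append(("TaskCompletion", None))

    completion_seen_at = None
    while queue:
        kind, payload = queue.popleft()
        loop["ticks"] += 1  # fixed per-event overhead
        if kind == "JobSubmitted":
            payload.partitions(loop)  # free if already cached, expensive otherwise
        else:
            completion_seen_at = loop["ticks"]
    return completion_seen_at, caller["ticks"]


lazy = run(force_in_caller=False)
eager = run(force_in_caller=True)
print("lazy :", lazy)
print("eager:", eager)
```

In this model the total work is unchanged; it only moves from the shared loop to the submitting thread, so other events queued behind `JobSubmitted` are no longer delayed by it.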
