ajithme opened a new pull request #27234: [SPARK-23626][CORE] DAGScheduler 
blocked due to JobSubmitted event
URL: https://github.com/apache/spark/pull/27234
 
 
   ### What changes were proposed in this pull request?
   Force partition evaluation in the `callsite` thread before posting the 
`org.apache.spark.scheduler.JobSubmitted` event to 
`org.apache.spark.scheduler.DAGScheduler#eventProcessLoop`. This mitigates 
job submission events blocking the `DAGScheduler` event thread.
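
The idea can be illustrated with a small, Spark-independent model. This is only a sketch under stated assumptions, not the code in this PR: `LazyDataset`, `submit_job`, `run_event_loop` and `demo` are hypothetical names, and the model assumes the partition list is computed lazily on first access and cached afterwards. With eager evaluation the submitting thread pays the cost, so the event loop thread only reads the cached result.

```python
import queue
import threading


class LazyDataset:
    """Hypothetical stand-in for a dataset whose partitions are computed lazily and cached."""

    def __init__(self, compute_partitions):
        self._compute = compute_partitions
        self._partitions = None
        self.computed_on = None  # name of the thread that paid the computation cost

    def partitions(self):
        if self._partitions is None:
            self._partitions = self._compute()
            self.computed_on = threading.current_thread().name
        return self._partitions


def submit_job(dataset, events, eager):
    """Post a JobSubmitted event; optionally force partition evaluation first."""
    if eager:
        dataset.partitions()  # evaluated in the submitting (call-site) thread
    events.put(("JobSubmitted", dataset))


def run_event_loop(events):
    """Single-threaded loop: handles one event at a time until it sees Stop."""
    while True:
        kind, payload = events.get()
        if kind == "Stop":
            return
        if kind == "JobSubmitted":
            payload.partitions()  # cheap if already cached, expensive otherwise


def demo(eager):
    """Run one submission and report which thread computed the partitions."""
    events = queue.Queue()
    dataset = LazyDataset(lambda: list(range(4)))
    loop = threading.Thread(target=run_event_loop, args=(events,), name="dag-event-loop")
    loop.start()
    submit_job(dataset, events, eager)
    events.put(("Stop", None))
    loop.join()
    return dataset.computed_on
```

In this model, `demo(eager=False)` reports the event loop thread as the one that computed the partitions, while `demo(eager=True)` reports the submitting thread.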
   
   ### Why are the changes needed?
   `DAGScheduler` becomes a bottleneck in a cluster when multiple 
`JobSubmitted` events have to be processed, because 
`DAGSchedulerEventProcessLoop` is single-threaded and blocks other events in 
the queue, such as `TaskCompletion`.
   Handling a `JobSubmitted` event can be time-consuming depending on the 
nature of the job (for example, calculating parent stage dependencies, shuffle 
dependencies, and partitions), so it delays every event queued behind it.
   
   Similarly, in my cluster the partition calculation for some jobs is 
time-consuming (similar to the stack in SPARK-2647), which slows down the Spark 
`DAGSchedulerEventProcessLoop`. As a result, user jobs slow down even when 
their tasks finish within seconds, because `TaskCompletion` events are 
processed at a slower rate due to the blockage.
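
The blocking effect described above can be sketched with a toy, single-threaded FIFO model. The function name `completion_times` and the cost units are hypothetical and are not measurements from any cluster. The model only shows that an event's finish time includes the cost of everything queued ahead of it.

```python
from collections import deque


def completion_times(events):
    """Process (name, cost) events in FIFO order on one simulated thread.

    Returns the simulated time at which each event finishes.
    """
    clock = 0
    finished = {}
    pending = deque(events)
    while pending:
        name, cost = pending.popleft()
        clock += cost  # the single thread is busy for `cost` units
        finished[name] = clock
    return finished


# Hypothetical costs: partition calculation done inside the event loop.
blocked = completion_times([("JobSubmitted", 50), ("TaskCompletion", 1)])

# Same events, with the expensive part assumed to happen before the event is posted.
unblocked = completion_times([("JobSubmitted", 1), ("TaskCompletion", 1)])
```

With these made-up costs, `TaskCompletion` finishes at time 51 in the first case and at time 2 in the second, even though handling it takes one unit in both.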
   
   Refer: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Scheduler-Spark-DAGScheduler-scheduling-performance-hindered-on-JobSubmitted-Event-td23562.html
   
   Multiple JIRA issues refer to this behavior:
   https://issues.apache.org/jira/browse/SPARK-2647
   https://issues.apache.org/jira/browse/SPARK-4961
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   Added a unit test to reproduce the issue and evaluate the fix.
