baohe-zhang opened a new pull request #31871:
URL: https://github.com/apache/spark/pull/31871


   ### What changes were proposed in this pull request?
   Allow ExecutorMetricsPoller to keep stage entries in stageTCMP until a 
heartbeat occurs, even if an entry's task count is 0.
   
   ### Why are the changes needed?
   This is a bug fix. 
   
   The current implementation of ExecutorMetricsPoller uses the task count of 
each stage to decide whether to keep that stage's entry. When the executor has 
only 1 core, this can cause the following issues:
   
   1. Missing peak metrics (the stage entry is removed within a heartbeat 
interval).
   2. Unnecessary and frequent hashmap entry removal and insertion.
   
   A detailed walkthrough of how entry removal causes these issues can be found 
in SPARK-34779.
   
   This patch prevents polled peak metrics from being lost and helps 
ExecutorMetricsPoller report more accurate peak metrics for each active stage.
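
   The effect described above can be illustrated with a minimal sketch. This 
is a simplified, hypothetical model and not Spark's actual code: the class 
`StagePeakTracker`, its methods, and the single-metric `StageState` are 
assumptions made for illustration only. It compares removing a stage entry as 
soon as its task count reaches 0 with deferring the removal to the next 
heartbeat, for two tasks of the same stage running back to back on one core.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified model of per-stage peak tracking; not Spark's code.
class StagePeakTracker {
    // Running-task count and the peak of a single metric for one stage.
    private static final class StageState {
        long taskCount;
        long peak;
    }

    private final Map<Integer, StageState> stages = new HashMap<>();
    // true = remove the entry when its task count hits 0; false = defer to heartbeat.
    private final boolean removeOnTaskEnd;
    int removals = 0; // counts map removals to make the churn visible

    StagePeakTracker(boolean removeOnTaskEnd) {
        this.removeOnTaskEnd = removeOnTaskEnd;
    }

    void onTaskStart(int stageId) {
        stages.computeIfAbsent(stageId, k -> new StageState()).taskCount++;
    }

    void onTaskCompletion(int stageId) {
        StageState s = stages.get(stageId);
        if (s == null) return;
        s.taskCount--;
        if (removeOnTaskEnd && s.taskCount == 0) {
            // The polled peak is dropped before any heartbeat can report it.
            stages.remove(stageId);
            removals++;
        }
    }

    // One poll: fold the sampled value into the peak of every tracked stage.
    void poll(long sampledValue) {
        for (StageState s : stages.values()) {
            s.peak = Math.max(s.peak, sampledValue);
        }
    }

    // Heartbeat: report peaks, reset them, and drop stages with no running tasks.
    Map<Integer, Long> heartbeat() {
        Map<Integer, Long> report = new HashMap<>();
        Iterator<Map.Entry<Integer, StageState>> it = stages.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, StageState> e = it.next();
            report.put(e.getKey(), e.getValue().peak);
            e.getValue().peak = 0;
            if (e.getValue().taskCount == 0) {
                it.remove();
                removals++;
            }
        }
        return report;
    }

    // Single-core scenario: two tasks of stage 0 run back to back
    // within one heartbeat interval.
    static Map<Integer, Long> runScenario(StagePeakTracker t) {
        t.onTaskStart(0);
        t.poll(100);
        t.onTaskCompletion(0);
        t.onTaskStart(0);
        t.poll(50);
        t.onTaskCompletion(0);
        return t.heartbeat();
    }

    public static void main(String[] args) {
        StagePeakTracker eager = new StagePeakTracker(true);
        StagePeakTracker deferred = new StagePeakTracker(false);
        System.out.println("eager:    " + runScenario(eager) + ", removals=" + eager.removals);
        System.out.println("deferred: " + runScenario(deferred) + ", removals=" + deferred.removals);
    }
}
```

   In this model, eager removal reports nothing for stage 0 at the heartbeat 
and removes the entry twice, while deferred removal reports the peak of 100 and 
removes the entry once.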
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manually tested by running jobs with a custom metrics polling interval and 
observing the debug logs of ExecutorMetricsPoller.
   




