baohe-zhang opened a new pull request #31945:
URL: https://github.com/apache/spark/pull/31945


   ### What changes were proposed in this pull request?
   In ProcfsMetricsGetter.scala, propagate the IOException from 
addProcfsMetricsFromOneProcess to computeAllMetrics when a child pid's proc 
stat file is unavailable. As a result, the for-loop in computeAllMetrics() 
terminates early and returns all-zero procfs metrics.
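A minimal sketch of the proposed control flow. This is a hypothetical, simplified model, not the actual Spark code: `FixedProcfsSketch`, `Metrics`, `Zero`, `addFromOneProcess`, and the `readStat` function are illustrative names, and the real `ProcfsMetrics` carries more fields. The per-pid step lets the `IOException` propagate, and `computeAllMetrics` catches it once, so the loop stops at the first missing stat file and all-zero metrics are returned.

```scala
import java.io.IOException

// Hypothetical, simplified model of the proposed fix; not Spark's actual code.
object FixedProcfsSketch {
  // Illustrative stand-in for Spark's ProcfsMetrics (the real class has more fields).
  final case class Metrics(vmem: Long, rss: Long) {
    def +(other: Metrics): Metrics = Metrics(vmem + other.vmem, rss + other.rss)
  }
  val Zero: Metrics = Metrics(0L, 0L)

  // Per-pid step: does NOT catch the IOException, so it propagates to the caller.
  // readStat is assumed to throw IOException when the pid's stat file is missing.
  def addFromOneProcess(acc: Metrics, pid: Int, readStat: Int => Metrics): Metrics =
    acc + readStat(pid)

  def computeAllMetrics(pids: Seq[Int], readStat: Int => Metrics): Metrics =
    try {
      // The fold aborts at the first pid whose stat file cannot be read.
      pids.foldLeft(Zero)((acc, pid) => addFromOneProcess(acc, pid, readStat))
    } catch {
      case e: IOException =>
        // One warning for the whole computation, and no partial sum is reported.
        Console.err.println(s"Failed to read procfs stat file: ${e.getMessage}")
        Zero
    }
}
```

For example, with stat files for pids 1 and 3 but none for pid 2, `computeAllMetrics(Seq(1, 2, 3), readStat)` returns `Zero` instead of a sum over a subset of the pids.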
   
   ### Why are the changes needed?
   If one child pid's stat file is missing while the stat files of subsequent 
child pids exist, ProcfsMetricsGetter.computeAllMetrics() returns partial 
metrics (the sum over only a subset of the child pids). This can be misleading 
and is explicitly undesired according to the existing code comment at 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala#L214.
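To illustrate how partial metrics can arise, here is a hypothetical, simplified model of the pre-fix behavior. It is not the actual Spark code, and all names (`BuggyProcfsSketch`, `Metrics`, `Zero`, `readStat`) are illustrative. It assumes the per-pid step handles the `IOException` itself and returns zero metrics, discarding the running total while the loop keeps going, so only the pids after the missing one end up in the result.

```scala
import java.io.IOException

// Hypothetical, simplified model of the pre-fix behavior; not Spark's actual code.
object BuggyProcfsSketch {
  // Illustrative stand-in for Spark's ProcfsMetrics (the real class has more fields).
  final case class Metrics(vmem: Long, rss: Long) {
    def +(other: Metrics): Metrics = Metrics(vmem + other.vmem, rss + other.rss)
  }
  val Zero: Metrics = Metrics(0L, 0L)

  // Assumed pre-fix shape: the IOException is handled per pid and zero metrics
  // are returned, which throws away the sum accumulated so far.
  def addFromOneProcess(acc: Metrics, pid: Int, readStat: Int => Metrics): Metrics =
    try acc + readStat(pid)
    catch {
      case e: IOException =>
        // A warning is printed for every missing stat file.
        Console.err.println(s"Failed to read stat file for pid $pid: ${e.getMessage}")
        Zero
    }

  // The loop never learns about the failure, so later pids are still summed.
  def computeAllMetrics(pids: Seq[Int], readStat: Int => Metrics): Metrics =
    pids.foldLeft(Zero)((acc, pid) => addFromOneProcess(acc, pid, readStat))
}
```

With stat files for pids 1 (vmem 10, rss 1) and 3 (vmem 30, rss 3) but none for pid 2, `computeAllMetrics(Seq(1, 2, 3), readStat)` returns `Metrics(30, 3)`: pid 3 alone, which is neither the full sum nor zero.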
   
   A side effect of this bug is that it can produce verbose warning logs when 
many pids' stat files are missing. Terminating early keeps the warning logs 
concise.
   
   The added unit test also illustrates the bug.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   A unit test is added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]