dhruve commented on pull request #31945:
URL: https://github.com/apache/spark/pull/31945#issuecomment-805934562


   The most likely scenario where this could happen is that a pyspark daemon is 
launched and its pid is captured while we iterate over the parent pid to collect 
all of its child pids, but by the time we stat that child pid the daemon has 
already been reclaimed for being idle (after 1 minute in this case). So the 
chances of this happening seem rare, but it is a possibility.
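   Roughly, the window looks like the sketch below. This is a minimal, hypothetical 
Scala sketch (not Spark's actual ProcfsMetricsGetter code) that assumes child pids 
are listed with `pgrep -P` and each one is then stat'd by reading 
`/proc/<pid>/stat`; the names `childPids`/`readStat` are purely illustrative.

```scala
import java.nio.file.{Files, NoSuchFileException, Paths}
import scala.sys.process._
import scala.util.Try

object ProcTreeRaceSketch {
  /** List the direct children of a pid via `pgrep -P` (illustrative only;
   *  the real code may walk the process tree differently). */
  def childPids(parent: Long): Seq[Long] =
    Try(Seq("pgrep", "-P", parent.toString).!!.trim)
      .toOption
      .filter(_.nonEmpty)
      .map(_.split("\\s+").toSeq.map(_.toLong))
      .getOrElse(Seq.empty)

  /** Read /proc/<pid>/stat for one pid. Between childPids() above and this
   *  call, an idle pyspark daemon may already have been reclaimed, so the
   *  /proc entry can vanish underneath us: the race discussed here. */
  def readStat(pid: Long): Option[String] =
    try {
      Some(new String(Files.readAllBytes(Paths.get(s"/proc/$pid/stat"))))
    } catch {
      case _: NoSuchFileException => None // pid already gone; nothing to report for it
    }
}
```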
   
   Apart from that, it would be interesting to know when this happens in 
practice. @baohe-zhang, in what scenario did you encounter this?
   
   We can also update the comment to reflect that this is a best-effort 
mechanism for reporting the metrics. Having some data is better than having no 
data, IMO. Consider a situation where this happens during the lifetime of a 
container: the metrics would be reported as V1 ... V2 ... 0 (a sudden blip) ... 
V3 ... V4. Reporting a partial value still seems like the better option here 
because the data is more useful; it might not be 100% accurate, but it is still 
close enough to the real usage, unless this is caused by a nasty bug that 
happens too often, in which case the reported values would no longer reflect 
the actual usage of the container.
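   To illustrate what "best effort" could look like, here is a hypothetical 
sketch that builds on `readStat` from the previous snippet: a pid whose /proc 
entry has vanished mid-scan is simply skipped, so the snapshot degrades to a 
partial sum instead of a 0 blip for the whole container. The field index and 
helper name are assumptions for illustration, not Spark's API.

```scala
object BestEffortMetricsSketch {
  /** Sum RSS (field 24 of /proc/<pid>/stat, in pages) across the pids that
   *  are still alive, silently skipping any that disappeared. The naive
   *  whitespace split assumes the comm field contains no spaces, which holds
   *  for typical pyspark daemons. */
  def totalRssPages(pids: Seq[Long]): Long =
    pids.flatMap { pid =>
      ProcTreeRaceSketch.readStat(pid).flatMap { stat =>
        scala.util.Try(stat.trim.split("\\s+")(23).toLong).toOption
      }
    }.sum
}
```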
   
   

