dhruve commented on pull request #31945: URL: https://github.com/apache/spark/pull/31945#issuecomment-805934562
The most common scenario where this could happen is that a pyspark daemon is launched and its pid is captured while we iterate through the parent pid to collect all of its child pids, but by the time we stat that child pid, the worker has been reclaimed for being idle (1 minute in this case). So the chances of this happening seem rare, but it is a possibility. Apart from that, it would be interesting to know when else this could happen. @baohe-zhang, in what scenario did you encounter this?

We can also update the comment to reflect that this is a best-effort mechanism for reporting the metrics. Having some data is better than having no data, IMO. Consider a situation where this happens once in the lifetime of a container: we would see the metrics reported as => V1 ... V2 ... 0 (a sudden blip) ... V3 ... V4. Reporting a partial value still seems like the better option here, because the data is more useful. It might not be 100% accurate, but it is still close enough to real usage, unless this is caused by a nasty bug that happens too often to ignore, in which case it would misrepresent the actual usage of the container.
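For context, here is a minimal sketch of what such a best-effort procfs walk might look like. This is not the code in this PR; the helper names (`childPids`, `rssPages`, `totalRssPages`) and the use of `/proc/<pid>/task/<pid>/children` are illustrative assumptions. The point is that a child exiting between enumeration and the stat read only degrades that one sample slightly, instead of failing the whole report:

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source
import scala.util.Try

object BestEffortProcfs {
  // Hypothetical helper: list child pids of `pid` by reading
  // /proc/<pid>/task/<pid>/children (available on Linux >= 3.5).
  // A missing or empty file yields an empty list.
  def childPids(pid: Int): Seq[Int] = {
    val path = Paths.get(s"/proc/$pid/task/$pid/children")
    Try(new String(Files.readAllBytes(path)).trim)
      .toOption
      .filter(_.nonEmpty)
      .map(_.split("\\s+").toSeq.map(_.toInt))
      .getOrElse(Seq.empty)
  }

  // Hypothetical helper: read RSS (in pages) from /proc/<pid>/stat.
  // Returns 0 if the process exited between enumeration and this read,
  // so one vanished child makes the aggregate slightly low for a single
  // sample instead of zeroing out the whole report.
  def rssPages(pid: Int): Long = {
    Try {
      val stat = Source.fromFile(s"/proc/$pid/stat").mkString
      // Fields after the ')' of the comm field start at field 3 (state);
      // rss is field 24 overall, i.e. index 21 in the remaining split.
      val afterComm = stat.substring(stat.lastIndexOf(')') + 2)
      afterComm.split(" ")(21).toLong
    }.getOrElse(0L)
  }

  // Best-effort aggregate over the parent and whatever children are
  // still alive at enumeration time.
  def totalRssPages(parentPid: Int): Long = {
    val pids = parentPid +: childPids(parentPid)
    pids.map(rssPages).sum
  }

  def main(args: Array[String]): Unit = {
    val parent = args.headOption.map(_.toInt)
      .getOrElse(ProcessHandle.current().pid().toInt)
    println(s"best-effort rss pages: $parent -> ${totalRssPages(parent)}")
  }
}
```

Treating a vanished pid as 0 in the sum avoids the full-zero blip described above: one sample may be a little low, but the series stays close to real usage.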
