Zhankun Tang created YARN-8823:
----------------------------------
Summary: Monitor the healthy state of GPU
Key: YARN-8823
URL: https://issues.apache.org/jira/browse/YARN-8823
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhankun Tang
GPU resources are discovered when the NM bootstraps, but they are not updated
through later heartbeats with the RM. There should be a monitoring mechanism
that periodically checks GPU health status and handles any unhealthy devices.
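
To make the proposal concrete, here is a minimal sketch of what such a periodic
check could look like. It is not the YARN implementation, and this issue does
not yet specify a design. The class name GpuHealthMonitor, the probe supplier,
and the use of GPU minor numbers as identifiers are all assumptions made for
illustration. How the probe decides that a device is healthy, and how the result
would be reported to the RM, are left open.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch only; not part of the YARN code base. */
public class GpuHealthMonitor {
    // GPUs found once at NM bootstrap (identified here by minor number, an assumption).
    private final Set<Integer> discovered;
    // Pluggable probe returning the GPUs that currently look healthy (hypothetical).
    private final Supplier<Set<Integer>> probe;
    // GPUs currently considered usable; replaced atomically on every check.
    private volatile Set<Integer> usable;
    // Daemon thread so the monitor never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "gpu-health-monitor");
            t.setDaemon(true);
            return t;
        });

    public GpuHealthMonitor(Set<Integer> discovered, Supplier<Set<Integer>> probe) {
        this.discovered = new TreeSet<>(discovered);
        this.probe = probe;
        this.usable = new TreeSet<>(discovered);
    }

    /** Runs one health check; returns true if the usable set changed. */
    public synchronized boolean checkOnce() {
        Set<Integer> healthy = new TreeSet<>(discovered);
        healthy.retainAll(probe.get()); // ignore devices that were never discovered
        boolean changed = !healthy.equals(usable);
        usable = healthy;
        return changed;
    }

    /** Returns a copy of the GPUs currently considered usable. */
    public Set<Integer> getUsable() {
        return new TreeSet<>(usable);
    }

    /** Starts periodic checks; a failing probe is swallowed so later runs still happen. */
    public void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Keep the previous state; a real implementation would log this.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

In this sketch a change reported by checkOnce() is the point where the
"corresponding handling" would hook in, for example by adjusting the resource
the NM reports in its next heartbeat. That hook is deliberately left out because
the issue does not define it.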