[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127025#comment-17127025 ]
Eric Badger commented on YARN-9809:
-----------------------------------

Patch 001 adds the feature but makes it opt-in via the config {{yarn.nodemanager.health-checker.run-before-startup}}. I didn't put in the retries flag for shutting down the NM if there are a certain number of failures. I can do that in a subsequent patch if you'd like. But I tested this patch out and it seems to work.

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9809.001.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
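
For anyone wanting to try the patch, opting in would presumably look like the yarn-site.xml fragment below. This is only a sketch: the property name is the one given in the comment above, but the value {{true}} assumes the setting is a boolean that is off by default, which the comment does not state explicitly.

```xml
<!-- yarn-site.xml on the NodeManager host (sketch, not taken from the patch) -->
<configuration>
  <property>
    <!-- Property name as given in the YARN-9809 comment. -->
    <name>yarn.nodemanager.health-checker.run-before-startup</name>
    <!-- Assumption: boolean flag; "true" opts in to running the
         health checker before the NM registers with the RM. -->
    <value>true</value>
  </property>
</configuration>
```

With the flag left unset, the NM would be expected to keep the behavior described in the issue, where health is first reported on the heartbeat after registration.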