[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201136#comment-17201136 ]
Jim Brennan commented on YARN-9809:
-----------------------------------

I finished a first pass. Here are my comments:

NodeHealthScriptRunner
* Need to add code to NodeManager to get the runBeforeStartup conf and pass it to the constructor.
* Need to make the startup run optional based on runBeforeStartup.

RegisterNodeManagerRequest
* See the trunk version of the patch. You should only have to add the new parameter to the last newInstance() interface, and have the second to last pass null.
* This might reduce the number of tests you need to modify.

RMNodeImpl
* addNodeTransition - I think this line should be removed?
{noformat}
// Increment activeNodes explicitly because this is a new node.
ClusterMetrics.getMetrics().incrNumActiveNodes();
{noformat}
* updateMetricsForRejoinedNode - I think we need to remove metrics.incrNumActiveNodes();

TestRMNodeTransitions
* The new testAddUnhealthyNode() test is not here.

These should not be needed if you fix the constructors for RegisterNodeManagerRequest:
* TestProtocolRecords
* TestRegisterNodeManagerRequest
* TestResourceTrackerOnHA
* TestYarnServerApiClasses

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch,
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch,
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
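
To illustrate the RegisterNodeManagerRequest comment above (add the new parameter only to the last newInstance() overload, and have the second to last pass null), here is a minimal sketch of the delegation pattern. It is not the actual Hadoop class: RegisterRequestSketch, its fields, and the String-typed status are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for a request record with overloaded
 * factory methods. It is NOT the real RegisterNodeManagerRequest; the field
 * names and types are assumptions made for illustration only.
 */
final class RegisterRequestSketch {
    private final String nodeId;
    private final int httpPort;
    // New optional field; null means the caller did not supply a status.
    private final String nodeStatus;

    private RegisterRequestSketch(String nodeId, int httpPort, String nodeStatus) {
        this.nodeId = Objects.requireNonNull(nodeId, "nodeId");
        this.httpPort = httpPort;
        this.nodeStatus = nodeStatus;
    }

    // Pre-existing overload: signature unchanged, so existing callers and
    // tests keep compiling. It delegates and passes null for the new field.
    static RegisterRequestSketch newInstance(String nodeId, int httpPort) {
        return newInstance(nodeId, httpPort, null);
    }

    // Only this last overload carries the new parameter.
    static RegisterRequestSketch newInstance(String nodeId, int httpPort, String nodeStatus) {
        return new RegisterRequestSketch(nodeId, httpPort, nodeStatus);
    }

    String getNodeId() {
        return nodeId;
    }

    int getHttpPort() {
        return httpPort;
    }

    String getNodeStatus() {
        return nodeStatus;
    }
}
```

With this shape, only call sites that actually need to supply a status use the longest overload, which is why the existing tests listed above would likely not need changes.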