[ 
https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201136#comment-17201136
 ] 

Jim Brennan commented on YARN-9809:
-----------------------------------

I finished a first pass.  Here are my comments:

NodeHealthScriptRunner
* Need to add code to NodeManager to read the runBeforeStartup conf and pass it 
to the constructor.
* Need to make the startup run optional, based on runBeforeStartup.
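
As a rough sketch of the two points above: the class names, the config key, 
and the Map used in place of a real Configuration object are all hypothetical 
stand-ins, not the actual Hadoop code.

```java
import java.util.Map;

// Hypothetical stand-in for the health script runner; NOT the real Hadoop class.
class HealthScriptRunnerSketch {
    private final boolean runBeforeStartup;
    private int runs = 0;

    // The flag is passed in by the caller rather than looked up here.
    HealthScriptRunnerSketch(boolean runBeforeStartup) {
        this.runBeforeStartup = runBeforeStartup;
    }

    // Called once at service start, before any periodic scheduling would begin.
    void start() {
        if (runBeforeStartup) {
            runScriptOnce();
        }
    }

    // Placeholder for actually executing the health script.
    private void runScriptOnce() {
        runs++;
    }

    int getRuns() {
        return runs;
    }
}

// Hypothetical stand-in for the NodeManager wiring.
class NodeManagerSketch {
    // Assumed key name, for illustration only.
    static final String RUN_BEFORE_STARTUP_KEY =
        "example.health-checker.run-before-startup";

    // Reads the flag from the (simplified) conf and passes it to the constructor.
    static HealthScriptRunnerSketch createRunner(Map<String, String> conf) {
        boolean runBeforeStartup = Boolean.parseBoolean(
            conf.getOrDefault(RUN_BEFORE_STARTUP_KEY, "false"));
        return new HealthScriptRunnerSketch(runBeforeStartup);
    }
}
```

With the flag unset, start() skips the initial run entirely.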

RegisterNodeManagerRequest
* See the trunk version of the patch. You should only have to add the new 
parameter to the last newInstance() interface, and have the second to last pass 
null.
* This might reduce the number of tests you need to modify.
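
The overload chaining described above could look roughly like this. The class 
is a simplified, hypothetical stand-in with placeholder String fields; the real 
RegisterNodeManagerRequest has different parameters.

```java
// Simplified, hypothetical stand-in for the request record.
class RegisterRequestSketch {
    private final String nodeId;
    private final String nodeLabels;  // may be null
    private final String nodeStatus;  // the new parameter; may be null

    private RegisterRequestSketch(String nodeId, String nodeLabels,
                                  String nodeStatus) {
        this.nodeId = nodeId;
        this.nodeLabels = nodeLabels;
        this.nodeStatus = nodeStatus;
    }

    // Older overload: signature unchanged, so existing callers still compile.
    static RegisterRequestSketch newInstance(String nodeId) {
        return newInstance(nodeId, null);
    }

    // Previously the last overload: signature unchanged, now passes null
    // for the new parameter.
    static RegisterRequestSketch newInstance(String nodeId, String nodeLabels) {
        return newInstance(nodeId, nodeLabels, null);
    }

    // Only this overload takes the new parameter.
    static RegisterRequestSketch newInstance(String nodeId, String nodeLabels,
                                             String nodeStatus) {
        return new RegisterRequestSketch(nodeId, nodeLabels, nodeStatus);
    }

    String getNodeId() {
        return nodeId;
    }

    String getNodeLabels() {
        return nodeLabels;
    }

    String getNodeStatus() {
        return nodeStatus;
    }
}
```

Callers of the older overloads are untouched, which is why fewer tests should 
need changes.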

RMNodeImpl
* addNodeTransition - I think this line should be removed:
{noformat}
// Increment activeNodes explicitly because this is a new node.
ClusterMetrics.getMetrics().incrNumActiveNodes();
{noformat}
* updateMetricsForRejoinedNode - I think we need to remove 
metrics.incrNumActiveNodes();
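
One reading of the two points above is that an unconditional increment would 
count a node that registers as unhealthy as active. A minimal sketch of 
counting by the reported health instead; the classes and method names below 
are hypothetical and are not the actual RMNodeImpl or ClusterMetrics code.

```java
// Hypothetical metrics holder; NOT the real ClusterMetrics class.
class ClusterMetricsSketch {
    private int activeNodes;
    private int unhealthyNodes;

    void incrNumActiveNodes() {
        activeNodes++;
    }

    void incrNumUnhealthyNodes() {
        unhealthyNodes++;
    }

    int getNumActiveNodes() {
        return activeNodes;
    }

    int getNumUnhealthyNodes() {
        return unhealthyNodes;
    }
}

// Hypothetical stand-in for the add-node transition.
class AddNodeSketch {
    // Bump exactly one counter, chosen by the health reported at registration,
    // instead of always incrementing the active count.
    static void onNodeAdded(ClusterMetricsSketch metrics, boolean healthy) {
        if (healthy) {
            metrics.incrNumActiveNodes();
        } else {
            metrics.incrNumUnhealthyNodes();
        }
    }
}
```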

TestRMNodeTransitions
* The new testAddUnhealthyNode() test is not here.

Changes to these should not be needed if you fix the constructors for 
RegisterNodeManagerRequest:
* TestProtocolRecords
* TestRegisterNodeManagerRequest
* TestResourceTrackerOnHA
* TestYarnServerApiClasses


> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch, 
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch, 
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be 
> scheduled many containers before the first heartbeat. After the first 
> heartbeat, the RM will mark the NM as unhealthy and kill all of the 
> containers.



