[ https://issues.apache.org/jira/browse/YARN-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236488#comment-16236488 ]
Jian He commented on YARN-7371:
-------------------------------
Patch looks good to me overall. Some comments:
- This method can be removed, as it’s only used by this class itself:
{code}
public Token createContainerToken(ContainerId containerId,
    int containerVersion, NodeId nodeId, String appSubmitter,
    Resource capability, Priority priority, long createTime,
    LogAggregationContext logAggregationContext, String nodeLabelExpression,
    ContainerType containerType) {
  return createContainerToken(containerId, containerVersion, nodeId,
      appSubmitter, capability, priority, createTime, null, null,
      ContainerType.TASK, ExecutionType.GUARANTEED, -1);
}
{code}
- For testRecoverComponentsAfterRMRestart, can you also check that the
containers retrieved by serviceClient#getStatus are the old containers from
the 1st attempt, i.e. that no containers are relaunched because of the AM
restart?
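
A rough sketch of the kind of check meant here, in plain Java with no Hadoop dependencies. It assumes the container IDs have already been collected as strings before and after the AM restart (in the real test they would come from the status returned by serviceClient#getStatus), and it relies on the standard container ID string form, e.g. container_1410901177871_0001_01_000005, where the second-to-last field is the attempt number. The class and method names are hypothetical and not part of the patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper, not part of the patch: illustrates the assertion only.
final class ContainerRecoveryCheck {

  // Returns the attempt number encoded in a container ID string such as
  // container_1410901177871_0001_01_000005 or, with an epoch,
  // container_e17_1410901177871_0001_01_000005.
  static int attemptOf(String containerId) {
    String[] parts = containerId.split("_");
    if (parts.length < 5 || !"container".equals(parts[0])) {
      throw new IllegalArgumentException("Not a container ID: " + containerId);
    }
    // The last field is the container number; the one before it is the attempt.
    return Integer.parseInt(parts[parts.length - 2]);
  }

  // True when the containers seen after the AM restart are exactly the ones
  // seen before it and every one of them was allocated by attempt 1.
  static boolean recoveredWithoutRelaunch(Collection<String> before,
      Collection<String> after) {
    boolean sameContainers = new HashSet<>(before).equals(new HashSet<>(after));
    boolean allFirstAttempt =
        after.stream().allMatch(id -> attemptOf(id) == 1);
    return sameContainers && allFirstAttempt;
  }

  public static void main(String[] args) {
    // Made-up IDs for illustration only.
    List<String> before = Arrays.asList(
        "container_1410901177871_0001_01_000002",
        "container_1410901177871_0001_01_000003");
    List<String> after = Arrays.asList(
        "container_1410901177871_0001_01_000003",
        "container_1410901177871_0001_01_000002");
    System.out.println(recoveredWithoutRelaunch(before, after));
  }
}
```

In the test itself, the same comparison could be an assertion on the set of container IDs captured before the ServiceMaster is killed versus the set reported once the 2nd attempt is running.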
> NPE in ServiceMaster after RM is restarted and then the ServiceMaster is killed
> -------------------------------------------------------------------------------
>
> Key: YARN-7371
> URL: https://issues.apache.org/jira/browse/YARN-7371
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-7371-yarn-native-services.001.patch,
> YARN-7371-yarn-native-services.002.patch,
> YARN-7371-yarn-native-services.003.patch,
> YARN-7371-yarn-native-services.004.patch,
> YARN-7371-yarn-native-services.005.patch
>
>
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.service.ServiceScheduler.recoverComponents(ServiceScheduler.java:313)
>     at org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:265)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:150)
> Steps:
> 1. Stopped RM and then started it
> 2. Application was still running
> 3. Killed the ServiceMaster to check if it recovers
> 4. Next attempt failed with the above exception