[ https://issues.apache.org/jira/browse/YARN-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujie updated YARN-9238:
------------------------
Description:
See
org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
{code:java}
// Allocate OPPORTUNISTIC containers.
171. SchedulerApplicationAttempt appAttempt =
172. ((AbstractYarnScheduler)rmContext.getScheduler())
173. .getApplicationAttempt(appAttemptId);
174.
175. OpportunisticContainerContext oppCtx =
176. appAttempt.getOpportunisticContainerContext();
177. oppCtx.updateNodeList(getLeastLoadedNodes());
{code}
If the MRAppMaster crashes before allocate reaches line 171, the ResourceManager
starts a new appAttempt and calls
{code:java}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt) {
  this.currentAttempt = currentAttempt;
}
{code}
Hence allocate at line 171 gets the new appAttempt, whose
OpportunisticContainerContext field has not been initialized yet.
So oppCtx == null, and the NullPointerException is thrown at line 177:
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
    at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
{code}
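The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the content of the attached patches: OppContext, Attempt, App and initContext() are hypothetical stand-ins for OpportunisticContainerContext, SchedulerApplicationAttempt, SchedulerApplication and whatever step initializes the context in the real ResourceManager. It shows one possible defensive guard: reject an allocate call whose attempt is no longer current, or whose context has not been set up yet, instead of dereferencing a null context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OppAllocateSketch {
  // Hypothetical stand-in for OpportunisticContainerContext.
  static class OppContext {
    final List<String> nodes = new ArrayList<>();

    void updateNodeList(List<String> latest) {
      nodes.clear();
      nodes.addAll(latest);
    }
  }

  // Hypothetical stand-in for SchedulerApplicationAttempt.
  static class Attempt {
    final int attemptId;
    private OppContext oppCtx; // null until initContext() runs (assumption)

    Attempt(int attemptId) { this.attemptId = attemptId; }
    void initContext() { oppCtx = new OppContext(); }
    OppContext getOpportunisticContainerContext() { return oppCtx; }
  }

  // Hypothetical stand-in for SchedulerApplication.
  static class App {
    private volatile Attempt currentAttempt;

    void setCurrentAppAttempt(Attempt attempt) { this.currentAttempt = attempt; }
    Attempt getCurrentAppAttempt() { return currentAttempt; }
  }

  // Returns true if the node list was updated, false if the call was rejected.
  static boolean allocate(App app, int callerAttemptId, List<String> leastLoadedNodes) {
    Attempt attempt = app.getCurrentAppAttempt();
    if (attempt == null || attempt.attemptId != callerAttemptId) {
      return false; // caller belongs to a previous, removed, or unknown attempt
    }
    OppContext oppCtx = attempt.getOpportunisticContainerContext();
    if (oppCtx == null) {
      return false; // context not initialized; unguarded code would throw the NPE here
    }
    oppCtx.updateNodeList(leastLoadedNodes);
    return true;
  }

  public static void main(String[] args) {
    App app = new App();
    Attempt first = new Attempt(1);
    first.initContext();
    app.setCurrentAppAttempt(first);
    List<String> nodes = Arrays.asList("node-a", "node-b");
    System.out.println(allocate(app, 1, nodes)); // true

    // The AM of attempt 1 crashes; attempt 2 becomes current before its context exists.
    app.setCurrentAppAttempt(new Attempt(2));
    System.out.println(allocate(app, 1, nodes)); // false: stale attempt
    System.out.println(allocate(app, 2, nodes)); // false: context still null
  }
}
```

Returning false here is a simplification. In the real service the rejection would more plausibly surface as an exception sent back to the ApplicationMaster, which is also an assumption rather than something stated in this issue.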
was:
See
org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
{code:java}
// Allocate OPPORTUNISTIC containers.
171. SchedulerApplicationAttempt appAttempt =
172. ((AbstractYarnScheduler)rmContext.getScheduler())
173. .getApplicationAttempt(appAttemptId);
174.
175. OpportunisticContainerContext oppCtx =
176. appAttempt.getOpportunisticContainerContext();
177. oppCtx.updateNodeList(getLeastLoadedNodes());
{code}
If "allocate" arrives at line 171 and the MRAppMaster crashes, the ResourceManager
will start a new appAttempt and call
{code:java}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt) {
  this.currentAttempt = currentAttempt;
}
{code}
The new appAttempt has not initialized its OpportunisticContainerContext field,
hence oppCtx == null and the NullPointerException is thrown at line 177:
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
    at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
{code}
> Allocate on previous or removed or non existent application attempt
> -------------------------------------------------------------------
>
> Key: YARN-9238
> URL: https://issues.apache.org/jira/browse/YARN-9238
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: lujie
> Assignee: lujie
> Priority: Critical
> Attachments: YARN-9238_1.patch, YARN-9238_2.patch, YARN-9238_3.patch,
> hadoop-test-resourcemanager-hadoop11.log
>
>