[ 
https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874223#comment-15874223
 ] 

Rohith Sharma K S commented on YARN-6207:
-----------------------------------------

Thanks [~sunilg] for summarizing the discussion.
Adding more points what we discussed offline, we found many more corner cases 
that can create an issue in move application. It time to handle all those 
corner scenarios. And also we discussed that move application should 
*never-fail* if application attempt is not added yet. So, fix can be made from 
scheduler so that application can be moved even if attempt is NOT-added OR 
called done application attempt during attempt removal. 
May be new flag is required to identify in SchedulerApplicationAttempt that 
done-app-attempt is called-off or not. This will ensure move application need 
not fail if application is already in accepted state. 

> Move application can  fail when attempt add event is delayed
> ------------------------------------------------------------
>
>                 Key: YARN-6207
>                 URL: https://issues.apache.org/jira/browse/YARN-6207
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: YARN-6207.001.patch, YARN-6207.002.patch
>
>
> *Steps to reproduce*
> 1.Submit application  and delay attempt add to Scheduler
> (Simulate using debug at EventDispatcher for SchedulerEventDispatcher)
> 2. Call move application to destination queue.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
>       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1429)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1339)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115)
>       at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398)
>       ... 16 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to