[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bibin A Chundatt updated YARN-6207:
-----------------------------------
    Attachment: YARN-6207.004.patch

Thank you [~naganarasimha...@apache.org]/[~sunilg]/[~rohithsharma] for the comments.

{quote}
I think !app.isStopped() can be done at upper level along with null check.
if (null != app || !app.isStopped() )
nit : change null check with java code style i.e app!=null.
{quote}
When an application is submitted with transferFromPreviousAttempt set in the app context, the live container metrics need to be updated in the queues.

{quote}
1. app.move(dest); is invoked even when app is STOPPED. Internally it updates queue metrics in source queue and also in appScheduling info (which also is stopped). I think if app is stopped, we can assume that all internal metrics of the app is released from source queue. Hence we may not need to do the same again in move. please check once.
{quote}
Since the live container metrics need to be updated in {{app.move}}, we can skip only the appSchedulingInfo update when the attempt is stopped.

{quote}
2. abstractUsersManager.deactivateApplication(user, applicationId); this is invoked from app.move(). So do we need to call LQ.finishApplication() except the fact that queue may have to be moved to STOPPED if it was draining.
{quote}
As mentioned above, we can skip the appSchedulingInfo update for stopped attempts.

{quote}
3. FS also need a null check for attempt, correct?
{quote}
I have handled the SchedulerApplication null check in the latest patch. If the comment is about the Fair Scheduler: in this JIRA we will handle only the Capacity Scheduler cases.

{quote}
1. One corner case: when ClientRMService validates, the app state is still running, but by the time it reaches the scheduler the application might have completed. Hence, to be safe, we can just check whether the scheduler application is not null for appId.
{quote}
Done

{quote}
Can we think of moving dest.submitApplication(appId, user, destQueueName); below if (null != app) block so that its better we finish handling all attempt related stuff and then updated the application related modifications ?
{quote}
As discussed offline, validation of the application submission to the queue is done in queue.submitApplication. Only when limits are reached should we update attempt-level metrics. This is part of the existing flow, so no change is needed.

{quote}
ln 2058, i think we can directly get application.getCurrentAppAttempt
{quote}
Done

{quote}
ln 2103, Was wondering the queue partition information needs to be checked even if the attempt doesn't exist, thoughts?
{quote}
When the FiCa app is null, we cannot get the schedulingInfo-based partition with the current implementation. We can skip this here; another JIRA can probably be filed for it.

{quote}
comment at ln 2067 can be moved to just before {{source.finishApplicationAttempt}}
{quote}
Done

> Move application can fail when attempt add event is delayed
> ------------------------------------------------------------
>
>                 Key: YARN-6207
>                 URL: https://issues.apache.org/jira/browse/YARN-6207
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: YARN-6207.001.patch, YARN-6207.002.patch, YARN-6207.003.patch, YARN-6207.004.patch
>
>
> *Steps to reproduce*
> 1. Submit an application and delay the attempt add to the Scheduler
> (simulate using debug at EventDispatcher for SchedulerEventDispatcher)
> 2. Call move application to the destination queue.
> {noformat}
> Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669)
>     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388)
>     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1429)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1339)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115)
>     at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398)
>     ... 16 more
> {noformat}
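The guard discussed in the review comments above can be sketched in isolation. This is only an illustration under stated assumptions, not the code in YARN-6207.004.patch: `MoveGuardSketch`, `Attempt`, `planAttemptUpdates`, and the step names are hypothetical stand-ins for the real CapacityScheduler and scheduler-application types. It shows the behaviour described in the replies: a missing attempt is tolerated, live container metrics are updated whenever an attempt exists, and only the appSchedulingInfo update is skipped for a stopped attempt. Note that the condition as quoted with `||` would still dereference a null `app`, so the sketch uses a short-circuit `&&`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the real YARN classes.
class MoveGuardSketch {

    /** Stand-in for a scheduler application attempt. */
    static final class Attempt {
        private final boolean stopped;

        Attempt(boolean stopped) {
            this.stopped = stopped;
        }

        boolean isStopped() {
            return stopped;
        }
    }

    /** Null-safe: the right operand runs only when attempt is non-null. */
    static boolean shouldUpdateSchedulingInfo(Attempt attempt) {
        return attempt != null && !attempt.isStopped();
    }

    /**
     * Lists the attempt-level updates a move would perform.
     * A null attempt models the case where the attempt-add event
     * has not reached the scheduler yet.
     */
    static List<String> planAttemptUpdates(Attempt attempt) {
        List<String> steps = new ArrayList<>();
        if (attempt != null) {
            // Live containers may exist even for a stopped attempt.
            steps.add("updateLiveContainerMetrics");
        }
        if (shouldUpdateSchedulingInfo(attempt)) {
            steps.add("updateAppSchedulingInfo");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(planAttemptUpdates(null));               // []
        System.out.println(planAttemptUpdates(new Attempt(true)));  // [updateLiveContainerMetrics]
        System.out.println(planAttemptUpdates(new Attempt(false))); // [updateLiveContainerMetrics, updateAppSchedulingInfo]
    }
}
```

Keeping the null check separate from the stopped check is what lets the metrics update run for stopped attempts while the scheduling-info update is skipped.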