[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074366#comment-14074366 ]
Rohith commented on YARN-2209:
------------------------------

Hi [~jianhe],

I reviewed the patch and have a few comments.

1. The lastResponseID=0 reset is missing in RMContainerAllocator#getResources():
{code}
catch (ApplicationMasterNotRegisteredException e) {
  LOG.info("ApplicationMaster is out of sync with ResourceManager,"
      + " hence resync and send outstanding requests.");
  // RM may have restarted, re-register with RM.
  register();
  addOutstandingRequestOnResync();
  return null;
}
{code}

2. In AMRMClientAsyncImpl, the code below may lose one response, because the response is not added back to responseQueue when an InterruptedException occurs. This is a worst case, but it can still happen, for example if the JVM or the OS interrupts the thread. Can we add the response back to responseQueue on InterruptedException?
{code}
if (response != null) {
  try {
    responseQueue.put(response);
    break;
  } catch (InterruptedException ex) {
    LOG.debug("Interrupted while waiting to put on response queue", ex);
  }
{code}

> Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2209
>                 URL: https://issues.apache.org/jira/browse/YARN-2209
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch
>
>
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate
> application to re-register on RM restart. We should do the same for
> AMS#allocate call also.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
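
For illustration of point 2 in the comment above, here is a minimal sketch of one common way to avoid dropping an element when BlockingQueue#put is interrupted: retry the put until it succeeds, then restore the thread's interrupt flag. This is not the AMRMClientAsyncImpl code or the change made in the patch; the class name ResponsePutSketch, the helper putUninterruptibly, and the String element type are hypothetical and only stand in for the real response handling.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not taken from the YARN-2209 patch.
class ResponsePutSketch {

    /**
     * Puts the item on the queue, retrying if the thread is interrupted
     * while waiting. The interrupt flag is restored before returning so
     * that callers can still observe the interruption.
     */
    static <T> void putUninterruptibly(BlockingQueue<T> queue, T item) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    queue.put(item);
                    return; // item is on the queue, nothing was lost
                } catch (InterruptedException ex) {
                    // Remember the interrupt and retry with the same item.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                // put() cleared the flag when it threw; set it again.
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();

        // Simulate an interrupt arriving before the put.
        Thread.currentThread().interrupt();
        putUninterruptibly(responseQueue, "response-1");

        System.out.println("queued: " + responseQueue.peek());
        System.out.println("interrupt flag kept: " + Thread.interrupted());
    }
}
```

The trade-off is that a thread using this pattern keeps retrying until the put succeeds, so a shutdown that relies on interruption would only be noticed by the caller afterwards, through the restored flag. Whether that is acceptable for the heartbeat thread is a design decision for the patch author.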