Rohith commented on YARN-2209:

Hi [~jianhe], I reviewed the patch and have a couple of comments.

1. lastResponseID=0 is missing in RMContainerAllocator#getResources():
catch (ApplicationMasterNotRegisteredException e) {
      LOG.info("ApplicationMaster is out of sync with ResourceManager,"
          + " hence resync and send outstanding requests.");
      // RM may have restarted, re-register with RM.
      return null;
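
To illustrate the point, here is a minimal, self-contained sketch of resetting the response ID on resync. It is not the actual RMContainerAllocator code: NotRegisteredException, Scheduler and ResyncingAllocator are hypothetical stand-ins, and it assumes the RM starts its response ID sequence from 0 again after a re-register.

```java
// Hypothetical stand-in for ApplicationMasterNotRegisteredException.
class NotRegisteredException extends Exception {}

// Hypothetical stand-in for the AM-to-RM protocol.
interface Scheduler {
    // Returns the response ID of the new allocate response.
    int allocate(int lastResponseId) throws NotRegisteredException;
    void register();
}

class ResyncingAllocator {
    private final Scheduler scheduler;
    private int lastResponseId = 0;

    ResyncingAllocator(Scheduler scheduler) {
        this.scheduler = scheduler;
    }

    int getLastResponseId() {
        return lastResponseId;
    }

    /** Returns true if a response was received, false if a resync happened. */
    boolean heartbeat() {
        try {
            lastResponseId = scheduler.allocate(lastResponseId);
            return true;
        } catch (NotRegisteredException e) {
            // RM may have restarted: re-register and restart the response ID
            // sequence, otherwise the next allocate call carries a stale ID.
            scheduler.register();
            lastResponseId = 0;
            return false;
        }
    }
}
```

Without the reset, the first heartbeat after re-registering would send the response ID from before the restart.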

2. In AMRMClientAsyncImpl, the code below may lose one response, since the
response is not added back to responseQueue when an InterruptedException occurs.
This may be a worst case, but it can still occur, for example because Java itself
or the OS interrupts the thread.
Can we add the response back to responseQueue on InterruptedException?

          if (response != null) {
             try {
             } catch (InterruptedException ex) {
               LOG.debug("Interrupted while waiting to put on response queue", 
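
One possible way to avoid dropping the response is to retry the put until it succeeds. The sketch below is only an illustration under that assumption, not the AMRMClientAsyncImpl code itself; ResponseQueues and putWithoutLoss are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;

class ResponseQueues {
    /**
     * Puts the item on the queue, retrying if the thread is interrupted,
     * so that the item is never silently dropped.
     */
    static <T> void putWithoutLoss(BlockingQueue<T> queue, T item) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    queue.put(item);
                    return;
                } catch (InterruptedException ex) {
                    // Remember the interrupt and try again instead of
                    // giving up on the response.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                // Restore the interrupt status for the caller.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The trade-off is that the caller cannot abort a blocked put by interrupting the thread. It only sees the interrupt status afterwards, so a shutdown path would still need to check that flag.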

> Replace allocate#resync command with ApplicationMasterNotRegisteredException 
> to indicate AM to re-register on RM restart
> ------------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2209
>                 URL: https://issues.apache.org/jira/browse/YARN-2209
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate 
> application to re-register on RM restart. We should do the same for 
> AMS#allocate call also.
