Bikas Saha commented on YARN-1366:

Why are we returning the old allocateResponse to the user? What is the user 
expected to do with this allocateResponse that has a RESYNC command in it? 
Should we make a second call to allocate (after re-registering) and then send 
that response back up to the user?
{code}
+        // re register with RM
+        registerApplicationMaster();
+        return allocateResponse;
+      }
{code}
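
To make the question concrete, here is a minimal sketch of the alternative: re-register on RESYNC, then call allocate a second time and return that second response to the user. It does not use the real YARN client classes. RmClient, Response, Command and FakeRm are hypothetical stand-ins invented for illustration, and the single retry is an assumption rather than what the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResyncSketch {
    // Hypothetical stand-in for the command carried in an allocate response.
    enum Command { NONE, RESYNC }

    // Hypothetical, simplified allocate response.
    static final class Response {
        final Command command;
        final List<String> allocated;

        Response(Command command, List<String> allocated) {
            this.command = command;
            this.allocated = allocated;
        }
    }

    // Hypothetical view of the two RM calls the AM-side client makes.
    interface RmClient {
        void register();
        Response allocate(List<String> asks);
    }

    // On RESYNC, re-register and allocate again so the caller never sees
    // the stale response. Only one retry is attempted in this sketch; if the
    // second response is also RESYNC it is returned as-is.
    static Response allocateWithResync(RmClient rm, List<String> asks) {
        Response response = rm.allocate(asks);
        if (response.command == Command.RESYNC) {
            rm.register();
            response = rm.allocate(asks);
        }
        return response;
    }

    // Fake RM that answers RESYNC until the AM has registered, then
    // "allocates" one container per ask by echoing the ask list.
    static final class FakeRm implements RmClient {
        boolean registered = false;
        int registerCalls = 0;

        @Override
        public void register() {
            registered = true;
            registerCalls++;
        }

        @Override
        public Response allocate(List<String> asks) {
            if (!registered) {
                return new Response(Command.RESYNC, new ArrayList<String>());
            }
            return new Response(Command.NONE, new ArrayList<String>(asks));
        }
    }

    public static void main(String[] args) {
        FakeRm rm = new FakeRm();
        Response r = allocateWithResync(rm, Arrays.asList("ask-1", "ask-2"));
        System.out.println(r.command + " " + r.allocated);
    }
}
```

With this shape the user only ever handles a response produced after re-registration, which is the behavior the question above is asking about.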

There needs to be some clear documentation that if the user has not removed 
container requests that have already been satisfied, then the re-register may 
end up sending the entire ask list to the RM (including matched requests). 
Which would mean the RM could end up giving it a lot of new allocated 

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -------------------------------------------------------------------------------------
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
> YARN-1366.11.patch, YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, 
> YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, 
> YARN-1366.9.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
> YARN-1366.prototype.patch
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and having the AM send its entire outstanding request list to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.
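
The resync behavior described in the issue can be sketched on the AM side roughly as follows. This is an illustration under assumptions, not the YARN implementation: the class, its fields and its method names are hypothetical, requests and container IDs are plain strings, and the sequence number is tracked locally for simplicity. It shows the three points from the description and the comment: the sequence number goes back to 0, the whole outstanding request list is resent (so satisfied requests should be removed beforehand), and completions reported more than once are filtered out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResyncStateSketch {
    // Hypothetical allocate RPC sequence number kept by the AM.
    private int responseId = 0;
    // Every request that has not been removed by the user.
    private final List<String> outstanding = new ArrayList<String>();
    // Requests not yet sent to the RM; normally only the delta.
    private final List<String> pending = new ArrayList<String>();
    // Completed container IDs already handed to the user.
    private final Set<String> seenCompleted = new HashSet<String>();

    void addRequest(String ask) {
        outstanding.add(ask);
        pending.add(ask);
    }

    // The user should call this once a request has been satisfied;
    // otherwise a resync will send it to the RM again.
    void removeRequest(String ask) {
        outstanding.remove(ask);
        pending.remove(ask);
    }

    // Build the ask list for the next allocate call and advance the sequence.
    List<String> nextAsk() {
        List<String> ask = new ArrayList<String>(pending);
        pending.clear();
        responseId++;
        return ask;
    }

    // Resync: reset the sequence number and queue the full outstanding list.
    void onResync() {
        responseId = 0;
        pending.clear();
        pending.addAll(outstanding);
    }

    // Drop completions that were already reported before the resync.
    List<String> filterNewCompletions(List<String> reported) {
        List<String> fresh = new ArrayList<String>();
        for (String id : reported) {
            if (seenCompleted.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    int getResponseId() {
        return responseId;
    }

    public static void main(String[] args) {
        ResyncStateSketch am = new ResyncStateSketch();
        am.addRequest("req-a");
        am.addRequest("req-b");
        System.out.println("first ask: " + am.nextAsk());
        am.removeRequest("req-a"); // req-a was satisfied
        am.onResync();
        System.out.println("ask after resync: " + am.nextAsk());
        am.filterNewCompletions(Arrays.asList("c1", "c2"));
        System.out.println("new completions: "
            + am.filterNewCompletions(Arrays.asList("c2", "c3")));
    }
}
```

If removeRequest is never called, the ask sent after onResync contains every request ever added, which is the over-asking risk raised in the comment.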

This message was sent by Atlassian JIRA
