[ 
https://issues.apache.org/jira/browse/YARN-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544216#comment-13544216
 ] 

Jason Lowe commented on YARN-308:
---------------------------------

bq. Would changing it to be a pure absolute or pure delta protocol break 
things? It seems like this would be easier to understand.

Pure absolute will put extra load on the network and RM.  Many of our 
applications run with thousands of tasks, most of which are requested upon 
application startup.  The AM heartbeats every second, and having to send all 
those requests *every* heartbeat will take more processing (and generate more 
garbage if the RM blindly replaces its stored requests each time).  We also 
run clusters with hundreds or thousands of running applications, and that's a 
lot of requests to plow through every second if they're all sending absolutes.
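
As a rough back-of-the-envelope illustration of that load (the figures and the 
entries_per_second helper below are made-up assumptions for the sake of the 
arithmetic, not measurements from any cluster):

```python
def entries_per_second(apps, entries_per_heartbeat, heartbeats_per_second=1):
    """Request entries the RM must process per second across all AMs."""
    return apps * entries_per_heartbeat * heartbeats_per_second


# Assumed, illustrative numbers: 1000 running applications, each with
# 2000 outstanding requests, heartbeating once per second.
absolute_load = entries_per_second(apps=1000, entries_per_heartbeat=2000)

# With deltas, an AM only sends what changed; assume 10 changed entries
# per heartbeat (again, purely hypothetical).
delta_load = entries_per_second(apps=1000, entries_per_heartbeat=10)

print(absolute_load)  # 2000000
print(delta_load)     # 10000
```

The exact ratio depends entirely on how often asks change, but the absolute 
cost scales with the total outstanding requests rather than with the changes.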

Pure delta is also a bit tricky re: race conditions.  For example:

# AM asks for a container on host A, incrementing the ask for that host from 0 
to 1.
# RM eventually allocates a container on host A, decrements the ask for that 
host to 0, and waits for the next heartbeat to inform the AM.
# However, in the next heartbeat the AM decides it no longer needs the 
container on host A, so it decrements the ask count by 1.  If we're not careful, 
the ask count will be computed incorrectly, because the AM is still working 
from a stale value (1) while the RM already holds 0.
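
A minimal sketch of that sequence, assuming a naive RM that simply adds each 
delta to its stored count.  DeltaAskTable and its methods are hypothetical 
names for illustration and are not part of the YARN API:

```python
class DeltaAskTable:
    """Hypothetical RM-side ask table for a naive pure-delta protocol."""

    def __init__(self):
        self.asks = {}  # host -> outstanding container count

    def apply_delta(self, host, delta):
        # Naively add the AM's delta to whatever the RM currently holds.
        self.asks[host] = self.asks.get(host, 0) + delta

    def allocate(self, host):
        # The RM satisfies one outstanding ask on this host.
        self.asks[host] -= 1


rm = DeltaAskTable()
am_view = 0  # what the AM believes its ask count for host A is

# Step 1: the AM asks for one container on host A (0 -> 1).
rm.apply_delta("A", +1)
am_view += 1

# Step 2: the RM allocates the container (1 -> 0); the AM has not heard yet.
rm.allocate("A")

# Step 3: the AM, still seeing 1, cancels its ask with a -1 delta.
rm.apply_delta("A", -1)
am_view -= 1

print(rm.asks["A"], am_view)  # -1 0
```

The RM ends up at -1 while the AM believes the count is 0, so a pure-delta 
scheme would need some extra rule (clamping at zero, sequence numbers, or 
similar) to reconcile the two views.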

I think we can make it work pure-delta, but it has its own set of interesting 
cases.

                
> Improve documentation about what "asks" means in AMRMProtocol
> -------------------------------------------------------------
>
>                 Key: YARN-308
>                 URL: https://issues.apache.org/jira/browse/YARN-308
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: api, resourcemanager
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>             Fix For: 2.0.3-alpha
>
>
> It's unclear to me from reading the javadoc exactly what "asks" means when 
> the AM sends a heartbeat to the RM.  Is the AM supposed to send a list of all 
> resources that it is waiting for?  Or just inform the RM about new ones that 
> it wants?

