[
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213280#comment-16213280
]
Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM:
-------------------------------------------------------------
[~haibochen] / [[email protected]]
So, as I mentioned in the earlier JIRA, what we have in trunk currently is
mostly atomic because:
# {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}}
method in the {{SchedulerApplicationAttempt}}, during which the thread has
acquired a write lock on the application. You don't need a lock on the queue,
and since there are no changes to the node, there is no need for one there either.
# The only concurrent action that can happen is that the Node where the
Container is running might have heartbeated in - but that operation,
{{releaseContainer}}, tries to take a lock on the app too, which has to
contend with the write lock acquired in {{pullNewlyUpdatedContainers}} - so we
are good there (see the locking sketch below).
# It is possible that multiple container update requests (say container
increase requests) for containers running on the same node can come in
concurrently - but the flow is such that the actual resource allocation for the
update is internally treated as a new (temporary) container
allocation - and, just like any normal container allocation in the scheduler,
they are serialized.
# It is possible that multiple update requests for the SAME container can
come in too - but we have a container version that takes care of that (see the
version-check sketch below).
Although, I do have to mention that the code you pasted above - which is part
of the changes in YARN-4511 - can cause a few problems, since you are updating
the node as well, and you might need a lock on the node before you do that.
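
To make 1 and 2 concrete, here is a rough sketch of the locking pattern
(simplified stand-ins only, not the actual {{SchedulerApplicationAttempt}}
code): both paths take the same per-application write lock, so the swap and
the heartbeat-driven release serialize against each other.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for the application-level locking described in 1 and 2.
// Names are illustrative only, not the real SchedulerApplicationAttempt.
class AppAttemptLockingSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> newlyUpdatedContainers = new ArrayList<>();

  // AM-facing path: the swap happens entirely under the app write lock.
  public List<String> pullNewlyUpdatedContainers() {
    lock.writeLock().lock();
    try {
      // swapContainer(...) would run here, still under the write lock
      List<String> pulled = new ArrayList<>(newlyUpdatedContainers);
      newlyUpdatedContainers.clear();
      return pulled;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Node-heartbeat path: needs the same app lock, so it serializes
  // against the swap above instead of racing with it.
  public void releaseContainer(String containerId) {
    lock.writeLock().lock();
    try {
      newlyUpdatedContainers.remove(containerId);
      // ...update the app's resource accounting here...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}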
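And for 4, the idea behind the container version is roughly the following
(illustrative names only, the real check lives in the RM container-update
code path): an update that does not carry a newer version than the one
already applied is simply dropped.
{code:java}
// Illustrative only: the idea behind the container-version check in 4.
// A stale or duplicate update for the SAME container is simply ignored.
class ContainerVersionSketch {
  private int containerVersion = 0;

  public synchronized boolean tryApplyUpdate(int requestedVersion) {
    if (requestedVersion <= containerVersion) {
      return false; // older or duplicate update request - ignore it
    }
    containerVersion = requestedVersion;
    return true;    // newer version wins and becomes current
  }
}
{code}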
> The atomicity of container update in RM is not clear
> ----------------------------------------------------
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that
> {code:java}
> // notify SchedulerNode of the update to correct resource accounting
> node.containerUpdated(existingRMContainer, existingContainer);
>
> ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
> // notify SchedulerNode of the update to correct resource accounting
> node.containerUpdated(tempRMContainer, tempContainer);
> {code}
> bq. I think that it would be nicer to lock around these two calls to become
> atomic.
> Container update, and thus container swap as part of that, is atomic
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more
> conservative.
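> A rough sketch of what that suggestion could look like (illustrative only -
> this uses the node's intrinsic lock, which is not necessarily how
> SchedulerNode synchronizes internally):
> {code:java}
> // Sketch: hold one critical section on the node across both updates so a
> // concurrent heartbeat cannot observe the half-applied swap.
> synchronized (node) {
>   // notify SchedulerNode of the update to correct resource accounting
>   node.containerUpdated(existingRMContainer, existingContainer);
>
>   ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
>   node.containerUpdated(tempRMContainer, tempContainer);
> }
> {code}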