[
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213280#comment-16213280
]
Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM:
-------------------------------------------------------------
[~haibochen] / [[email protected]]
So, as I mentioned in the earlier JIRA, what we have in trunk currently is
mostly atomic because:
# {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}}
method in the {{SchedulerApplicationAttempt}}, during which the thread has
acquired a write lock on the application. You don't need a lock on the queue,
and since there are no changes to the node, there is no need for one there either.
# The only concurrent action that can happen is that the Node where the
Container is running might have heartbeated in - but that operation,
{{releaseContainer}}, tries to take a lock on the app too, which has to
contend with the write lock acquired in {{pullNewlyUpdatedContainers}} - so we
are good there (see the locking sketch below).
# It is possible that multiple container update requests (say container
increase requests) for containers running on the same node can come in
concurrently - but the flow is such that the actual resource allocation for the
update is internally treated as a new (temporary) container
allocation - and, just like any normal container allocation in the scheduler,
they are serialized.
# It is possible that multiple update requests for the SAME container can
come in too - but we have a container version that takes care of that (see the
version-check sketch below).
Although, I do have to mention that the code you pasted above - which is part
of the changes in YARN-4511 - can cause a few problems, since you are updating
the node as well, and you might need a lock on the node before you do that.
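
To make 1 and 2 concrete, here is a rough sketch of the locking pattern
(simplified stand-ins only, not the actual {{SchedulerApplicationAttempt}}
code): both paths take the same per-application write lock, so the swap and
the heartbeat-driven release serialize against each other.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for the application-level locking described in 1 and 2.
// Names are illustrative only, not the real SchedulerApplicationAttempt.
class AppAttemptLockingSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> newlyUpdatedContainers = new ArrayList<>();

  // AM-facing path: the swap happens entirely under the app write lock.
  public List<String> pullNewlyUpdatedContainers() {
    lock.writeLock().lock();
    try {
      // swapContainer(...) would run here, still under the write lock
      List<String> pulled = new ArrayList<>(newlyUpdatedContainers);
      newlyUpdatedContainers.clear();
      return pulled;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Node-heartbeat path: needs the same app lock, so it serializes
  // against the swap above instead of racing with it.
  public void releaseContainer(String containerId) {
    lock.writeLock().lock();
    try {
      newlyUpdatedContainers.remove(containerId);
      // ...update the app's resource accounting here...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}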
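And for 4, the idea behind the container version is roughly the following
(illustrative names only, the real check lives in the RM container-update
code path): an update that does not carry a newer version than the one
already applied is simply dropped.
{code:java}
// Illustrative only: the idea behind the container-version check in 4.
// A stale or duplicate update for the SAME container is simply ignored.
class ContainerVersionSketch {
  private int containerVersion = 0;

  public synchronized boolean tryApplyUpdate(int requestedVersion) {
    if (requestedVersion <= containerVersion) {
      return false; // older or duplicate update request - ignore it
    }
    containerVersion = requestedVersion;
    return true;    // newer version wins and becomes current
  }
}
{code}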
> The atomicity of container update in RM is not clear
> ----------------------------------------------------
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that
> {code:java}
> // notify SchedulerNode of the update to correct resource accounting
> node.containerUpdated(existingRMContainer, existingContainer);
>
> ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
> // notify SchedulerNode of the update to correct resource accounting
> node.containerUpdated(tempRMContainer, tempContainer);
> {code}
> bq. I think that it would be nicer to lock around these two calls to become
> atomic.
> Container update, and thus container swap as part of that, is atomic
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more
> conservative.
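> A rough sketch of what that suggestion could look like (illustrative only -
> this uses the node's intrinsic lock, which is not necessarily how
> SchedulerNode synchronizes internally):
> {code:java}
> // Sketch: hold one critical section on the node across both updates so a
> // concurrent heartbeat cannot observe the half-applied swap.
> synchronized (node) {
>   // notify SchedulerNode of the update to correct resource accounting
>   node.containerUpdated(existingRMContainer, existingContainer);
>
>   ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
>   node.containerUpdated(tempRMContainer, tempContainer);
> }
> {code}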