[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213280#comment-16213280 ]
Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM: ------------------------------------------------------------- [~haibochen] / [~miklos.szeg...@cloudera.com] So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because: # the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. # The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there # It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized. # It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that. Although - I do have to mention, that the code you pasted above - which is part of the changes in YARN-4511 can cause a few problems, since you are updating the node as well, and you might need a lock on the node before you do that. was (Author: asuresh): [~haibochen] / [~miklos.szeg...@cloudera.com] So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because: # the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. # The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there # It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized. # It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that. > The atomicity of container update in RM is not clear > ---------------------------------------------------- > > Key: YARN-7373 > URL: https://issues.apache.org/jira/browse/YARN-7373 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager > Reporter: Haibo Chen > Assignee: Haibo Chen > > While reviewing YARN-4511, Miklos noticed that > {code:java} > 342 // notify schedulerNode of the update to correct resource accounting > 343 node.containerUpdated(existingRMContainer, existingContainer); > 344 > 345 > ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); > 346 // notify SchedulerNode of the update to correct resource accounting > 347 node.containerUpdated(tempRMContainer, tempContainer); > 348 > {code} > bq. I think that it would be nicer to lock around these two calls to become > atomic. > Container update, and thus container swap as part of that, is atomic > according to [~asuresh]. > It'd be nice to discuss this in more details to see if we want to be more > conservative. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org