[
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726075#comment-14726075
]
MENG DING commented on YARN-1651:
---------------------------------
Thanks [~leftnoteasy] for the explanation.
bq. For the example 1, I think it's not very confusing. IIRC, we have discussed
this before. We should track the latest container increase/decrease status in
AMRMClient. If the AM sends an increase token to NM after it has asked to
decrease the same container, it's the AM's fault.
This makes sense to me now.
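To make sure I understand the bookkeeping: here is a minimal sketch, assuming a hypothetical {{PendingChangeTracker}} helper (the class name and API are illustrative, not the actual AMRMClient code). The client remembers only the latest requested change per container, so a stale increase token can be rejected client-side before it is ever sent to NM.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual AMRMClient code): track the latest
 * increase/decrease request per container so the AM can detect that using
 * an older increase token after a later decrease request is its own fault.
 */
class PendingChangeTracker {
  enum Kind { INCREASE, DECREASE }

  // containerId -> most recent change request for that container
  private final Map<String, Kind> latest = new HashMap<>();

  /** Record a new increase or decrease request, superseding any earlier one. */
  void recordRequest(String containerId, Kind kind) {
    latest.put(containerId, kind);
  }

  /** An increase token is only usable if no later decrease superseded it. */
  boolean isIncreaseTokenValid(String containerId) {
    return latest.get(containerId) == Kind.INCREASE;
  }
}
```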
bq. For example 2, actually this is handled by existing code:
AbstractYarnScheduler#containerIncreasedOnNode. Killing the container after the
increase expires is just a temporary solution. I did this only because the
existing patch is too big, and I didn't want to add more confusion to it.
bq. We should fix it as soon as we can, and it should be in the same release as
YARN-1197. Do you think it will still confuse you if we fix the issue of
killing the container after the increase expires?
This should be fine as long as we handle resource rollback correctly when
multiple increase requests expire. For example:
1. Initially a container uses 4GB.
2. The first increase request increases it to 6GB (token1).
3. The second increase request increases it to 8GB (token2).
If only token2 is used, the expiration of token1 should be ignored.
If token1 is used but token2 expires, the scheduler should be able to roll back
to 6GB.
If both token1 and token2 expire, the scheduler must be able to roll back to
4GB.
In other words, the increase expiration logic must remember all ongoing
increase requests for a container, not just the latest one.
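To illustrate, here is a minimal sketch of that rollback logic, assuming a hypothetical {{IncreaseTracker}} class (the name, the token-id scheme, and the MB sizes are illustrative, not from the patch). The scheduler's accounted size is the maximum of the last NM-confirmed size and all still-outstanding increase targets, so each expiration rolls back exactly as in the example above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual YARN patch): remember ALL outstanding
 * increase tokens for one container so expirations roll back correctly.
 */
class IncreaseTracker {
  private int confirmedMb; // last size NM actually applied (starts at original)
  private final Map<Integer, Integer> pending = new HashMap<>(); // tokenId -> target MB

  IncreaseTracker(int initialMb) {
    this.confirmedMb = initialMb;
  }

  /** Scheduler granted an increase; remember the token until used or expired. */
  void grant(int tokenId, int targetMb) {
    pending.put(tokenId, targetMb);
  }

  /** NM reported the container was increased via this token. */
  void tokenUsed(int tokenId) {
    Integer target = pending.remove(tokenId);
    if (target != null && target > confirmedMb) {
      confirmedMb = target;
    }
    // Smaller outstanding increases are subsumed by the new confirmed size,
    // so their later expiration must be ignored.
    pending.values().removeIf(t -> t <= confirmedMb);
  }

  /** An increase token expired without being used; drop it. */
  void tokenExpired(int tokenId) {
    pending.remove(tokenId);
  }

  /** The size the scheduler should currently account for this container. */
  int allocatedMb() {
    int max = confirmedMb;
    for (int t : pending.values()) {
      max = Math.max(max, t);
    }
    return max;
  }
}
```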
bq. Frankly, I don't like the existing naming either, but I haven't come up
with a better name.
I think it is the name {{CombinedContainerAllocator}} that is most confusing to
me :-). How about naming {{CombinedContainerAllocator}} simply
{{ContainerAllocator}}, and renaming the current {{ContainerAllocator}} to
{{AbstractContainerAllocator}}? Anyway, maybe other people will have better
suggestions.
I also have another suggestion regarding {{newlyDecreasedContainers}}. Right
now, as soon as the scheduler decreases the container resource, it adds the
container to this list, which the AM will pull in the next heartbeat. I don't
think this adds much value. I was wondering if it could be enhanced so that the
list is only updated once the decreased containers list has been sent to NM.
Implementation-wise, it can be similar to the idea of
RMAppAttemptImpl.justFinishedContainers and
RMAppAttemptImpl.finishedContainersSentToAM. Basically, RMNodeImpl keeps the
containers from the *toBeDecreasedContainers* list in memory for one more
heartbeat cycle; once RM receives the next heartbeat request from NM, it
implies that the toBeDecreasedContainers from the last heartbeat response were
successfully received by NM. The benefit of this approach is that once the AM
receives the decreased list, it knows the decrease has been received by NM.
Maybe we can take it a step further and make the decrease action synchronous in
NodeStatusUpdaterImpl (a one-line code change, and we already do blocking
increases on NM); then once the AM receives the decreased list, it knows the
decrease has been completed on NM, eliminating the need for status polling.
Thoughts?
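The one-extra-heartbeat acknowledgment could look roughly like the sketch below. This is a hypothetical {{DecreaseAckTracker}} standing in for the RMNodeImpl-side bookkeeping (names and the String container ids are illustrative, not the actual YARN code); the pattern mirrors justFinishedContainers/finishedContainersSentToAM as described above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the actual YARN code): decreased containers are
 * exposed to the AM only after the NEXT NM heartbeat, which implicitly
 * acknowledges that NM received the previous heartbeat response.
 */
class DecreaseAckTracker {
  private final List<String> toBeDecreased = new ArrayList<>();  // queued for NM
  private final List<String> sentToNm = new ArrayList<>();       // in last response
  private final List<String> confirmedForAm = new ArrayList<>(); // safe for AM

  /** Scheduler decided to decrease this container's resource. */
  void scheduleDecrease(String containerId) {
    toBeDecreased.add(containerId);
  }

  /**
   * NM heartbeat arrived: whatever we sent in the previous response has been
   * received, so expose it to the AM; queue the next batch for this response.
   */
  List<String> onNodeHeartbeat() {
    confirmedForAm.addAll(sentToNm);
    sentToNm.clear();
    sentToNm.addAll(toBeDecreased);
    toBeDecreased.clear();
    return new ArrayList<>(sentToNm); // goes out in the heartbeat response
  }

  /** AM allocate call pulls only decreases the NM is known to have seen. */
  List<String> pullDecreasedForAm() {
    List<String> out = new ArrayList<>(confirmedForAm);
    confirmedForAm.clear();
    return out;
  }
}
```

With this, the AM never sees a decrease before NM does; the cost is one extra heartbeat interval of latency before the AM is notified.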
> CapacityScheduler side changes to support increase/decrease container
> resource.
> -------------------------------------------------------------------------------
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager, scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch
>
>