[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731505#comment-14731505 ]

Wangda Tan commented on YARN-1651:
----------------------------------

Hi Meng,
Thanks for the comments:

bq. We probably need to address this properly in the JIRA that tracks container 
resource increase roll back. (I think Container resource increase expiration 
should be tracked as a Scheduler Event, e.g., 
SchedulerEventType.CONTAINER_INCREASE_EXPIRE)
I think we can do that either by adding CONTAINER_INCREASE_EXPIRE or by calling 
the scheduler's decrease path directly from RMContainer. I'm not sure which one 
is better; let's figure it out when we work on it.
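
For reference, a rough sketch of what the event-based option could look like; 
everything below (the enum, event class and the MB maps) is a hypothetical 
stand-in, not the real SchedulerEventType / RMContainer / scheduler code:
{code:java}
// Rough, hypothetical sketch of the event-based rollback option.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum SketchSchedulerEventType { CONTAINER_INCREASE_EXPIRE /* , ... */ }

class SketchIncreaseExpireEvent {
  final long containerId;
  SketchIncreaseExpireEvent(long containerId) { this.containerId = containerId; }
}

class SketchScheduler {
  // containerId -> resource (MB) last confirmed by the NM
  private final Map<Long, Integer> lastConfirmedMb = new ConcurrentHashMap<>();
  private final Map<Long, Integer> allocatedMb = new ConcurrentHashMap<>();

  // Handler for CONTAINER_INCREASE_EXPIRE: roll the container back to the
  // last confirmed size, reusing the same path a normal decrease would take.
  void handleIncreaseExpire(SketchIncreaseExpireEvent e) {
    Integer confirmed = lastConfirmedMb.get(e.containerId);
    if (confirmed != null) {
      allocatedMb.put(e.containerId, confirmed);
    }
  }
}
{code}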

bq. It seems that this function throws exception whenever there is a duplicated 
id. Shall we handle the case where if there are both increase and decrease 
requests for the same id, we can ignore the increase but keep the decrease 
request?
I have thought about this before. When two requests with the same containerId 
but different target resources exist, it's hard to decide which one should be 
chosen, and it isn't an expected allocate request either, so I prefer to reject 
both.
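
A minimal sketch of the "reject both" behavior, assuming a hypothetical request 
type with just a containerId (not the real request classes in the patch):
{code:java}
// Minimal, hypothetical sketch of rejecting duplicated container ids across
// increase/decrease lists; the request type is a stand-in, not a YARN class.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SketchChangeRequest {
  final long containerId;
  SketchChangeRequest(long containerId) { this.containerId = containerId; }
}

final class SketchSanityChecker {
  static void checkDuplicatedIds(List<SketchChangeRequest> increases,
                                 List<SketchChangeRequest> decreases) {
    Set<Long> seen = new HashSet<>();
    for (SketchChangeRequest r : increases) {
      if (!seen.add(r.containerId)) {
        throw new IllegalArgumentException(
            "Duplicated containerId=" + r.containerId);
      }
    }
    for (SketchChangeRequest r : decreases) {
      // A container that shows up in both lists has no clear winner,
      // so the whole request is rejected.
      if (!seen.add(r.containerId)) {
        throw new IllegalArgumentException(
            "Duplicated containerId=" + r.containerId);
      }
    }
  }
}
{code}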

bq. Will it be better to combine all sanity checks into one function
Done

bq. For validateIncreaseDecreaseRequest, we don't check minimum allocation now, 
is it intended?
Yes, it's intended: we normalize the request later, so there's no need to throw 
an exception.
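
The numbers and the normalizeMb helper below are made up, but they illustrate 
why a below-minimum target does not need to fail validation: normalization 
simply rounds it up later.
{code:java}
final class SketchNormalize {
  // Clamp to the minimum allocation, then round up to the increment.
  static int normalizeMb(int requestedMb, int minimumMb, int incrementMb) {
    int clamped = Math.max(requestedMb, minimumMb);
    return ((clamped + incrementMb - 1) / incrementMb) * incrementMb;
  }

  public static void main(String[] args) {
    System.out.println(normalizeMb(300, 1024, 512));  // -> 1024
    System.out.println(normalizeMb(1500, 1024, 512)); // -> 1536
  }
}
{code}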

bq. This function is used by both pullNewlyIncreasedContainers(), and 
pullNewlyDecreasedContainers(). Why do we need to call 
updateContainerAndNMToken for decreased containers? It also unnecessarily send 
a ACQUIRE_UPDATED_CONTAINER event for every decreased container?
This is mainly to keep the logic correct and consistent. We don't use the 
container token for now, but I think it should be updated before being returned 
to the app. Unless this turns out to be a performance issue, I prefer to keep 
the existing behavior.
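
Very loosely, both pull methods funnel through one helper that refreshes the 
token and fires the acquired event; all types below are hypothetical stand-ins, 
not the actual updateContainerAndNMToken code:
{code:java}
// Loose sketch of a single pull path shared by increased and decreased
// containers; every type here is a simplified stand-in.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SketchContainer {
  long id;
  byte[] nmToken; // refreshed so the app always gets an up-to-date token
}

class SketchApp {
  private final List<SketchContainer> newlyIncreased = new ArrayList<>();
  private final List<SketchContainer> newlyDecreased = new ArrayList<>();

  List<SketchContainer> pullNewlyIncreasedContainers() {
    return pullAndUpdate(newlyIncreased, true);
  }

  List<SketchContainer> pullNewlyDecreasedContainers() {
    return pullAndUpdate(newlyDecreased, false);
  }

  // Shared path: refresh the token and fire the "updates acquired" event for
  // both increased and decreased containers, then hand them to the app.
  private List<SketchContainer> pullAndUpdate(List<SketchContainer> source,
                                              boolean increased) {
    List<SketchContainer> result = new ArrayList<>();
    for (Iterator<SketchContainer> it = source.iterator(); it.hasNext();) {
      SketchContainer c = it.next();
      c.nmToken = refreshToken(c);
      notifyAcquired(c.id, increased);
      result.add(c);
      it.remove();
    }
    return result;
  }

  private byte[] refreshToken(SketchContainer c) { return new byte[0]; }

  private void notifyAcquired(long id, boolean increased) { /* no-op here */ }
}
{code}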

bq. We should probably check null before adding updatedContainer?
It seems there's no need for a null check here. When would it become null? I 
prefer to keep it as-is; it will throw an NPE if anything fatal happens.

bq. RMNodeImpl.pullNewlyIncreasedContainers()
Implemented.

bq. AppSchedulingInfo#notifyContainerStopped not being used.
Removed; we handle this in LeafQueue#completedContainer.

bq. I think the following is a typo, should be if (cannotAllocateAnything), 
right?
Correct, fixed.

bq. Not sure if I understand the logic. Why only break when 
node.getReservedContainer() == null? Shouldn't we break out of the loop here no 
matter what?
Nice catch! Fixed; we should break once we have allocated or reserved anything.
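
A tiny sketch of the corrected loop-exit condition; names and types here are 
illustrative only, not the patch's actual assignment code:
{code:java}
enum SketchAssignResult { NOTHING, ALLOCATED, RESERVED }

class SketchAssignLoop {
  static void assignContainers(Iterable<String> nodes) {
    for (String node : nodes) {
      SketchAssignResult r = tryAssignOn(node);
      if (r != SketchAssignResult.NOTHING) {
        // Break as soon as anything was allocated or reserved, regardless of
        // whether the node already had a reserved container.
        break;
      }
    }
  }

  private static SketchAssignResult tryAssignOn(String node) {
    return SketchAssignResult.NOTHING; // placeholder for the real assignment
  }
}
{code}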

bq. I think earlier in the allocateIncreaseRequest() function, if a new 
increase is successfully allocated, 
application.increaseContainer(increaseRequest) will have removed the increase 
request already?
Another nice catch! Yes, it should already be handled in 
application.increaseContainer.

bq. RMContainerImpl...Shouldn't it be changed to...
Yes, it should work as you said; updated.

bq. Also, is container.containerIncreased really needed?
It's needed because we don't know whether an acquired event is for an increased 
or a decreased container, so I added isIncreaseContainer to the acquire event 
(now RMContainerUpdatesAcquiredEvent) and removed 
RMContainerImpl.containerIncreased.
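
In the spirit of that change, a simplified stand-in (not the actual 
RMContainerUpdatesAcquiredEvent class) showing the flag carried on the event:
{code:java}
class SketchContainerUpdatesAcquiredEvent {
  private final long containerId;
  private final boolean increaseContainer;

  SketchContainerUpdatesAcquiredEvent(long containerId,
                                      boolean increaseContainer) {
    this.containerId = containerId;
    this.increaseContainer = increaseContainer;
  }

  long getContainerId() {
    return containerId;
  }

  // Lets the acquire transition tell an increase from a decrease without
  // keeping a separate flag on the container itself.
  boolean isIncreaseContainer() {
    return increaseContainer;
  }
}
{code}
With the flag on the event itself, RMContainerImpl no longer needs its own 
containerIncreased field to tell the two cases apart.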

Typos: fixed.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, YARN-1651-2.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
