[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725859#comment-14725859
 ] 

Wangda Tan commented on YARN-1651:
----------------------------------

Hi [~mding],
Thanks for reviewing patch, it's very helpful!

*1)*
Regarding to:
bq. Recall during the design discussion, we agreed that as long as an increase 
has not yet completed for a container, we should not process any other 
increase/decrease requests for the same container. It seems that this patch 
will still process decrease/increase requests even an increase action is 
ongoing?

For the example 1, I think it's not very confusing. IIRC, we have discussed 
this before. We should track latest container increase/decrease status in 
AMRMClient. If AM send increase token to NM after it asks decrease the same 
container, it's AM's fault.

For example 2, actually this is handled by existing code: 
{{AbstractYarnScheduler#containerIncreasedOnNode}}
{code}
else if (Resources.fitsIn(nmContainerResource, rmContainerResource)) {
      // when rmContainerResource >= nmContainerResource, we won't do anything,
      // it is possible a container increased is issued by RM, but AM hasn't
      // told NM.
}
{code}
I think it's fine to me. Killing the container after increase expires is just a 
temp solution. I did this only because existing patch is too big, I don't want 
to add more confusing to it :). 

We should fix it as soon as we can, and it should be in the same release of 
YARN-1197. Do you think does it still confuse you if we fix the killing 
container after increase expires issue?

*2)*
bq. For the following, I was wondering why not handling duplicate containers in 
ApplicationMasterService? This way, the list passed to CapacityScheduler will 
be unique, and we can avoid unnecessary insert/remove?
Good suggestion, will fix.

*3)*
bq. In RMContainerImpl.java, I think reusing the ACQUIRED, and EXPIRE events 
for container increase could be a little confusing? Can we make it very clear 
by using a separate event, like INCREASE_ACQUIRED
Agree, will fix.

*4)*
bq. For the following, I think the writeLock is not needed, as the transition 
is already guarded with writeLock in handle(RMNodeEvent event)
You're correct, I'm under wrong impression that handle uses synchronized lock, 
will fix.

*5)*
bq. How about renaming CombinedContainerAllocator to GeneralContainerAllocator, 
and renaming RegularContainerAllocator to NewContainerAllocator?
I disucssed with [~jianhe] about this long time ago. NewContainerAllocator was 
used in early patches of YARN-3983. Actually NewContainerAllocator is same 
confusing, it looks like some "OldContainerAllocator" exists. 
GeneralContainerAllocator cannot reflect it's the parent container allocator 
for "regular-" and "increase-". Frankly I don't like existing naming as well, 
but I don't get a better name. 

Please let me know if your have any other thoughts.


> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to