Jian He commented on YARN-1651:

bq. I think we may need add such information to AMRMProtocol to make sure AM 
will be notified. For now, we can keep them as-is. Users can still get such 
information from RM logs.
I think for now we can fail the allocate call explicitly on those very clear 
situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation 
that rmContainer doesn't exist That's more explicit to users. Digging through 
logs is not an easy thing for application writer.

thanks for updating, Wangda ! some more comments focusing on decreasing code 

- this may be not correct, because reserve event can happen on RESERVE state 
too, i.e. reReservation
      if (container.getState() != RMContainerState.NEW) {
        container.hasIncreaseReservation = true;
 - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? 
and therefore NodeHeartBeatResponse and Impl change is not needed; similarly 
nmReportedIncreasedContainers can be a list.
 - When decreasing a container, should it send RMNodeDecreaseContainerEvent too 
 - revert ContainerManagerImpl change
 - Remove SchedulerApplicationAttempt#getIncreaseRequests
 - In AbstractYarnScheduler#deceraseContainers() move 
checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same 
place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for 
- this if condition is not needed.
  public boolean unreserve(Priority priority,
      FiCaSchedulerNode node, RMContainer rmContainer) {
    if (rmContainer.hasIncreaseReservation()) {
 - looks like when decreasing reservedIncreasedContainer, it will unreserve the 
*whole* extra reserved resource, should it only unreserve the extra resources 
being decresed ?
 - In general, I think we should be able to decrease/increase a regular 
reserved container or a increasedReservedContainer ? 
- In ParentQueue, this null check is not needed.
  public void decreaseContainer(Resource clusterResource,
      SchedContainerChangeRequest decreaseRequest,
      FiCaSchedulerApp app) {
    if (app != null) {

- allocate call is specifically marked as noLock, but now every allocate call 
holds the global scheduler lock which is too expensive. we can move 
decreaseContainer to application itself.  
{code}   protected synchronized void decreaseContainer( {code}
It is also now holding queue Lock on allocate, which is also expensive, because 
that means a bunch of malicious AMs can effectively block queue's execuation.  

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch

This message was sent by Atlassian JIRA

Reply via email to