[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729706#comment-14729706
 ] 

MENG DING commented on YARN-1651:
---------------------------------

Hi, [~leftnoteasy]

I think it is fine to reuse Expire for now for container increase expiration. 
We probably need to address this properly in the JIRA that tracks container 
resource increase roll back. (I think Container resource increase expiration 
should be tracked as a Scheduler Event, e.g., 
SchedulerEventType.CONTAINER_INCREASE_EXPIRE)

I have a few more comments or questions regarding the patch:

* Regarding sanity checks:
** The following functions can be removed? 
{{ApplicationMaster.checkDuplicatedIncreaseDecreaseRequest()}}
** About {{RMServerUtils.checkDuplicatedIncreaseDecreaseRequest()}}. It seems 
that this function throws exception whenever there is a duplicated id. Shall we 
handle the case where if there are both increase and decrease requests for the 
same id, we can ignore the increase but keep the decrease request?
** Will it be better to combine all sanity checks into one function, e.g., 
{{validateIncreaseDecreaseRequest(List<ContainerResourceChangeRequest> 
incRequests, List<ContainerResourceChangeRequest> decRequests)}}, such that it 
will check both duplicated IDs, and the resource validity for increase and 
decrease requests? 
** For {{validateIncreaseDecreaseRequest}}, we don't check minimum allocation 
now, is it intended? I see that later on you normalize the request so that it 
will be at least minimum allocation. Just want to confirm.

* For {{SchedulerApplicationAttempt.pullNewlyUpdatedContainers}}. 
** This function is used by both pullNewlyIncreasedContainers(), and 
pullNewlyDecreasedContainers(). Why do we need to call 
{{updateContainerAndNMToken}} for decreased containers?  It also unnecessarily 
send a ACQUIRE_UPDATED_CONTAINER event for every decreased container?
** We should probably check null before adding updatedContainer?
{code:title=pullNewlyUpdatedContainers}
      Container updatedContainer = updateContainerAndNMToken(rmContainer, 
false);
      returnContainerList.add(updatedContainer);
{code}

* It seems {{RMNodeImpl.pullNewlyIncreasedContainers()}} is empty?

* The following function doesn't seem to be used?
{code:title=AppSchedulingInfo}
  public synchronized void notifyContainerStopped(RMContainer rmContainer) {
    // remove from pending increase request map if it exists
    removeIncreaseRequest(rmContainer.getAllocatedNode(),
        rmContainer.getAllocatedPriority(), rmContainer.getContainerId());
  }
{code}

* In {{IncreaseContainerAllocator.assignContainers}}:
** I think the following is a typo, should be {{if (cannotAllocateAnything)}}, 
right?
{code}
          if (shouldUnreserve) {
            LOG.debug("We cannot allocate anything because of low headroom, "
                + "headroom=" + resourceLimits.getHeadroom());
          }
{code}
** Not sure if I understand the logic. Why only break when 
node.getReservedContainer() == null? Shouldn't we break out of the loop here no 
matter what?
{code}
       while (iter.hasNext()) {
          ...
          ...
          // Try to allocate the increase request
          assigned = allocateIncreaseRequest(node, increaseRequest);
          if (node.getReservedContainer() == null) {
            // if it's not a reserved increase request, we will record
            // priority/containerId so that we can remove the request later
            increasedContainerPriority = priority;
            increasedContainerId = rmContainer.getContainerId();
            break;
          }
       }  
{code}
** Is the following needed? 
 {code}
      if (increasedContainerId != null) {
        // If we increased (not reserved) a new increase request, we should
        // remove it from request map.
        application.removeIncreaseRequest(nodeId, increasedContainerPriority,
            increasedContainerId);
      }
{code}
I think earlier in the {{allocateIncreaseRequest()}} function, if a new 
increase is successfully allocated, 
{{application.increaseContainer(increaseRequest)}} will have removed the 
increase request already?
* In {{RMContainerImpl.java}}
IIUC, {{containerIncreased}} indicates that a increase is done in scheduler, 
and {{containerIncreasedAndAcquired}} indicates that a increase has been 
acquired by AM. 
If so, then in {{NMReportedContainerChangeIsDoneTransition}}
{code}
    public void transition(RMContainerImpl container, RMContainerEvent event) {
      if (container.containerIncreased) {
        // If container is increased but not acquired by AM, we will start
        // containerAllocationExpirer for this container in this transition.
        container.containerAllocationExpirer.unregister(event.getContainerId());
        container.containerIncreasedAndAcquired = false;
      }
    }
{code}
Shouldn't it be changed to:
{code}
    public void transition(RMContainerImpl container, RMContainerEvent event) {
      if (container.containerIncreasedAndAcquired ) {                           
      <=======================
        // If container is increased but not acquired by AM, we will start
        // containerAllocationExpirer for this container in this transition.
        container.containerAllocationExpirer.unregister(event.getContainerId());
        container.containerIncreasedAndAcquired = false;
        container.containerIncreased = false;                                   
           <=======================
      }
    }
{code}
Also, is *container.containerIncreased* really needed?

* A few typos:
*updateNodeHeartbeatResposneForContainersDecreasing*
*cancelInceaseReservation*


> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, YARN-1651-2.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to