[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736779#comment-14736779 ]

Jian He commented on YARN-1651:
-------------------------------

Thanks Wangda, some comments on my side:
- Remove ApplicationMasterService#checkDuplicatedIncreaseDecreaseRequest
- revert ClientRMService change
- AbstractYarnScheduler#checkAndNormalizeContainerChangeRequests
 -- attempt parameter is not used
 -- this will never be null?
{code}
if (null == sr) {
  continue;
}
{code}
 -- pass the exception as a parameter instead of concatenating it into the string?
{code}
LOG.warn("Error happens when checking increase request:" + e
    + ". Ignoring this request");
{code}
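i.e. something like the line below, assuming {{e}} is the caught exception (both commons-logging and slf4j accept a Throwable as the last argument and will print the stack trace; keep whatever message text the patch already has):
{code}
// Let the logger record the full stack trace instead of concatenating e
// into the message string.
LOG.warn("Error happens when checking increase request. Ignoring this request", e);
{code}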
 -- the invalid resize requests are simply ignored. How will the AM know that
its resize requests were ignored and are lost? The same applies to the
toBeRemovedRequests inside IncreaseContainerAllocator.
- updateIncreaseRequests: originalResource -> prevChangeRequest; similarly for
the logging {{original capacity = }}
- use decreaseRequest#getSchedulerNode instead of 
csContext.getNode(decreaseRequest.getNodeId())
{code}
unreserveContainerIncreaseRequest(clusterResource, app,
    csContext.getNode(decreaseRequest.getNodeId()), rmContainer);
csContext.getNode(decreaseRequest.getNodeId())
    .decreaseContainer(decreaseRequest.getContainerId(), absDelta);
{code}
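i.e. roughly the following (sketch only; I'm assuming the node returned by {{decreaseRequest.getSchedulerNode()}} is the same type the two calls below expect):
{code}
// Resolve the node once via the request instead of two csContext lookups
// with the same node id.
SchedulerNode schedulerNode = decreaseRequest.getSchedulerNode();
unreserveContainerIncreaseRequest(clusterResource, app, schedulerNode,
    rmContainer);
schedulerNode.decreaseContainer(decreaseRequest.getContainerId(), absDelta);
{code}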
- remove SchedulerApplicationAttempt#hasIncreaseRequestForContainer
- newlyIncreasedContainers will not be null
{code}
List<Container> newlyIncreasedContainers =
  nm.pullNewlyIncreasedContainers();
if (null != newlyIncreasedContainers) {
{code}
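so the null check can simply be dropped, e.g. (the body of the if is whatever the patch already does):
{code}
// pullNewlyIncreasedContainers() never returns null (an empty list at
// worst), so check emptiness / iterate directly.
List<Container> newlyIncreasedContainers =
    nm.pullNewlyIncreasedContainers();
if (!newlyIncreasedContainers.isEmpty()) {
  // existing handling of the newly increased containers
}
{code}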
- why is the decision made to give increase requests the highest priority? I
think we need to augment the ContainerResourceChangeRequest API to carry a
priority too, so that it can be compared against regular resource requests
(rough sketch after the quoted comment below).
{code}
/*
 * Try to allocate increase container first, and if we failed to allocate
 * anything, we will try to allocate regular container
 */
{code}
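Roughly what I have in mind (the new getter is purely illustrative, not a concrete API proposal; the existing methods are shown only for context):
{code}
// Hypothetical shape of the augmented request: a Priority the AM sets, so
// the scheduler can order increase requests against regular
// ResourceRequests instead of always serving them first.
public abstract class ContainerResourceChangeRequest {
  public abstract ContainerId getContainerId();
  public abstract Resource getCapability();
  // new, illustrative only
  public abstract Priority getPriority();
}
{code}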

- SchedulerContainerResourceChangeRequest: this name is so long that it hurts
code readability, given how many places it is used. How about
SchedContainerChangeRequest, or any other concise name you can think of?
- It is true that the node resource could change. But changing a node's
resource is such a rare event that the probability of hitting it here is much
lower than hitting it in the allocate call. I suggest we also fail the
allocate call in this case (a hedged sketch follows the quoted comment below).
{code}
// The reason of doing check here instead of adding increase request
// to scheduler because node's resource could be updated after
// request added.
{code}
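A hedged sketch of what failing it in allocate could look like (the variable names and the exact exception are assumptions for illustration; {{Resources.fitsIn}} is just one way to express the check):
{code}
// Sketch: validate an increase request against the node's current total
// resource at allocate time and fail the call there, instead of silently
// dropping the request later when the node has shrunk.
Resource target = increaseRequest.getCapability();
if (!Resources.fitsIn(target, schedulerNode.getTotalResource())) {
  throw new InvalidResourceRequestException("Increase request for container "
      + increaseRequest.getContainerId()
      + " exceeds the node's total resource "
      + schedulerNode.getTotalResource());
}
{code}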
- why check for equality here?
{code}
if (null == request
    || reservedContainer.getContainerState() != ContainerState.RUNNING
    || Resources.equals(reservedContainer.getReservedResource(),
        request.getDeltaCapacity())) {
{code}
- allocateIncreaseRequestFromReservedContainer: remove the unused parameter
- I think killing the container in this case is too harsh. Nothing is wrong
with the container itself; it can keep running fine. I think we should revert
the increased resource and have some way to signal back that the
increaseRequest failed, maybe in the allocate response? (a rough sketch
follows the quoted code below)
{code}
// When the container expired, and it has a pending increased request, we
// will kill the container.
new KillTransition().transition(container, event);
{code}
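Very roughly, something like the sketch below instead of the kill (both method names are made up here, only to illustrate the intent of reverting the increase and reporting the failure back to the AM):
{code}
// Hypothetical alternative to KillTransition: undo the pending increase on
// the container and remember it so the next AllocateResponse can tell the
// AM that this increase request failed. Method names are illustrative only.
container.revertPendingIncrease();
appAttempt.recordFailedIncrease(container.getContainerId());
{code}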
- I think the container.containerIncreasedAndAcquired boolean flag is not 
needed?
- unreserveContainerIncreaseRequest -> unreserveIncreasedContainer
- internalReleaseResource: remove unused application parameter
- fix the format:
{code}
 RMAppAttempt appAttempt =
        rmContext
            .getRMApps()
            .get(
                container.getId().getApplicationAttemptId()
                    .getApplicationId()).getCurrentAppAttempt();
{code}
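e.g. it reads better with the intermediate id pulled out (same lookup, just reformatted):
{code}
ApplicationId appId =
    container.getId().getApplicationAttemptId().getApplicationId();
RMAppAttempt appAttempt =
    rmContext.getRMApps().get(appId).getCurrentAppAttempt();
{code}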

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>



