[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736779#comment-14736779 ]
Jian He commented on YARN-1651: ------------------------------- Thanks Wangda, some comments on my side: - Remove ApplicationMasterService#checkDuplicatedIncreaseDecreaseRequest - revert ClientRMService change - AbstractYarnScheduler#checkAndNormalizeContainerChangeRequests -- attempt parameter is not used -- this will never be null ? {code} if (null == sr) { continue; } {code} -- pass exception as a parameter, instead of string concatenation ? {code} LOG.warn("Error happens when checking increase request:" + e + ". Ignoring this request"); {code} -- the invalid resize requests are simply ignored. How will the AM know that their resize requests are ignored and so lost ? Similar thing for the toBeRemovedRequests inside IncreaseContainerAllocator. - updateIncreaseRequests: originalResource-> prevChangeRequest; similarly for the logging {{original capacity = }} - use decreaseRequest#getSchedulerNode instead of csContext.getNode(decreaseRequest.getNodeId()) {code} unreserveContainerIncreaseRequest(clusterResource, app, csContext.getNode(decreaseRequest.getNodeId()), rmContainer); csContext.getNode(decreaseRequest.getNodeId()) .decreaseContainer(decreaseRequest.getContainerId(), absDelta); {code} - remove SchedulerApplicationAttempt#hasIncreaseRequestForContainer - newlyIncreasedContainers will not be null {code} List<Container> newlyIncreasedContainers = nm.pullNewlyIncreasedContainers(); if (null != newlyIncreasedContainers) { {code} - why is the decision made to give increase request highest priority ? I think we need to augment the ContainerResourceChangeRequest API to have the priority too so that it can be compared against regular resource requests? {code} /* * Try to allocate increase container first, and if we failed to allocate * anything, we will try to allocate regular container */ {code} - SchedulerContainerResourceChangeRequest, this name is so long that it reduces code readability a lot as it’s used in so many places, how about SchedContainerChangeRequest ? or any other concise name you can think of ? - It is true that node resource would change. But changing node resource is such a rare event that the probability of hitting this much lower than hitting it in the allocate call. I suggest we should also fail the allocate call in this case. {code} // The reason of doing check here instead of adding increase request // to scheduler because node's resource could be updated after // request added. {code} - why equal ? {code} if (null == request || reservedContainer.getContainerState() != ContainerState.RUNNING || Resources.equals(reservedContainer.getReservedResource(), request.getDeltaCapacity())) { {code} - allocateIncreaseRequestFromReservedContainer- remove the unused parameter - I think killing a container in this case is too harsh. Nothing is wrong with the container itself. The container can run fine. I think we should revert the increased resource and have some way to signal back that the increaseRequest failed, maybe in the allocate response ? {code} // When the container expired, and it has a pending increased request, we // will kill the container. new KillTransition().transition(container, event); {code} - I think the container.containerIncreasedAndAcquired boolean flag is not needed? - unreserveContainerIncreaseRequest-> unreserveIncreasedContainer - internalReleaseResource: remove unused application parameter - fix the format {code} RMAppAttempt appAttempt = rmContext .getRMApps() .get( container.getId().getApplicationAttemptId() .getApplicationId()).getCurrentAppAttempt(); {code} > CapacityScheduler side changes to support increase/decrease container > resource. > ------------------------------------------------------------------------------- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler > Reporter: Wangda Tan > Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)