[
https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648785#comment-14648785
]
Jian He commented on YARN-3983:
-------------------------------
- I suggest not set the state implicitly in the constructor. It’s quite
confusing which constructor indicates which state. Setting it explicitly by
caller makes code easier to read.
{code}
public ContainerAllocation(RMContainer containerToBeUnreserved) {
this(containerToBeUnreserved, null, AllocationState.QUEUE_SKIPPED);
}
public ContainerAllocation(RMContainer containerToBeUnreserved,
Resource resourceToBeAllocated) {
this(containerToBeUnreserved, resourceToBeAllocated,
AllocationState.SUCCEEDED);
}
{code}
- reserved container returns SUCCEEDED state?
{code}
ContainerAllocation result =
new ContainerAllocation(null,
request.getCapability());
result.reserved = true;
result.containerNodeType =
type;
{code}
Earlier below code will not be invoked for reserved container, now it gets
invoked
{code}
if (allocationResult.state == AllocationState.SUCCEEDED) {
// Don't reset scheduling opportunities for offswitch assignments
// otherwise the app will be delayed for each non-local assignment.
// This helps apps with many off-cluster requests schedule faster.
if (allocationResult.containerNodeType != NodeType.OFF_SWITCH) {
if (LOG.isDebugEnabled()) {
LOG.debug("Resetting scheduling opportunities");
}
application.resetSchedulingOpportunities(priority);
}
// Non-exclusive scheduling opportunity is different: we need reset
// it every time to make sure non-labeled resource request will be
// most likely allocated on non-labeled nodes first.
application.resetMissedNonPartitionedRequestSchedulingOpportunity(priority);
}
{code}
- AllocationState#SUCCEEDED -> ALLOCTED.
- reserved boolean flag can be changed to be Allocateion#RESERVED state.
- CSAssignment#NULL_ASSIGNMENT not used, remove
- comments does not match method name
{code} * doAllocation needs to handle following stuffs: {code} .
- Below code was originally outside of the priorities loop
{code}
if (SchedulerAppUtils.isBlacklisted(application, node, LOG)) {
return
ContainerAllocation.APP_SKIPPED;
}
{code}
- below code can be changed to use null check ?
{code}
if (Resources.greaterThan(rc, clusterResource,
assigned.getResourceToBeAllocated(), Resources.none())) {
{code}
- move the for loop into {{applicationContainerAllocator.allocate}} method
{code}
for (Priority priority : getPriorities()) {
{code}
- return ContainerAllocation.QUEUE_SKIPPED, though logically correct,
semantically is incorrect, it should return Priority_Skipped
{code}
private ContainerAllocation assignNodeLocalContainers(
Resource clusterResource, ResourceRequest nodeLocalResourceRequest,
FiCaSchedulerNode node, Priority priority, RMContainer reservedContainer,
SchedulingMode schedulingMode, ResourceLimits currentResoureLimits) {
if (canAssign(priority, node, NodeType.NODE_LOCAL, reservedContainer)) {
return assignContainer(clusterResource, node, priority,
nodeLocalResourceRequest, NodeType.NODE_LOCAL, reservedContainer,
schedulingMode, currentResoureLimits);
}
return ContainerAllocation.QUEUE_SKIPPED;
}
// check if the resource request can access the label
if (!SchedulerUtils.checkResourceRequestMatchingNodePartition(request,
node.getPartition(), schedulingMode)) {
// this is a reserved container, but we cannot allocate it now according
// to label not match. This can be caused by node label changed
// We should un-reserve this container.
return new ContainerAllocation(rmContainer);
}
{code}
- APP_SKIPPED? original code seems skipping the priority
{code}
// Does the application need this resource?
if (allocatedContainer == null) {
// Skip this app if we failed to allocate.
ContainerAllocation ret =
new ContainerAllocation(allocationResult.containerToBeUnreserved);
ret.state = AllocationState.APP_SKIPPED;
return ret;
}
{code}
> Make CapacityScheduler to easier extend application allocation logic
> --------------------------------------------------------------------
>
> Key: YARN-3983
> URL: https://issues.apache.org/jira/browse/YARN-3983
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-3983.1.patch, YARN-3983.2.patch
>
>
> While working on YARN-1651 (resource allocation for increasing container), I
> found it is very hard to extend existing CapacityScheduler resource
> allocation logic to support different types of resource allocation.
> For example, there's a lot of differences between increasing a container and
> allocating a container:
> - Increasing a container doesn't need to check locality delay.
> - Increasing a container doesn't need to build/modify a resource request tree
> (ANY->RACK/HOST).
> - Increasing a container doesn't need to check allocation/reservation
> starvation (see {{shouldAllocOrReserveNewContainer}}).
> - After increasing a container is approved by scheduler, it need to update an
> existing container token instead of creating new container.
> And there're lots of similarities when allocating different types of
> resources.
> - User-limit/queue-limit will be enforced for both of them.
> - Both of them needs resource reservation logic. (Maybe continuous
> reservation looking is needed for both of them).
> The purpose of this JIRA is to make easier extending CapacityScheduler
> resource allocation logic to support different types of resource allocation,
> make common code reusable, and also better code organization.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)