[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wangda Tan updated YARN-3361:
-----------------------------
Attachment: YARN-3361.3.patch
Thanks for your comments, [~vinodkv]/[~jianhe]:
*Main code comments from Vinod:*
bq. checkNodeLabelExpression: NPEs on labelExpression can happen?
No, I removed the checks.
bq. FiCaSchedulerNode: exclusive, setters, getters -> exclusivePartition
They're not used by anybody, so I removed them.
bq. ExclusiveType renames
Done
bq. AbstractCSQueue:
1. Change to nodePartitionToLookAt: Done
2. Now all queues check needResources
3. Renamed to hasPendingResourceRequest as suggested by Jian
bq. checkResourceRequestMatchingNodeLabel can be moved into the application?
Moved to SchedulerUtils
bq. checkResourceRequestMatchingNodeLabel nodeLabelToLookAt arg is not used anywhere else.
Done (merged it in SchedulerUtils.checkResourceRequestMatchingNodePartition)
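For illustration, a minimal sketch of the kind of matching rule a method like SchedulerUtils.checkResourceRequestMatchingNodePartition captures, under stated assumptions: the class name, the empty-string NO_LABEL encoding and the two SchedulingMode constant names are hypothetical, and the method in the patch may differ. The ignore-exclusivity branch follows the issue description, where labeled resources are shared only with requests for non-labeled resources.

```java
// Hypothetical sketch, not the code in YARN-3361.3.patch.
class PartitionMatchSketch {
  // Assumption: the empty string stands for the default (non-labeled) partition.
  static final String NO_LABEL = "";

  // Assumed names for the two scheduling modes.
  enum SchedulingMode { RESPECT_PARTITION_EXCLUSIVITY, IGNORE_PARTITION_EXCLUSIVITY }

  // Returns true if a request asking for requestedPartition may be served on a
  // node that belongs to nodePartition under the given scheduling mode.
  static boolean matchesNodePartition(String requestedPartition,
      String nodePartition, SchedulingMode mode) {
    // Respecting exclusivity: look only at the node's own partition.
    // Ignoring exclusivity: only non-labeled requests are eligible.
    String partitionToLookAt =
        mode == SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY ? nodePartition : NO_LABEL;
    // A request without a label expression asks for the default partition.
    String requested = requestedPartition == null ? NO_LABEL : requestedPartition;
    return requested.equals(partitionToLookAt);
  }

  public static void main(String[] args) {
    SchedulingMode respect = SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY;
    SchedulingMode ignore = SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY;
    System.out.println(matchesNodePartition("y", "y", respect));  // true
    System.out.println(matchesNodePartition(null, "y", respect)); // false
    System.out.println(matchesNodePartition(null, "y", ignore));  // true
    System.out.println(matchesNodePartition("y", "y", ignore));   // false
  }
}
```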
bq. addNonExclusiveSchedulingOpportunity
Renamed to reset/addMissedNonPartitionedRequestSchedulingOpportunity
bq. It seems like we are not putting absolute max-capacities on the individual queues when not-respecting-partitions. Describe why? Similarly, describe why user-limit-factor is ignored in the not-respecting-partitions mode.
Done
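A minimal sketch of the behavior discussed here: when ignoring exclusivity, the queue's absolute max-capacity and the user-limit-factor are not applied. Limits are plain MB values, the method and enum names are hypothetical, and the user-limit formula is deliberately simplified; it is not the real CapacityScheduler computation.

```java
// Hypothetical sketch; limits are plain MB values, not YARN Resource objects.
class QueueLimitSketch {
  // Assumed names for the two scheduling modes.
  enum SchedulingMode { RESPECT_PARTITION_EXCLUSIVITY, IGNORE_PARTITION_EXCLUSIVITY }

  // Upper bound (MB) a queue may reach on a partition.
  static long queueLimitMb(long partitionMb, double absMaxCapacity, SchedulingMode mode) {
    if (mode == SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY) {
      // Max-capacity is not honored: the whole partition is the limit.
      return partitionMb;
    }
    return (long) (partitionMb * absMaxCapacity);
  }

  // Simplified per-user limit (MB): user-limit-factor is only applied when
  // respecting exclusivity; otherwise the queue limit is the only bound.
  static long userLimitMb(long queueCapacityMb, double userLimitFactor,
      long queueLimitMb, SchedulingMode mode) {
    if (mode == SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY) {
      return queueLimitMb;
    }
    return Math.min((long) (queueCapacityMb * userLimitFactor), queueLimitMb);
  }

  public static void main(String[] args) {
    SchedulingMode respect = SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY;
    SchedulingMode ignore = SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY;
    System.out.println(queueLimitMb(10240, 0.5, respect));      // 5120
    System.out.println(queueLimitMb(10240, 0.5, ignore));       // 10240
    System.out.println(userLimitMb(2048, 2.0, 5120, respect));  // 4096
    System.out.println(userLimitMb(2048, 2.0, 10240, ignore));  // 10240
  }
}
```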
*Test code comments from Vinod:*
bq. testNonExclusiveNodeLabelsAllocationIgnoreAppSubmitOrder
Done
bq. testNonExclusiveNodeLabelsAllocationIgnorePriority
Renamed to testPreferenceOfNeedyPrioritiesUnderSameAppTowardsNodePartitions
bq. Actually, now that I rename it that way, this may not be the right behavior. Not respecting priorities within an app can result in scheduling deadlocks:
This will not lead to a deadlock, because we count resource usage separately under each partition. priority=1 goes first on partition=y, before priority=0 is fully satisfied, only because priority=1 is the lowest-numbered priority that asks for partition=y.
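To illustrate the priority argument, a simplified sketch: for a node in a given partition, the app picks the highest-priority request (lowest value, as in YARN) among the requests that ask for that partition, so a priority asking for a different partition does not block it. The Request class and pick method are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of per-partition priority selection within one app.
class PriorityPerPartitionSketch {
  // Simplified resource request: priority, requested partition, pending containers.
  static final class Request {
    final int priority;     // lower value = higher priority, as in YARN
    final String partition; // "" stands for the default (non-labeled) partition
    final int pending;
    Request(int priority, String partition, int pending) {
      this.priority = priority;
      this.partition = partition;
      this.pending = pending;
    }
  }

  // Highest-priority request that still has pending containers for nodePartition;
  // requests for other partitions are skipped rather than blocking this one.
  static Request pick(List<Request> requests, String nodePartition) {
    return requests.stream()
        .filter(r -> r.pending > 0 && r.partition.equals(nodePartition))
        .min(Comparator.comparingInt((Request r) -> r.priority))
        .orElse(null);
  }

  public static void main(String[] args) {
    List<Request> app = List.of(
        new Request(0, "", 5),   // priority=0 asks for the default partition
        new Request(1, "y", 2)); // priority=1 asks for partition=y
    // On a partition=y node, priority=1 is chosen although priority=0 is still
    // pending, because priority=0 never asks for partition=y.
    System.out.println(pick(app, "y").priority); // 1
    System.out.println(pick(app, "").priority);  // 0
  }
}
```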
bq. testLabeledResourceRequestsGetPreferrenceInHierarchyOfQueue
Renamed to testPreferenceOfQueuesTowardsNodePartitions
bq. testNonLabeledQueueUsesLabeledResource
Done
bq. Let's move all these node-label related tests into their own test-case.
Moved to TestNodeLabelContainerAllocation
Added more tests:
1. Added testAMContainerAllocationWillAlwaysBeExclusive to make sure AM container allocation is always exclusive.
2. Added testQueueMaxCapacitiesWillNotBeHonoredWhenNotRespectingExclusivity to make sure max-capacities of individual queues are ignored when allocating in ignore-exclusivity mode.
*Main code comments from Jian:*
bq. Merge queue#needResource and application#needResource
Done; the common implementation is now in SchedulerUtils.hasPendingResourceRequest.
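A minimal sketch of a single shared pending-resource check that both a queue and an application could delegate to, assuming pending resources are tracked per partition. The Usage class (MB only), the NO_LABEL encoding and the mode names are hypothetical stand-ins, not the patch's types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; pending resources are tracked as MB per partition.
class PendingResourceSketch {
  static final String NO_LABEL = ""; // assumed encoding of the default partition

  // Assumed names for the two scheduling modes.
  enum SchedulingMode { RESPECT_PARTITION_EXCLUSIVITY, IGNORE_PARTITION_EXCLUSIVITY }

  // Stand-in for per-partition usage bookkeeping owned by a queue or an app.
  static final class Usage {
    private final Map<String, Long> pendingMb = new HashMap<>();
    void incPending(String partition, long mb) { pendingMb.merge(partition, mb, Long::sum); }
    long getPending(String partition) { return pendingMb.getOrDefault(partition, 0L); }
  }

  // Shared check: queues and applications both delegate here.
  static boolean hasPendingResourceRequest(Usage usage, String nodePartition,
      SchedulingMode mode) {
    // When ignoring exclusivity, only non-labeled pending requests are relevant.
    String partitionToLookAt =
        mode == SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY ? NO_LABEL : nodePartition;
    return usage.getPending(partitionToLookAt) > 0;
  }

  public static void main(String[] args) {
    Usage appUsage = new Usage();
    appUsage.incPending(NO_LABEL, 1024); // app only asks for non-labeled resources
    System.out.println(hasPendingResourceRequest(
        appUsage, "x", SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY)); // false
    System.out.println(hasPendingResourceRequest(
        appUsage, "x", SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY));  // true
  }
}
```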
bq. Some methods like canAssignToThisQueue where both nodeLabels and exclusiveType are passed, it may be simplified by passing the current partitionToAllocate to simplify the internal if/else check.
Actually, it would not simplify the logic much: I checked, and only a few places can leverage nodePartitionToLookAt, so I prefer to keep the semantics of SchedulingMode.
bq. The following may be incorrect, as the current request may not be the AM container request, though null == rmAppAttempt.getMasterContainer()
I understand masterContainer could be initialized asynchronously in RMApp, but that interval can be ignored; doing the null check here makes sure the AM container never gets allocated in non-exclusive mode.
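A minimal sketch of the conservative check described in this answer: while the attempt has no master container, any request is treated as the AM request and is not served when ignoring exclusivity. The boolean parameter stands in for rmAppAttempt.getMasterContainer() != null; the class, method and enum names are hypothetical.

```java
// Hypothetical sketch of the conservative AM check.
class AmExclusivitySketch {
  // Assumed names for the two scheduling modes.
  enum SchedulingMode { RESPECT_PARTITION_EXCLUSIVITY, IGNORE_PARTITION_EXCLUSIVITY }

  // masterContainerAllocated stands in for rmAppAttempt.getMasterContainer() != null.
  static boolean canAllocate(boolean masterContainerAllocated, SchedulingMode mode) {
    // Until the master container exists, treat the request as the AM request:
    // it may only be allocated while respecting partition exclusivity.
    return masterContainerAllocated || mode == SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY;
  }

  public static void main(String[] args) {
    System.out.println(canAllocate(false, SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY));  // false
    System.out.println(canAllocate(false, SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY)); // true
    System.out.println(canAllocate(true, SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY));   // true
  }
}
```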
bq. below if/else can be avoided if passing the nodePartition into queueCapacities.getAbsoluteCapacity(nodePartition),
Done
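A minimal sketch of why a per-partition lookup removes the branching: capacities are stored per partition name, so the caller passes nodePartition and needs no if/else. This QueueCapacitiesSketch class is hypothetical and only mirrors the shape of the getAbsoluteCapacity(nodePartition) call; returning 0 for an unknown partition is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-partition capacity lookup.
class QueueCapacitiesSketch {
  static final String NO_LABEL = ""; // assumed encoding of the default partition

  // Absolute capacity (fraction of the partition) keyed by partition name.
  private final Map<String, Float> absoluteCapacity = new HashMap<>();

  void setAbsoluteCapacity(String partition, float value) {
    absoluteCapacity.put(partition, value);
  }

  // Unknown partitions are treated as 0 (an assumption of this sketch).
  float getAbsoluteCapacity(String partition) {
    return absoluteCapacity.getOrDefault(partition, 0f);
  }

  public static void main(String[] args) {
    QueueCapacitiesSketch caps = new QueueCapacitiesSketch();
    caps.setAbsoluteCapacity(NO_LABEL, 0.5f);
    caps.setAbsoluteCapacity("x", 0.25f);
    String nodePartition = "x";
    // One lookup serves both labeled and non-labeled nodes.
    System.out.println(caps.getAbsoluteCapacity(nodePartition)); // 0.25
    System.out.println(caps.getAbsoluteCapacity(NO_LABEL));      // 0.5
  }
}
```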
bq. the second limit won’t be hit?
Yes, it will not be hit, but setting it to "maxUserLimit" improves readability.
bq. nonExclusiveSchedulingOpportunities#setCount -> add(Priority)
Done
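A minimal sketch of a per-priority counter with the add/reset operations named earlier (reset/addMissedNonPartitionedRequestSchedulingOpportunity). Priorities are plain ints here instead of YARN Priority objects, and the class and getter are hypothetical; how the count is consumed by the allocation logic is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; priorities are plain ints instead of YARN Priority objects.
class MissedOpportunitySketch {
  // Missed non-partitioned scheduling opportunities, keyed by priority.
  private final Map<Integer, Integer> missed = new HashMap<>();

  // Records one more missed opportunity for this priority and returns the new count.
  int addMissedNonPartitionedRequestSchedulingOpportunity(int priority) {
    return missed.merge(priority, 1, Integer::sum);
  }

  // Clears the counter for this priority.
  void resetMissedNonPartitionedRequestSchedulingOpportunity(int priority) {
    missed.remove(priority);
  }

  int getMissed(int priority) {
    return missed.getOrDefault(priority, 0);
  }

  public static void main(String[] args) {
    MissedOpportunitySketch app = new MissedOpportunitySketch();
    app.addMissedNonPartitionedRequestSchedulingOpportunity(1);
    System.out.println(app.addMissedNonPartitionedRequestSchedulingOpportunity(1)); // 2
    app.resetMissedNonPartitionedRequestSchedulingOpportunity(1);
    System.out.println(app.getMissed(1)); // 0
  }
}
```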
Attached a new patch (ver. 3).
> CapacityScheduler side changes to support non-exclusive node labels
> -------------------------------------------------------------------
>
> Key: YARN-3361
> URL: https://issues.apache.org/jira/browse/YARN-3361
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-3361.1.patch, YARN-3361.2.patch, YARN-3361.3.patch
>
>
> According to the design doc attached to YARN-3214, we need to implement the following
> logic in CapacityScheduler:
> 1) When allocating a resource request with no node label specified, it should
> preferentially be allocated to nodes without labels.
> 2) When a node with a label has some available resources, they can be
> used by applications in the following order:
> - Applications under queues that can access the label and ask for the same
> labeled resource.
> - Applications under queues that can access the label and ask for
> non-labeled resources.
> - Applications under queues that cannot access the label and ask for non-labeled
> resources.
> 3) Expose the necessary information that the preemption policy can use to make
> preemption decisions.
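A minimal sketch of the ordering in rule 2), ranking applications for a node with a given label. App, rank and order are hypothetical names, and the sketch covers only this ordering, not rule 1) or the preemption information in rule 3).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the ordering in rule 2).
class LabeledNodeOrderSketch {
  static final String NO_LABEL = ""; // assumed encoding of "no label requested"

  static final class App {
    final String name;
    final boolean queueCanAccessLabel; // can the app's queue access the node's label?
    final String requestedPartition;   // partition the app's pending request asks for
    App(String name, boolean queueCanAccessLabel, String requestedPartition) {
      this.name = name;
      this.queueCanAccessLabel = queueCanAccessLabel;
      this.requestedPartition = requestedPartition;
    }
  }

  // Lower rank is served first; -1 means the app cannot use this node.
  static int rank(App app, String nodeLabel) {
    boolean asksForLabel = app.requestedPartition.equals(nodeLabel);
    boolean asksForNoLabel = app.requestedPartition.equals(NO_LABEL);
    if (app.queueCanAccessLabel && asksForLabel) return 0;
    if (app.queueCanAccessLabel && asksForNoLabel) return 1;
    if (!app.queueCanAccessLabel && asksForNoLabel) return 2;
    return -1;
  }

  // Names of eligible apps, in the order they would be offered the node.
  static List<String> order(List<App> apps, String nodeLabel) {
    List<App> eligible = new ArrayList<>();
    for (App a : apps) {
      if (rank(a, nodeLabel) >= 0) eligible.add(a);
    }
    eligible.sort(Comparator.comparingInt((App x) -> rank(x, nodeLabel)));
    List<String> names = new ArrayList<>();
    for (App a : eligible) names.add(a.name);
    return names;
  }

  public static void main(String[] args) {
    List<App> apps = List.of(
        new App("c", false, NO_LABEL), // queue cannot access x, asks for no label
        new App("b", true, NO_LABEL),  // queue can access x, asks for no label
        new App("a", true, "x"),       // queue can access x, asks for x
        new App("d", true, "z"));      // asks for another label: not eligible
    System.out.println(order(apps, "x")); // [a, b, c]
  }
}
```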