[
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382864#comment-16382864
]
Wangda Tan commented on YARN-7494:
----------------------------------
Thanks [~sunilg] for updating the patch, reviewed overall workflow of the
patch, comments:
1) PartitionBasedCandidateNodeSet: It doesn't look like added anything other
than SimpleCandidateNodeSet, maybe reuse the same class?
2) CS#getCandidateNodeSet, the get_node_nodes_from_give_partition operation is
very expensive, maybe cache partition -> nodes inside ClusterNodeTracker?
3) Ideally, MultiNodeLookupPolicy initialization should not determined by
Scheduler, and implementation of MultiNodeSortingManager should not bind to
specific scheduler.
{code}
CapacityScheduler cs = ((CapacityScheduler) rmContext.getResourceManager()
.getResourceScheduler());
nodeLookupPolicy = (MultiNodeLookupPolicy<N>) cs.getMultiNodeSortingManager();
{code}
3.1 type cast from MultiNodeSortingManager to MultiNodeLookupPolicy looks
incorrect.
3.2 Inside MultiNodeSortingManager, the type cast to {{CapacityScheduler}}
looks unnecessary since we already made class to be a generic type: {{<N
extends SchedulerNode>}}
4) Suggestions to MultiNodeLookupPolicy (and other new added interfaces) design:
In order to support application specify its own sorting policy, we may need to
take application's input to initialize policy.
- Existing implementation of MultiNodeLookupPolicy initialization code assumed
it is shared by apps. (All apps with the same policy can get the same sorting
order). However, we need both: for example, load balance sorter can be shared
by apps, but locality sorter should be different for different apps. So to make
it work, I suggest to add a MultiNodeLookupPolicyFactory class, depends on user
requested policy class. It can either return whatever cached in
{{MultiNodeSortingManager}} (which assumes it will be shared to apps), or
initialize new policies. Class of MultiNodeLookupPolicy should be determined by
{{applicationSchedulingEnvs}}.
- Add MultiNodeLookupPolicy to {{AppPlacementAllocator#initialize}} parameter
list.
- {{MultiNodeLookupPolicy#initPolicy}} should take
{{AppSchedulingInfo#applicationSchedulingEnvs}} as input to initialize its own
fields. The multiNodeSortingAlgorithm should not be a separate item in the
initPolicy parameter list.
- {{MultiNodeLookupPolicy#addAndRefreshNodesSet}} Should take CandidateNodeSet
as input, we can include version to let MultiNodeLookupPolicy if it need to
invalidated cached sorted result or not.
5) Typo:
bq. LOG.error("Exception raised while executing preemption"
> Add muti node lookup support for better placement
> -------------------------------------------------
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Sunil G
> Assignee: Sunil G
> Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch,
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of single node, for effectiveness we can consider a multi node lookup
> based on partition to start with.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]