[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382864#comment-16382864
 ] 

Wangda Tan commented on YARN-7494:
----------------------------------

Thanks [~sunilg] for updating the patch, reviewed overall workflow of the 
patch, comments: 
1) PartitionBasedCandidateNodeSet: It doesn't look like added anything other 
than SimpleCandidateNodeSet, maybe reuse the same class?

2) CS#getCandidateNodeSet, the get_node_nodes_from_give_partition operation is 
very expensive, maybe cache partition -> nodes inside ClusterNodeTracker?

3) Ideally, MultiNodeLookupPolicy initialization should not determined by 
Scheduler, and implementation of MultiNodeSortingManager should not bind to 
specific scheduler.
{code} 
CapacityScheduler cs = ((CapacityScheduler) rmContext.getResourceManager()
.getResourceScheduler());
nodeLookupPolicy = (MultiNodeLookupPolicy<N>) cs.getMultiNodeSortingManager();
{code}

3.1 type cast from MultiNodeSortingManager to MultiNodeLookupPolicy looks 
incorrect.
3.2 Inside MultiNodeSortingManager, the type cast to {{CapacityScheduler}} 
looks unnecessary since we already made class to be a generic type: {{<N 
extends SchedulerNode>}}

4) Suggestions to MultiNodeLookupPolicy (and other new added interfaces) design:

In order to support application specify its own sorting policy, we may need to 
take application's input to initialize policy.
- Existing implementation of MultiNodeLookupPolicy initialization code assumed 
it is shared by apps. (All apps with the same policy can get the same sorting 
order). However, we need both: for example, load balance sorter can be shared 
by apps, but locality sorter should be different for different apps. So to make 
it work, I suggest to add a MultiNodeLookupPolicyFactory class, depends on user 
requested policy class. It can either return whatever cached in 
{{MultiNodeSortingManager}} (which assumes it will be shared to apps), or 
initialize new policies. Class of MultiNodeLookupPolicy should be determined by 
{{applicationSchedulingEnvs}}.
- Add MultiNodeLookupPolicy to {{AppPlacementAllocator#initialize}} parameter 
list.
- {{MultiNodeLookupPolicy#initPolicy}} should take 
{{AppSchedulingInfo#applicationSchedulingEnvs}} as input to initialize its own 
fields. The multiNodeSortingAlgorithm should not be a separate item in the 
initPolicy parameter list.
- {{MultiNodeLookupPolicy#addAndRefreshNodesSet}} Should take CandidateNodeSet 
as input, we can include version to let MultiNodeLookupPolicy if it need to 
invalidated cached sorted result or not.

5) Typo:
bq. LOG.error("Exception raised while executing preemption"


> Add muti node lookup support for better placement
> -------------------------------------------------
>
>                 Key: YARN-7494
>                 URL: https://issues.apache.org/jira/browse/YARN-7494
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Major
>         Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of single node, for effectiveness we can consider a multi node lookup 
> based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to