[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346992#comment-16346992 ]

Sunil G commented on YARN-7494:
-------------------------------

Thanks [~leftnoteasy] [~cheersyang] [~Tao Yang] for the comments.

Overall, I'll summarize the suggestions along with my thoughts:
 # *multi-node-lookup* is enabled at the cluster level now; we could also make it configurable at the application level and at other levels. I will make use of the scheduling env for this. I think let's not add a new element to RegisterApplicationMasterRequest, as that would need changes from applications; instead we can use the scheduling env added in the other patch. Once we have the type from the app (app level OR queue OR cluster), we will pass it to a factory that returns the correct child class of {{CandidateNodeSet}} (Simple/Partition based); see the factory sketch after this list.
 ** Expose a node lookup SCOPE option from the app in the scheduling env as [SCOPE:APP/QUEUE/CLUSTER].
 ** SCOPE as APP enables the multi-node-lookup policy explained in item #2. SCOPE as QUEUE fetches the default multi-node-placement-enabled config at each queue. SCOPE as CLUSTER means the value of yarn.capacity.scheduler.multi-node-placement-enabled.
 ** SCOPE only enables the option to look at multiple nodes. Given SCOPE as QUEUE, if multi-node-lookup is disabled at the QUEUE level, we will still look at one node at a time.
 # {{yarn.capacity.sorting-nodes.policy.class}} at the cluster/queue/app level gives the flexibility to choose the correct node lookup policy, provided multi-node-placement-enabled is on at that level. So, as [~cheersyang] mentioned, an app can override the queue-level policy.
 # Given we have the abstraction to select a {{MultiNodePolicy}}, the sorting optimization could be done at a central manager. I initially thought about this to avoid computation cost, however I had some concerns:

 ** Each time a node is added/removed or a capacity change happens, we need to refresh the node set. It is not desirable to have a timer and refresh periodically, as stale data in such a critical data structure is not good design.
 ** The number of nodes in a cluster only grows, hence we may end up with a duplicated copy per policy (given app-level policies).
 # *[Proposal for #3]* Hence we can think about an interim layer. We already have {{ClusterNodeTracker}} and the {{NodeFilter}} interface, so we can query this manager with any kind of filter we need.
 ## Now each MultiNodePolicy (NodeUsageBasedPolicy, running-container based, etc.) will hold a reference to the original nodes retrieved from {{ClusterNodeTracker#getNodes(NodeFilter)}}. A {{Map<MultiNodePolicy, Set<SchedulerNode>>}} will be the master cache, invalidated on every node change event.
 ## Since we have a master cache, each app's MultiNodePolicy will just fetch the reference from the master map (e.g. NodeUsageBasedPolicy will have its entry of nodes sorted by usage).
 ## Invalidating the cache is tricky. I'll improve ClusterNodeTracker to register a callback that invalidates the master cache; a sketch of this cache also follows below.
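
To make items #1 and #2 concrete, here is a rough sketch of the factory idea. Everything below ({{NodeLookupScope}}, {{SCOPE_KEY}}, the stand-in {{CandidateNodeSet}} classes and the per-level enabled flags) is a hypothetical placeholder to illustrate the flow, not the final API:

{code:java}
import java.util.Map;

// Sketch only: every name here is an illustrative stand-in, not the real YARN API.
public class CandidateNodeSetFactorySketch {

  enum NodeLookupScope { APP, QUEUE, CLUSTER }

  // Stand-ins for the CandidateNodeSet child classes discussed in #1.
  interface CandidateNodeSet { }
  static class SimpleCandidateNodeSet implements CandidateNodeSet { }         // one node at a time
  static class PartitionBasedCandidateNodeSet implements CandidateNodeSet { } // multi-node lookup

  // Hypothetical key carried in the app's scheduling env as [SCOPE:APP/QUEUE/CLUSTER].
  static final String SCOPE_KEY = "SCOPE";

  // Scope from the app's scheduling env; defaults to CLUSTER when unset.
  // (A real implementation would validate rather than let valueOf throw.)
  static NodeLookupScope resolveScope(Map<String, String> schedulingEnv) {
    String v = schedulingEnv.get(SCOPE_KEY);
    return v == null ? NodeLookupScope.CLUSTER
                     : NodeLookupScope.valueOf(v.trim().toUpperCase());
  }

  // Multi-node lookup is used only when the flag at the resolved scope is on;
  // e.g. SCOPE=QUEUE with the queue flag off still scans one node at a time.
  static CandidateNodeSet create(Map<String, String> schedulingEnv,
      boolean appEnabled, boolean queueEnabled, boolean clusterEnabled) {
    final boolean multiNode;
    switch (resolveScope(schedulingEnv)) {
      case APP:   multiNode = appEnabled;     break;
      case QUEUE: multiNode = queueEnabled;   break;
      default:    multiNode = clusterEnabled; break;
    }
    return multiNode ? new PartitionBasedCandidateNodeSet()
                     : new SimpleCandidateNodeSet();
  }

  // #2: the app-level sorting policy class overrides the queue level,
  // which in turn overrides the cluster-wide default.
  static String resolvePolicyClass(String appLevel, String queueLevel,
      String clusterLevel) {
    return appLevel != null ? appLevel
         : queueLevel != null ? queueLevel
         : clusterLevel;
  }
}
{code}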

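In the same spirit, a sketch of the master cache and invalidation callback from #4. Again, {{MultiNodePolicy}}, the node-source supplier and the callback wiring are stand-ins for what {{ClusterNodeTracker}} would actually provide:

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch only: MultiNodePolicy, SchedulerNode and the callback wiring are
// stand-ins for what ClusterNodeTracker would actually provide.
public class MultiNodeSortingCacheSketch {

  interface SchedulerNode { }

  // Each policy knows how to (re)build its view of the nodes, e.g. a
  // NodeUsageBasedPolicy would return them ordered by utilization
  // (an ordering-preserving Set such as LinkedHashSet).
  interface MultiNodePolicy {
    Set<SchedulerNode> sortNodes(Set<SchedulerNode> clusterNodes);
  }

  // Master cache: one shared sorted view per policy, so apps using the
  // same policy fetch a reference instead of re-sorting per app.
  private final Map<MultiNodePolicy, Set<SchedulerNode>> masterCache =
      new ConcurrentHashMap<>();

  // Stands in for ClusterNodeTracker#getNodes(NodeFilter).
  private final Supplier<Set<SchedulerNode>> nodeSource;

  MultiNodeSortingCacheSketch(Supplier<Set<SchedulerNode>> nodeSource) {
    this.nodeSource = nodeSource;
    // In the real patch, a callback registered on ClusterNodeTracker would
    // call invalidate() on node add/remove or capacity change.
  }

  // Apps fetch the shared, lazily rebuilt view for their policy.
  Set<SchedulerNode> getNodes(MultiNodePolicy policy) {
    return masterCache.computeIfAbsent(policy,
        p -> p.sortNodes(nodeSource.get()));
  }

  // Invoked by the node-change callback: dropping the entries forces the
  // next lookup to re-sort against the fresh node set. (Per-policy
  // invalidation would be a refinement.)
  void invalidate() {
    masterCache.clear();
  }
}
{code}
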
[~cheersyang] [~leftnoteasy] [~Tao Yang], please check this and share your thoughts. Once we have consensus, I'll update my patch. Or, if a call is needed, we can quickly set one up as well.

> Add multi node lookup support for better placement
> --------------------------------------------------
>
>                 Key: YARN-7494
>                 URL: https://issues.apache.org/jira/browse/YARN-7494
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Major
>         Attachments: YARN-7494.001.patch, YARN-7494.v0.patch, 
> YARN-7494.v1.patch
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup 
> based on partition to start with.


