[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141349#comment-15141349
 ] 

Carlo Curino commented on YARN-4412:
------------------------------------

Hi [~asuresh], I really like the direction of the patch. I have had several 
conversations with folks writing "smarter" applications, who are asking for 
some "visibility" into what happens in the cluster. 
If I understand the structure correctly, with what you are doing we could either: 
 # pass this extra info between the RM and the AMRMProxy, but strip it out 
before it reaches the app (thus preserving existing behavior, and hiding 
potentially sensitive info about the cluster load).
 # have the AMRMProxy forward this info to a "smarter" app. 

I think it is important to be able to *enforce* the above behaviors, i.e., a 
sneaky AM should not be able to talk directly to the RM, pretending to be the 
AMRMProxy, and grab those extra fields. This could be accomplished with an 
extra round of "tokens" that allow the holder to talk the extended protocol 
rather than just the basic one. A trusted app can receive these tokens, while 
an untrusted app will not. The AMRMProxy is part of the infrastructure, so it 
should have these special tokens.
Does this make sense?
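
To make the enforcement idea concrete, here is a very rough sketch; all the 
names below ({{DistSchedulerGateway}}, {{ExtendedAllocateResponse}}, 
{{callerHasDistSchedToken}}, ...) are made up and do not correspond to the 
classes in the patch, it is just the shape of the check I have in mind:

{code:java}
// Sketch only: every name here is hypothetical, not what the patch defines;
// the point is the enforcement shape, not the exact API.
public class DistSchedulerGateway {

  /** Extra cluster-load info that only trusted callers should ever see. */
  static class ClusterStatus { /* e.g. top-k queueing behavior per node */ }

  /** The usual allocate response plus the optional extended fields. */
  static class ExtendedAllocateResponse {
    ClusterStatus clusterStatus;   // stays null for plain (untrusted) AMs
  }

  ExtendedAllocateResponse allocate(boolean callerHasDistSchedToken) {
    ExtendedAllocateResponse resp = new ExtendedAllocateResponse();
    // ... populate the ordinary allocate fields here ...
    if (callerHasDistSchedToken) {
      // Trusted caller (the AMRMProxy, or an app we explicitly trust):
      // include the extended info.
      resp.clusterStatus = new ClusterStatus();
    }
    // A sneaky AM talking directly to the RM never presents the token,
    // so it only ever sees the basic protocol.
    return resp;
  }
}
{code}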

Given the above demand from app writers, I think it would be nice to 
*generalize* what you are doing. It would be nice to have a more 
general-purpose {{ClusterStatus}} object passed down, of which your 
{{QueuedContainersStatus}} (which returns a top-k ordering of queueing 
behavior at the nodes) is one specific instantiation. Just to make an example, 
I can easily see a latency-critical serving service, trying to figure out 
where best to place its tasks, asking for information about the average 
CPU/NET/DISK utilization of all available nodes before requesting to run on 
the few that are (according to that service's own custom metrics) the best 
fit. This shouldn't be too hard; I am just proposing a more general wrapper 
object, which would allow us later on to leverage this very same mechanism for 
more than what you guys do today. I think it would make a very valuable 
service to provide to app writers, especially as we head towards more and more 
services.
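
Something along these lines is what I have in mind (illustrative names only; 
{{QueuedContainersStatus}} here just stands in for the object the patch 
already returns, the rest is invented):

{code:java}
// Illustrative only: ClusterStatus, NodeUtilizationStatus and the fields
// below are made up, not proposed APIs.
import java.util.List;
import java.util.Map;

/** Generic envelope for cluster-level info pushed down in the allocate response. */
abstract class ClusterStatus { }

/** Your current instantiation: top-k nodes ordered by queueing behavior. */
class QueuedContainersStatus extends ClusterStatus {
  List<String> topKNodesByQueueLength;
}

/** A possible future instantiation: average CPU/NET/DISK utilization per node,
 *  so a latency-critical service can pick its own best-fit nodes. */
class NodeUtilizationStatus extends ClusterStatus {
  Map<String, double[]> avgCpuNetDiskByNode;   // nodeId -> {cpu, net, disk}
}
{code}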

Nits: 
 * I think documentation on the various classes would help. E.g., you introduce 
a {{DistributedSchedulingService}}, which from other discussions I understand 
is useful, but just staring at the code makes it hard to see why we need all 
this.
 * In {{ResourceManager}}, if you are factoring out all the "guts" of 
{{SchedulerEventDispatcher}}, can't we simply move the class out? There is 
nothing left of it in the RM, other than a local rename. 
 * Can you clarify what happens in {{DistributedSchedulingService.getServer()}}? 
The comment has double negations, and I am not clear on what a 
{{reflectiveBlockingService}} does. 
 * {{registerApplicationMasterForDistributedScheduling}} assumes resources will 
only have cpu/mem. This might change soon. Is there a better way to load this 
info from configuration? It would be nice to have a 
{{config.getResource("blah")}} that takes care of this (see the sketch after 
this list).
 * I see tests for {{TopKNodeSelector}}, but for nothing else. Is this enough? 
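
For the configuration nit above, something along these lines is what I had in 
mind; the helper and the {{.memory-mb}}/{{.vcores}} key suffixes are invented, 
just to show the shape:

{code:java}
// Illustrative only: loading the resource from configuration instead of
// hard-coding cpu/mem in the protocol handler.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Resource;

final class ResourceConfigUtil {
  /** Builds a Resource from configuration keys under the given prefix. */
  static Resource getResource(Configuration conf, String prefix,
      int defaultMemMb, int defaultVcores) {
    int memMb = conf.getInt(prefix + ".memory-mb", defaultMemMb);
    int vcores = conf.getInt(prefix + ".vcores", defaultVcores);
    // If new resource types are added later, only this helper has to change.
    return Resource.newInstance(memMb, vcores);
  }
}
{code}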


> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNISTIC containers
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4412
>                 URL: https://issues.apache.org/jira/browse/YARN-4412
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node managers to be 
> used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
