[
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148261#comment-15148261
]
Arun Suresh commented on YARN-4412:
-----------------------------------
Many thanks for the detailed review [~curino].
# I totally agree with your point about explicitly authorizing AMs to allow them
to send and receive cluster information via the extended protocol: YARN-4631
has been raised to track this.
# With regard to generalizing {{QueuedContainersStatus}} into a
{{ClusterStatus}}: please note that this is actually metadata sent from the NM to
the RM, so *ClusterStatus* might not apply here. But I agree, we
probably can add more cluster information to the
{{DistributedSchedulingProtocol}}, which we introduced in YARN-2885. Also, the
node heartbeat already contains both Container and aggregate Node
resource utilization information. {{QueuedContainersStatus}} is just another
utilization metric, used by the {{ClusterMonitor}} running on the RM and by
the DistributedScheduling framework to gauge the relative load on a Node based
on the state of its queue (maintained by the {{ContainersMonitor}}, which queues
OPPORTUNISTIC container requests).
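To make the queue-based load ranking concrete, here is a minimal sketch of the idea, not the code in the patch. The class name {{QueueLengthNodeRanker}}, its methods, and the plain string node ids are all hypothetical; it assumes each heartbeat carries a single queued-container count and that a shorter queue means a more preferred node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an RM-side monitor; NOT the class from the patch.
// Single-threaded sketch: a real monitor would need thread-safe state.
class QueueLengthNodeRanker {
    // Latest queued-container count reported by each node (node id is an assumed string key).
    private final Map<String, Integer> queueLengths = new HashMap<>();

    // Record the queue length carried by a node heartbeat; a newer report overwrites the old one.
    void onHeartbeat(String nodeId, int queuedContainers) {
        queueLengths.put(nodeId, queuedContainers);
    }

    // Return up to k node ids, shortest queue first; ties are broken by node id
    // so the ordering is deterministic.
    List<String> topK(int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(queueLengths.entrySet());
        entries.sort((a, b) -> {
            int byLoad = Integer.compare(a.getValue(), b.getValue());
            return byLoad != 0 ? byLoad : a.getKey().compareTo(b.getKey());
        });
        List<String> preferred = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < k; i++) {
            preferred.add(entries.get(i).getKey());
        }
        return preferred;
    }

    public static void main(String[] args) {
        QueueLengthNodeRanker ranker = new QueueLengthNodeRanker();
        ranker.onHeartbeat("nm-a", 5);
        ranker.onHeartbeat("nm-b", 1);
        ranker.onHeartbeat("nm-c", 3);
        System.out.println(ranker.topK(2));
    }
}
```

Sorting the full node set on every call keeps the sketch short; a periodic recomputation or a bounded heap would be the more likely choice on a large cluster.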
bq. ..documentation on the various classes would help. e.g., you introduce a
DistributedSchedulingService, ..
Agreed, I have added some class level docs to some of the new classes
introduced here.
bq. ... if you are factoring out all the "guts" of SchedulerEventDispatcher,
can't we simply move the class out? ..
Agreed..
bq. Can you clarify what happens in DistributedSchedulingService.getServer()
?...
Fixed the comment to explain this.
bq. ..assumes resources will have only cpu/mem...Is there any better way to
load this info from configuration? It would be nice to have a
config.getResource("blah"), which takes care of this...
Good point. Unfortunately, the Configuration object does not currently support
{{getResource()}}. Once the generalized resource model lands, I will circle back
to this.
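For illustration only, a helper along the lines of the suggested {{config.getResource("blah")}} might look roughly like the sketch below. Everything in it is an assumption: {{ResourceSpec}}, {{ResourceConfigReader}}, and the {{.memory-mb}} / {{.vcores}} key suffixes are hypothetical, and a plain {{Map}} stands in for the Configuration object, which has no such method today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type holding the two resource dimensions discussed (memory and CPU).
final class ResourceSpec {
    final long memoryMb;
    final int vcores;

    ResourceSpec(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

// Hypothetical helper; a plain Map stands in for the real Configuration object.
final class ResourceConfigReader {
    private ResourceConfigReader() {}

    // Reads "<prefix>.memory-mb" and "<prefix>.vcores" (assumed key names),
    // falling back to the supplied defaults when a key is missing.
    static ResourceSpec getResource(Map<String, String> conf, String prefix, ResourceSpec defaults) {
        String mem = conf.get(prefix + ".memory-mb");
        String cpu = conf.get(prefix + ".vcores");
        long memoryMb = (mem == null) ? defaults.memoryMb : Long.parseLong(mem.trim());
        int vcores = (cpu == null) ? defaults.vcores : Integer.parseInt(cpu.trim());
        return new ResourceSpec(memoryMb, vcores);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.min-container.memory-mb", "512");
        ResourceSpec spec = getResource(conf, "example.min-container", new ResourceSpec(1024, 1));
        System.out.println(spec.memoryMb + " MB, " + spec.vcores + " vcores");
    }
}
```

Keeping the parsing in one place like this would let additional resource types be added later without touching each call site, which is the main appeal of the suggestion.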
bq. I see tests for TopKNodeSelector, but for nothing else. Is this enough?
Definitely not, but we have to wait for the actual changes in the
{{ContainerManager}} and {{ContainersMonitor}} classes, handled in YARN-2883, to
test this end-to-end. In the meantime, I will add tests to verify that extra
fields in the protobuf are handled correctly.
> Create ClusterMonitor to compute ordered list of preferred NMs for
> OPPORTUNISTIC containers
> ------------------------------------------------------------------------------------------
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch,
> YARN-4412-yarn-2877.v2.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual
> Node Managers and computes an ordered list of preferred Node managers to be
> used as target Nodes for OPPORTUNISTIC container allocations.
> This list can be pushed out to the Node Manager (specifically the AMRMProxy
> running on the Node) via the Allocate Response. This will be used to make
> local scheduling decisions.