[
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103138#comment-17103138
]
Manikandan R commented on YARN-6492:
------------------------------------
Thanks [~jhung] and [~epayne] for your support.
I would like to clear some things at high level, especially on
scope/requirements of this Jira and revisit the background (thought process) on
how all these related Jira has been created to ensure that we are on same page:
YARN-6467 computes metrics only for default partition. It has been created as
an interim step towards the major goal of "Providing Metrics at Partition
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467
is stepping stone for this JIRA, it has been coded in a way that it should
easily accomodate this Jira changes in a simplistic way (For example, Just
removing if(partition is default) check inside each metric computation method
expected to take care most of the things and no more changes required on collar
side).
Though YARN-6467 covers some aspects, it had created confusion (for the queue's
associated with multiple partitions) as well. Original QueueMetrics computation
behaviour has been changed. Original QueueMetrics computation is nothing but
the metrics computation only from Queue perspective irrespective of how many
partitions it has been associated to and nothing to do with Partitions. It
started providing metrics only for "default" partition by replacing the
original behaviour. Another reasons for taking up this path is, this Jira
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and
hence there won't be any inconsistency in original queue metrics computation
behaviour, but it didn't happened.
So, whenever we said "backward compatibility" we referred to this Original
QueueMetrics computation, not "existing QueueMetrics should still only contain
metrics for default partition". In other words, Original QueueMetrics
computation is nothing but the code/behaviour before YARN-6467.
Now, let me explain scope of this JIRA. We would like to achieve the following
things:
# Partition * Queue Metrics:
A partition can be associated with many queues. So we need to break up, hence
we need Partition * Queue metrics.
Proposed structure is
PartitionMetric (labelX)
QueueMetric (A)
metrics
Usermetrics
QueueMetric (A1)
metrics
Usermetrics
QueueMetric (A2)
metrics
Usermetrics
QueueMetric (B)
metrics
Usermetrics
PartitionMetric (labelY)
QueueMetric (A)
QueueMetric (A1)
QueueMetric (A2)
QueueMetric (B)
…
{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into
Metric system and use this object for all metric computations.
Sample JMX o/p is
{code:java}
"name" :
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
...{code}
2. Partition metrics:
Partition level metrics computation. This can help Admins to analyse the usage
at Partition level.
Proposed structure is
PartitionMetric (labelX)
metrics
PartitionMetric (labelY)
metrics
{{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this
registration into Metric system and use this object for all metric computations.
Sample JMX o/p is
{code:java}
"name" :
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x"
...{code}
In addition to these 2 changes, we would like to retain the Original
QueueMetrics computation behaviour.
Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works
correctly. YARN-9767 exhaustive unit test changes explain the difference
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics
very clearly.
What changes this patch should contain?
Yes, there is some confusion as some changes are in YARN-9767. #2 described in
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there
even before. My understanding, it should be happening after YARN-6467 only.
Would it be better if we handle that too here? What do you think?
Please share your opinions. Post that, will post the proper patch first and
then reviews can be taken up on the same.
{quote}In order to be consistent with other API responses like
{{/ws/v1/cluster/scheduler}}, I think this should just be an empty string. So,
I would expect the JMX response to look like the following for
DEFAULT_PARTITION:\{quote}
Yes [~epayne], we should handle this for sure.
> Generate queue metrics for each partition
> -----------------------------------------
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Jonathan Hung
> Assignee: Manikandan R
> Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt,
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt,
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch,
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch,
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each
> queue has one QueueMetrics object which captures metrics either in default
> partition or across all partitions. (After YARN-6467 it will be in default
> partition)
> But having the partition metrics would be very useful.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]