[
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
]
Jonathan Hung commented on YARN-6492:
-------------------------------------
Thank you [[email protected]]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath +
userName", can we change this to "partition + '.' + userName + '.' + queuePath"
? In particular the queuePath + userName part could cause conflicts (e.g. queue
named "root.auser" could conflict with user metrics under queue "root.a" and
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I
don't think we need to add the metrics object to QUEUE_METRICS, since we're
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I
don't think we need to add queue path to the key, since the users map is not
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these
changes needed?
* I don't think setAvailableResourcesToQueue is handled correctly. It appears
to update partition metrics no matter which queue this method is invoked for.
Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2,
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2,
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue
if the queue is root?
* Delete {noformat}println System.out.println(" final is " +
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(),
delta);{noformat}
be
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(),
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The
stuff removed seems to be non-exclusive node label functionality (default
partition node heartbeating, and checking queue metrics are correct), so we
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is
only 50% of 10GB = 5GB?
> Generate queue metrics for each partition
> -----------------------------------------
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Jonathan Hung
> Assignee: Manikandan R
> Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt,
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt,
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch,
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch,
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch,
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each
> queue has one QueueMetrics object which captures metrics either in default
> partition or across all partitions. (After YARN-6467 it will be in default
> partition)
> But having the partition metrics would be very useful.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]