[ https://issues.apache.org/jira/browse/YARN-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated YARN-7341:
--------------------------------
Attachment: YARN-7341.001.patch
It turns out that this is a real bug introduced by YARN-7095.
{{RouterWebServiceUtil#mergeMetrics}} takes two sets of metrics and merges
them into the first one. However, for a number of the metrics, it simply
doubles the value from the first set instead of adding the second. For example,
{code:java}
metrics.setTotalNodes(metrics.getTotalNodes() + metrics.getTotalNodes());
{code}
should be
{code:java}
metrics.setTotalNodes(metrics.getTotalNodes() +
metricsResponse.getTotalNodes());
{code}
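To make the difference concrete, here is a minimal, self-contained sketch of a
field-by-field merge. The {{Metrics}} class, its three fields, and
{{buggyMergeTotalNodes}} are hypothetical stand-ins (the real code works on the
YARN cluster metrics DAO, which has many more fields); only the pattern of
reading each value from {{metricsResponse}} mirrors the fix above.
{code:java}
// Sketch of a field-by-field metrics merge. "Metrics" is a hypothetical,
// trimmed-down stand-in for the real DAO; only three fields are modeled.
class MergeMetricsSketch {

    // Correct merge: every field of metricsResponse is added into metrics.
    static void mergeMetrics(Metrics metrics, Metrics metricsResponse) {
        metrics.setTotalNodes(metrics.getTotalNodes()
            + metricsResponse.getTotalNodes());
        metrics.setActiveNodes(metrics.getActiveNodes()
            + metricsResponse.getActiveNodes());
        metrics.setAppsSubmitted(metrics.getAppsSubmitted()
            + metricsResponse.getAppsSubmitted());
    }

    // The buggy pattern from the report: both operands are read from
    // "metrics", so the field is doubled and metricsResponse is ignored.
    static void buggyMergeTotalNodes(Metrics metrics, Metrics metricsResponse) {
        metrics.setTotalNodes(metrics.getTotalNodes() + metrics.getTotalNodes());
    }

    public static void main(String[] args) {
        Metrics a = new Metrics(3, 2, 10);
        Metrics b = new Metrics(4, 4, 5);
        mergeMetrics(a, b);
        System.out.println(a.getTotalNodes());    // 7
        System.out.println(a.getActiveNodes());   // 6
        System.out.println(a.getAppsSubmitted()); // 15

        Metrics c = new Metrics(3, 2, 10);
        buggyMergeTotalNodes(c, b);
        System.out.println(c.getTotalNodes());    // 6, not 7
    }
}

// Hypothetical metrics holder with plain getters and setters.
class Metrics {
    private int totalNodes;
    private int activeNodes;
    private int appsSubmitted;

    Metrics(int totalNodes, int activeNodes, int appsSubmitted) {
        this.totalNodes = totalNodes;
        this.activeNodes = activeNodes;
        this.appsSubmitted = appsSubmitted;
    }

    int getTotalNodes() { return totalNodes; }
    void setTotalNodes(int v) { totalNodes = v; }
    int getActiveNodes() { return activeNodes; }
    void setActiveNodes(int v) { activeNodes = v; }
    int getAppsSubmitted() { return appsSubmitted; }
    void setAppsSubmitted(int v) { appsSubmitted = v; }
}
{code}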
This should have failed every time, but a "flaw" in the test made it only
flaky. The test initializes two sets of metrics to random values using two
different {{Random}} objects, each seeded with {{System.currentTimeMillis()}}.
However, the code runs fast enough that both seeds are often taken within the
same millisecond, so the two objects get the same seed. When this happens, the
two sets of metrics have the same values, so doubling the first set gives the
same result as adding the second, which masks the bug described above. If the
code runs slower (e.g., a GC pause, swapping, or an added log statement for the
seed), the seeds differ and the test (correctly) fails.
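A small sketch of why identical seeds hide the bug: two {{java.util.Random}}
objects constructed with the same seed produce the same sequence, so the
doubled value equals the correct sum. The {{bugIsMasked}} helper and the seed
value are hypothetical; a fixed seed stands in for two
{{System.currentTimeMillis()}} calls that land in the same millisecond.
{code:java}
import java.util.Random;

// Shows why identical seeds hide the bug: two Random objects built with the
// same seed return the same sequence, so "a + a" equals "a + b".
class SeedCollisionDemo {

    // Returns true when the buggy result (first + first) matches the correct
    // result (first + second) for one value drawn from each generator.
    static boolean bugIsMasked(long seed1, long seed2) {
        int first = new Random(seed1).nextInt(1000);
        int second = new Random(seed2).nextInt(1000);
        int correct = first + second;
        int buggy = first + first;
        return correct == buggy;
    }

    public static void main(String[] args) {
        // A fixed seed stands in for two System.currentTimeMillis() calls
        // that landed in the same millisecond.
        long seed = 1508287200000L;
        System.out.println(bugIsMasked(seed, seed)); // true
    }
}
{code}
With two different seeds the drawn values will usually differ, so the same
comparison usually returns false and the bug surfaces; since the real test
checks many metrics, an accidental match on all of them is very unlikely.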
The 001 patch fixes the bug by using the correct metric in
{{RouterWebServiceUtil#mergeMetrics}}, and fixes the test by ensuring that
the two seeds are always different. It also cleans up some formatting and logs
the seed for easier debugging.
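One possible way to guarantee distinct seeds and make failures reproducible is
sketched below. This is an assumption for illustration, not necessarily what
the 001 patch does; {{distinctSeeds}} is a hypothetical helper, and the seeds
are printed with {{System.out}} only to keep the example self-contained (a real
test would use its logger).
{code:java}
import java.util.Random;

// One possible way (hypothetical, not necessarily what the 001 patch does)
// to guarantee distinct seeds and make a failing run reproducible.
class DistinctSeedsSketch {

    // Derives two seeds that always differ; baseSeed + 1 wraps around on
    // overflow (Long.MAX_VALUE -> Long.MIN_VALUE) but is still different.
    static long[] distinctSeeds(long baseSeed) {
        return new long[] {baseSeed, baseSeed + 1};
    }

    public static void main(String[] args) {
        long[] seeds = distinctSeeds(System.currentTimeMillis());
        // Log the seeds so a failure can be replayed with the same values.
        System.out.println("seed1=" + seeds[0] + ", seed2=" + seeds[1]);
        Random r1 = new Random(seeds[0]);
        Random r2 = new Random(seeds[1]);
        System.out.println(r1.nextInt(1000) + " " + r2.nextInt(1000));
    }
}
{code}
Deriving the second seed from the first means a single logged value is enough
to replay a failing run, because {{new Random(seed)}} always yields the same
sequence for the same seed.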
> TestRouterWebServiceUtil#testMergeMetrics is flakey
> ---------------------------------------------------
>
> Key: YARN-7341
> URL: https://issues.apache.org/jira/browse/YARN-7341
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation
> Affects Versions: 3.0.0-beta1
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: YARN-7341.001.patch
>
>
> {{TestRouterWebServiceUtil#testMergeMetrics}} is flakey. It sometimes fails
> with something like:
> {noformat}
> Running org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> testMergeMetrics(org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil)  Time elapsed: 0.005 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1092> but was:<584>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil.testMergeMetrics(TestRouterWebServiceUtil.java:473)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)