Stig Rohde Døssing created STORM-3121: -----------------------------------------
Summary: Fix flaky metrics tests in storm-core Key: STORM-3121 URL: https://issues.apache.org/jira/browse/STORM-3121 Project: Apache Storm Issue Type: Bug Components: storm-core Affects Versions: 2.0.0 Reporter: Stig Rohde Døssing Assignee: Stig Rohde Døssing The tests are flaky, but only rarely fail. I've only seen them fail on Travis when Travis is under load. Example failures: {code} classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__)) actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0])) at: test_runner.clj:105 {code} {code} classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2 expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__)) actual: (not (clojure.core/= [1 1] [1 0])) at: test_runner.clj:105 {code} The problem is that the tests increment metrics counters in the executor async loops, then expect the counters to end up in exact metrics buckets. The creation of a bucket is triggered by the metrics timer. The timer is included in time simulation and LocalCluster.waitForIdle, but the executor async loop isn't. There isn't any guarantee that the executor async loop gets to run when the test does a sequence like {code} Time.advanceClusterTime cluster.waitForIdle {code} because the waitForIdle check doesn't know about the executor async loop. -- This message was sent by Atlassian JIRA (v7.6.3#76005)