Stig Rohde Døssing created STORM-3121:
-----------------------------------------

             Summary: Fix flaky metrics tests in storm-core
                 Key: STORM-3121
                 URL: https://issues.apache.org/jira/browse/STORM-3121
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 2.0.0
            Reporter: Stig Rohde Døssing
            Assignee: Stig Rohde Døssing


The metrics tests in storm-core are flaky, though they fail only rarely. I have 
only seen them fail on Travis when it is under load.

Example failures:
{code}
classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
      at: test_runner.clj:105
{code}
{code}
classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 1] [1 0]))
      at: test_runner.clj:105
{code}

The problem is that the tests increment metrics counters from the executor 
async loops, then expect each increment to land in a specific metrics bucket. 
A bucket is created each time the metrics timer fires. The timer is covered by 
time simulation and by LocalCluster.waitForIdle, but the executor async loop 
is not. So there is no guarantee that the executor async loop gets to run when 
the test does a sequence like
{code}
Time.advanceClusterTime
cluster.waitForIdle
{code}
because the waitForIdle check doesn't know about the executor async loop.
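
To make the ordering problem concrete, here is a minimal Java sketch. It is a 
hypothetical model and not Storm code: BucketedCounter, increment and 
timerTick are made-up names, where increment stands in for the executor async 
loop and timerTick stands in for the metrics timer. It runs on a single thread 
so that both orderings can be reproduced deterministically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm code: a counter whose value is moved into a
// new bucket each time the (simulated) metrics timer fires.
class BucketedCounter {
    private long current = 0;
    private final List<Long> buckets = new ArrayList<>();

    // Stands in for the executor async loop bumping a metric.
    void increment() {
        current++;
    }

    // Stands in for a metrics timer tick: close the current bucket and reset.
    void timerTick() {
        buckets.add(current);
        current = 0;
    }

    // Returns a copy of the buckets closed so far.
    List<Long> buckets() {
        return new ArrayList<>(buckets);
    }

    // Ordering the tests assume: each increment lands before the next tick.
    static List<Long> expectedOrdering() {
        BucketedCounter c = new BucketedCounter();
        c.increment();
        c.timerTick();
        c.increment();
        c.timerTick();
        return c.buckets();
    }

    // Ordering under load: the second increment is delayed past the second tick.
    static List<Long> delayedOrdering() {
        BucketedCounter c = new BucketedCounter();
        c.increment();
        c.timerTick();
        c.timerTick();
        c.increment();
        return c.buckets();
    }
}
```

In this model the assumed ordering yields buckets [1, 1], while the delayed 
ordering yields [1, 0], which has the same shape as the test-builtin-metrics-2 
failure above. The late increment is not lost; it would show up in a later 
bucket than the one the test inspects.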


