Mmuzaf commented on code in PR #3102:
URL: https://github.com/apache/cassandra/pull/3102#discussion_r1714167570


##########
test/unit/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoirTest.java:
##########
@@ -604,6 +614,115 @@ public void testSize()
             assertEquals(2, toSnapshot.apply(histogram).size());
         }
 
+        /**
+         * This is a test that exposes CASSANDRA-19365 race condition
+         * The idea is to update a histogram from multiple threads
+         * and observe if the reported p99 doesn't get stuck at a low value or 
p50 at a high value
+         * due to update with high weight being inserted after the buckets are 
rescaled
+         * <p>
+         * The load has 95% of 42, and 5% of the time it's 1109. Despite that 
the histogram may be convinced for a long time
+         * that p99 is 42 or that p50 is 1109.
+         * The reason may be seen in the snapshot dump, where after rescale 
the bucket values may get
+         * very big due to the race condition and too big weight of the 
inserted samples.
+         * The values were picked to match bucket boundaries, but that's only 
for aesthetics.
+         * <p>
+         * In production the rescale happens every 30 minutes. In this test 
time flies roughly 1000 times faster to
+         * hit the condition in a reasonable time.
+         */
+        @Ignore("This test exposes a specific CASSANDRA-19365 race condition; 
it doesn't make sense to run it in CI")

Review Comment:
   I don't think adding tests that have been ignored by default is good 
practice. Here is what we can do:
   - Add a new test to the `burn` package like ConnectionBurnTest and test that 
the race between update and rescale doesn't cause problems with the buckets 
flooding (only some operations are missed)
   - Remove the `@Ignore` here and do the same here for a period of 30-60 
seconds and check the state at the end as we do now. The test shouldn't be 
flaky for 10k runs.
   
   We can run on the CircleCI this test thousands of times, to verify that the 
race doesn't cause any problems. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to