[PR] Fix AutoRepair Flaky InJvm dtest [cassandra]

via GitHub Mon, 05 May 2025 11:31:49 -0700


jaydeepkumar1984 opened a new pull request, #4139:
URL: https://github.com/apache/cassandra/pull/4139


   This PR addresses a flaky AutoRepair InJvm dtest. The failure is tough to 
reproduce, so the root cause described here is a theory:
   
   **Problem:** The InJvm dtest relies on a check that the 
_nodeRepairTimeInSec_ metric is > 0. In most cases the repair takes some time, 
so the metric is > 0. But in a corner case the repair can finish in under a 
second, say in 900 ms, and the metric, which is reported in seconds, then 
remains 0, so the check fails.
   
   **Fix**
   1. AutoRepair already uses SLEEP_IF_REPAIR_FINISHES_QUICKLY for 
such cases, but the metrics are calculated before this sleep interval. In this 
PR, we first apply SLEEP_IF_REPAIR_FINISHES_QUICKLY and then calculate the 
metrics, so the sleep interval is included in the measured time.
   2. Make the InJvm dtest more aggressive by reducing 
min_repair_interval and increasing the concurrency.
   3. Add the node's broadcast address to the _Assert_ message so that, if the 
failure happens again, it is clear exactly which node failed.
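
   The ordering issue in item 1 can be illustrated with a minimal sketch. It is 
not the AutoRepair code: the class, the method names, and the 2-second minimum 
are hypothetical, and the sleep is simulated by padding the elapsed time rather 
than by actually sleeping. It also assumes the metric is derived by truncating 
milliseconds to whole seconds.

```java
import java.util.concurrent.TimeUnit;

public class RepairMetricSketch {
    // Hypothetical stand-in for the minimum duration enforced by
    // SLEEP_IF_REPAIR_FINISHES_QUICKLY; the real value is not given in this PR.
    static final long MIN_DURATION_MILLIS = 2_000;

    // Assumption: the metric is whole seconds, so sub-second values truncate to 0.
    static long toWholeSeconds(long elapsedMillis) {
        return TimeUnit.MILLISECONDS.toSeconds(elapsedMillis);
    }

    // Old ordering: the metric is computed from the raw repair time,
    // before any padding sleep is applied.
    static long metricBeforeSleep(long repairMillis) {
        return toWholeSeconds(repairMillis);
    }

    // New ordering: the padding sleep (simulated here with Math.max) is
    // applied first, and the metric is computed from the padded time.
    static long metricAfterSleep(long repairMillis) {
        long paddedMillis = Math.max(repairMillis, MIN_DURATION_MILLIS);
        return toWholeSeconds(paddedMillis);
    }

    public static void main(String[] args) {
        long quickRepairMillis = 900;
        System.out.println("before sleep: " + metricBeforeSleep(quickRepairMillis)); // 0
        System.out.println("after sleep:  " + metricAfterSleep(quickRepairMillis));  // 2
    }
}
```

   With a 900 ms repair, the old ordering yields 0 and trips a "> 0" check, 
while the new ordering yields a positive value as long as the enforced minimum 
is at least one second.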
   
   
   See the [Cassandra 
Jira](https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-20620) 
ticket.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


