tolbertam commented on code in PR #4304:
URL: https://github.com/apache/cassandra/pull/4304#discussion_r2296375066


##########
src/java/org/apache/cassandra/config/DatabaseDescriptor.java:
##########
@@ -5940,6 +5940,16 @@ public static void 
setIncrementalRepairDiskHeadroomRejectRatio(double value)
         conf.incremental_repair_disk_headroom_reject_ratio = value;
     }
 
+    public static boolean getMixedMajorVersionRepairEnabled()
+    {
+        return conf.auto_repair.mixed_major_version_repair_enabled;
+    }
+
+    public static void setMixedMajorVersionRepairEnabled(boolean 
mixed_major_version_repair_enabled)
+    {
+        conf.auto_repair.mixed_major_version_repair_enabled = 
mixed_major_version_repair_enabled;
+    }
+

Review Comment:
   It looks like we already have a 
`AutoRepairConfig.isMixedMajorVersionRepairEnabled()` which is currently 
unused.   Since this is an `AutoRepair`-specific thing can we remove these 
methods and use methods on `AutoRepairConfig` instead?
   
   Would also be nice to update `SetAutoRepairConfig` to allow configuring this 
value.



##########
test/unit/org/apache/cassandra/repair/autorepair/AutoRepairUtilsTest.java:
##########
@@ -232,6 +236,52 @@ public void testGetHostIdsInCurrentRing_multiple_nodes()
         assertTrue(hosts.contains(hostId));
     }
 
+    @Test
+    public void testHasMultipleLiveMajorVersionsWithSingleNode()
+    {
+        boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions();
+        assertFalse(result);
+    }
+
+    @Test
+    public void 
testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameVersion()
+    {
+        ClusterMetadataTestHelper.addEndpoint(2);
+        // Test the current behavior with the existing cluster setup
+        // In a single-node test environment, this should return false
+        boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions();
+        assertFalse(result);
+    }
+
+    @Test
+    public void 
testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameMajorVersionDifferentMinorVersions()
+    {
+        // add two nodes with cassandra 5 major version, but different minor 
version
+        CassandraVersion differentCassandraVersion = new CassandraVersion(
+            String.format("%d.%d",
+                          NodeVersion.CURRENT.cassandraVersion.major,
+                          NodeVersion.CURRENT.cassandraVersion.minor+1));
+        ClusterMetadataTestHelper.addEndpoint(2, new NodeVersion(
+             differentCassandraVersion,
+             NodeVersion.CURRENT_METADATA_VERSION));
+        // With the same major versions, but different minor versions, we 
should still see this function return true
+        boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions();
+        assertFalse(result);
+    }
+
+    @Test
+    public void 
testHasMultipleLiveMajorVersionsWithMultipleNodesOfDifferentMajorVersions()
+    {
+        // add two nodes with cassandra 5 major version, but different minor 
version

Review Comment:
   A bit pedantic I know, but in future the current major version will change.
   
   ```suggestion
           // add two nodes with current cassandra major version, but different 
minor version
   ```



##########
conf/cassandra.yaml:
##########
@@ -2779,6 +2779,9 @@ storage_compatibility_mode: NONE
 #   # for a specified duration to ensure they are indeed removed before 
adjustments are made to the schedule.
 #   history_clear_delete_hosts_buffer_interval: 2h
 #   # NOTE: Each of the below settings can be overridden per repair type under 
repair_type_overrides
+#   # by default repair is disabled if there are mixed major versions detected 
- which would happen
+#   # if a major version upgrade is being performed on the cluster, but a user 
can enable it using this flag
+#   # mixed_major_version_repair_enabled = false;

Review Comment:
   Small suggestions:
   
   * Move `#NOTE: ...` to the line before `global_settings` as it makes more 
sense to be proximate to there.
   * Uncomment `mixed_major_version_repair_enabled = false` and remove 
semi-colon.
   
   ```suggestion
   #   # by default repair is disabled if there are mixed major versions 
detected - which would happen
   #   # if a major version upgrade is being performed on the cluster, but a 
user can enable it using this flag
   #   mixed_major_version_repair_enabled = false
   #   # NOTE: Each of the below settings can be overridden per repair type 
under repair_type_overrides
   ```
   
   A couple of other things that would to be good to do:
   
   1. Can you also add this to `cassandra_latest.yaml` as well?
   2. Add to 'Top level settings' in auto_repair.adoc? In that file it would 
also be good to expand on why repairs should not be run during major version 
upgrades (previous slack thread about it: 
https://the-asf.slack.com/archives/C06MR8K2Q0Z/p1729716524563759)



##########
test/distributed/org/apache/cassandra/distributed/test/log/ClusterMetadataTestHelper.java:
##########
@@ -798,19 +798,27 @@ public MoveProcess finishMove()
 
     public static void addEndpoint(int i)
     {
+        addEndpoint(i, NodeVersion.CURRENT);
+    }
+
+    public static void addEndpoint(int i, NodeVersion nodeVersion) {
         try

Review Comment:
   `{` on next line
   
   ```suggestion
       public static void addEndpoint(int i, NodeVersion nodeVersion)
       {
           try
   ```



##########
test/distributed/org/apache/cassandra/distributed/test/log/ClusterMetadataTestHelper.java:
##########
@@ -798,19 +798,27 @@ public MoveProcess finishMove()
 
     public static void addEndpoint(int i)
     {
+        addEndpoint(i, NodeVersion.CURRENT);
+    }
+
+    public static void addEndpoint(int i, NodeVersion nodeVersion) {
         try
         {
-            addEndpoint(InetAddressAndPort.getByName("127.0.0." + i), new 
Murmur3Partitioner.LongToken(i));
+            addEndpoint(InetAddressAndPort.getByName("127.0.0." + i), new 
Murmur3Partitioner.LongToken(i), nodeVersion);
         }
         catch (UnknownHostException e)
         {
             throw new RuntimeException(e);
         }
     }
 
-    public static void addEndpoint(InetAddressAndPort endpoint, Token t)
+    public static void addEndpoint(InetAddressAndPort endpoint, Token t) {
+        addEndpoint(endpoint, t, NodeVersion.CURRENT);

Review Comment:
   { on next line:
   
   ```suggestion
       public static void addEndpoint(InetAddressAndPort endpoint, Token t)
       {
           addEndpoint(endpoint, t, NodeVersion.CURRENT);
   ```



##########
src/java/org/apache/cassandra/repair/autorepair/AutoRepairUtils.java:
##########
@@ -425,6 +425,21 @@ public static CurrentRepairStatus 
getCurrentRepairStatus(RepairType repairType,
         return null;
     }
 
+    /**
+     * Checks whether the cluster has multiple major versions
+     * @return
+     *  true if more than one major versions are detected
+     *  false if only one major version is detected
+     *
+     */
+    public static boolean hasMultipleLiveMajorVersions()
+    {
+        ClusterMetadata metadata = ClusterMetadata.current();
+        int maxMajorVersion = 
ClusterMetadata.current().directory.clusterMaxVersion.cassandraVersion.major;
+        int minMajorVersion = 
ClusterMetadata.current().directory.clusterMinVersion.cassandraVersion.major;
+        return maxMajorVersion != minMajorVersion;
+    }
+

Review Comment:
   👍 nice and simple.
   
   I had some doubts on whether this is sufficient given Cassandra's version 
history.   e.g. between 2.0 and 2.1 maybe this wouldn't be safe?  But I'd hope 
from C* 6.0 on this is probably sufficient.



##########
src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:
##########
@@ -165,6 +166,11 @@ public void repair(AutoRepairConfig.RepairType repairType)
             logger.debug("Auto-repair is disabled for repair type {}", 
repairType);
             return;
         }
+        if (!DatabaseDescriptor.getMixedMajorVersionRepairEnabled() &&
+            Gossiper.instance.hasMultipleLiveMajorVersions()) {
+            logger.info("Auto-repair is disabled when nodes in the cluster 
have different major versions");

Review Comment:
   I think the metrics are calculated off of / engaged by  `AutoRepairState` 
(accessed 2 lines below).  You could add something to that.   Curious on the 
metric we are interested in adding?  I'm guess maybe a `Counter` that 
increments whenever repair is skipped because mixed versions are detected?



##########
test/unit/org/apache/cassandra/repair/autorepair/AutoRepairUtilsTest.java:
##########
@@ -232,6 +236,52 @@ public void testGetHostIdsInCurrentRing_multiple_nodes()
         assertTrue(hosts.contains(hostId));
     }
 
+    @Test
+    public void testHasMultipleLiveMajorVersionsWithSingleNode()
+    {
+        boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions();
+        assertFalse(result);
+    }
+
+    @Test
+    public void 
testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameVersion()
+    {
+        ClusterMetadataTestHelper.addEndpoint(2);
+        // Test the current behavior with the existing cluster setup
+        // In a single-node test environment, this should return false
+        boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions();
+        assertFalse(result);
+    }
+
+    @Test
+    public void 
testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameMajorVersionDifferentMinorVersions()
+    {
+        // add two nodes with cassandra 5 major version, but different minor 
version

Review Comment:
   ```suggestion
           // add two nodes with current cassandra major version, but different 
minor version
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org

Reply via email to