tolbertam commented on code in PR #4304: URL: https://github.com/apache/cassandra/pull/4304#discussion_r2296375066
########## src/java/org/apache/cassandra/config/DatabaseDescriptor.java: ########## @@ -5940,6 +5940,16 @@ public static void setIncrementalRepairDiskHeadroomRejectRatio(double value) conf.incremental_repair_disk_headroom_reject_ratio = value; } + public static boolean getMixedMajorVersionRepairEnabled() + { + return conf.auto_repair.mixed_major_version_repair_enabled; + } + + public static void setMixedMajorVersionRepairEnabled(boolean mixed_major_version_repair_enabled) + { + conf.auto_repair.mixed_major_version_repair_enabled = mixed_major_version_repair_enabled; + } + Review Comment: It looks like we already have a `AutoRepairConfig.isMixedMajorVersionRepairEnabled()` which is currently unused. Since this is an `AutoRepair`-specific thing can we remove these methods and use methods on `AutoRepairConfig` instead? Would also be nice to update `SetAutoRepairConfig` to allow configuring this value. ########## test/unit/org/apache/cassandra/repair/autorepair/AutoRepairUtilsTest.java: ########## @@ -232,6 +236,52 @@ public void testGetHostIdsInCurrentRing_multiple_nodes() assertTrue(hosts.contains(hostId)); } + @Test + public void testHasMultipleLiveMajorVersionsWithSingleNode() + { + boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions(); + assertFalse(result); + } + + @Test + public void testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameVersion() + { + ClusterMetadataTestHelper.addEndpoint(2); + // Test the current behavior with the existing cluster setup + // In a single-node test environment, this should return false + boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions(); + assertFalse(result); + } + + @Test + public void testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameMajorVersionDifferentMinorVersions() + { + // add two nodes with cassandra 5 major version, but different minor version + CassandraVersion differentCassandraVersion = new CassandraVersion( + String.format("%d.%d", + NodeVersion.CURRENT.cassandraVersion.major, + NodeVersion.CURRENT.cassandraVersion.minor+1)); + ClusterMetadataTestHelper.addEndpoint(2, new NodeVersion( + differentCassandraVersion, + NodeVersion.CURRENT_METADATA_VERSION)); + // With the same major versions, but different minor versions, we should still see this function return true + boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions(); + assertFalse(result); + } + + @Test + public void testHasMultipleLiveMajorVersionsWithMultipleNodesOfDifferentMajorVersions() + { + // add two nodes with cassandra 5 major version, but different minor version Review Comment: A bit pedantic I know, but in future the current major version will change. ```suggestion // add two nodes with current cassandra major version, but different minor version ``` ########## conf/cassandra.yaml: ########## @@ -2779,6 +2779,9 @@ storage_compatibility_mode: NONE # # for a specified duration to ensure they are indeed removed before adjustments are made to the schedule. # history_clear_delete_hosts_buffer_interval: 2h # # NOTE: Each of the below settings can be overridden per repair type under repair_type_overrides +# # by default repair is disabled if there are mixed major versions detected - which would happen +# # if a major version upgrade is being performed on the cluster, but a user can enable it using this flag +# # mixed_major_version_repair_enabled = false; Review Comment: Small suggestions: * Move `#NOTE: ...` to the line before `global_settings` as it makes more sense to be proximate to there. * Uncomment `mixed_major_version_repair_enabled = false` and remove semi-colon. ```suggestion # # by default repair is disabled if there are mixed major versions detected - which would happen # # if a major version upgrade is being performed on the cluster, but a user can enable it using this flag # mixed_major_version_repair_enabled = false # # NOTE: Each of the below settings can be overridden per repair type under repair_type_overrides ``` A couple of other things that would to be good to do: 1. Can you also add this to `cassandra_latest.yaml` as well? 2. Add to 'Top level settings' in auto_repair.adoc? In that file it would also be good to expand on why repairs should not be run during major version upgrades (previous slack thread about it: https://the-asf.slack.com/archives/C06MR8K2Q0Z/p1729716524563759) ########## test/distributed/org/apache/cassandra/distributed/test/log/ClusterMetadataTestHelper.java: ########## @@ -798,19 +798,27 @@ public MoveProcess finishMove() public static void addEndpoint(int i) { + addEndpoint(i, NodeVersion.CURRENT); + } + + public static void addEndpoint(int i, NodeVersion nodeVersion) { try Review Comment: `{` on next line ```suggestion public static void addEndpoint(int i, NodeVersion nodeVersion) { try ``` ########## test/distributed/org/apache/cassandra/distributed/test/log/ClusterMetadataTestHelper.java: ########## @@ -798,19 +798,27 @@ public MoveProcess finishMove() public static void addEndpoint(int i) { + addEndpoint(i, NodeVersion.CURRENT); + } + + public static void addEndpoint(int i, NodeVersion nodeVersion) { try { - addEndpoint(InetAddressAndPort.getByName("127.0.0." + i), new Murmur3Partitioner.LongToken(i)); + addEndpoint(InetAddressAndPort.getByName("127.0.0." + i), new Murmur3Partitioner.LongToken(i), nodeVersion); } catch (UnknownHostException e) { throw new RuntimeException(e); } } - public static void addEndpoint(InetAddressAndPort endpoint, Token t) + public static void addEndpoint(InetAddressAndPort endpoint, Token t) { + addEndpoint(endpoint, t, NodeVersion.CURRENT); Review Comment: { on next line: ```suggestion public static void addEndpoint(InetAddressAndPort endpoint, Token t) { addEndpoint(endpoint, t, NodeVersion.CURRENT); ``` ########## src/java/org/apache/cassandra/repair/autorepair/AutoRepairUtils.java: ########## @@ -425,6 +425,21 @@ public static CurrentRepairStatus getCurrentRepairStatus(RepairType repairType, return null; } + /** + * Checks whether the cluster has multiple major versions + * @return + * true if more than one major versions are detected + * false if only one major version is detected + * + */ + public static boolean hasMultipleLiveMajorVersions() + { + ClusterMetadata metadata = ClusterMetadata.current(); + int maxMajorVersion = ClusterMetadata.current().directory.clusterMaxVersion.cassandraVersion.major; + int minMajorVersion = ClusterMetadata.current().directory.clusterMinVersion.cassandraVersion.major; + return maxMajorVersion != minMajorVersion; + } + Review Comment: 👍 nice and simple. I had some doubts on whether this is sufficient given Cassandra's version history. e.g. between 2.0 and 2.1 maybe this wouldn't be safe? But I'd hope from C* 6.0 on this is probably sufficient. ########## src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java: ########## @@ -165,6 +166,11 @@ public void repair(AutoRepairConfig.RepairType repairType) logger.debug("Auto-repair is disabled for repair type {}", repairType); return; } + if (!DatabaseDescriptor.getMixedMajorVersionRepairEnabled() && + Gossiper.instance.hasMultipleLiveMajorVersions()) { + logger.info("Auto-repair is disabled when nodes in the cluster have different major versions"); Review Comment: I think the metrics are calculated off of / engaged by `AutoRepairState` (accessed 2 lines below). You could add something to that. Curious on the metric we are interested in adding? I'm guess maybe a `Counter` that increments whenever repair is skipped because mixed versions are detected? ########## test/unit/org/apache/cassandra/repair/autorepair/AutoRepairUtilsTest.java: ########## @@ -232,6 +236,52 @@ public void testGetHostIdsInCurrentRing_multiple_nodes() assertTrue(hosts.contains(hostId)); } + @Test + public void testHasMultipleLiveMajorVersionsWithSingleNode() + { + boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions(); + assertFalse(result); + } + + @Test + public void testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameVersion() + { + ClusterMetadataTestHelper.addEndpoint(2); + // Test the current behavior with the existing cluster setup + // In a single-node test environment, this should return false + boolean result = AutoRepairUtils.hasMultipleLiveMajorVersions(); + assertFalse(result); + } + + @Test + public void testHasMultipleLiveMajorVersionsWithMultipleNodesOfSameMajorVersionDifferentMinorVersions() + { + // add two nodes with cassandra 5 major version, but different minor version Review Comment: ```suggestion // add two nodes with current cassandra major version, but different minor version ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org