Re: [PR] CASSANDRA-19472 - CEP-15 (C*) Integrate accord with repair [cassandra]

via GitHub Fri, 29 Mar 2024 11:10:50 -0700


aweisberg commented on code in PR #3174:
URL: https://github.com/apache/cassandra/pull/3174#discussion_r1544720878



##########
src/java/org/apache/cassandra/db/streaming/CassandraStreamReceiver.java:
##########
@@ -246,7 +246,7 @@ public void finished()
         checkNotNull(minVersion, "Unable to determine minimum cluster 
version");
         IAccordService accordService = AccordService.instance();
         if (session.streamOperation().requiresBarrierTransaction()
-            && cfs.metadata().isAccordEnabled()
+            && cfs.metadata().requiresAccordSupport()

Review Comment:
   Need to puzzle through here if there is a race similar to mutations where 
the mutation is supposed to write through Accord, but doesn't and so we don't 
do a barrier here when need to.



##########
src/java/org/apache/cassandra/service/consensus/migration/ConsensusMigrationRepairResult.java:
##########
@@ -33,18 +34,30 @@ private 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType type, Epoch
         this.minEpoch = minEpoch;
     }
 
-    public static ConsensusMigrationRepairResult fromCassandraRepair(Epoch 
minEpoch, boolean migrationEligibleRepair)
+    public static ConsensusMigrationRepairResult fromRepair(Epoch minEpoch, 
boolean paxosRepaired, boolean accordRepaired, boolean deadNodesExcluded)
     {
-        checkArgument(!migrationEligibleRepair || 
minEpoch.isAfter(Epoch.EMPTY), "Epoch should not be empty if Paxos and regular 
repairs were performed");
-        if (migrationEligibleRepair)
-            return new 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType.paxos, minEpoch);
-        else
-            return new 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType.ineligible, 
Epoch.EMPTY);
+        checkArgument((!paxosRepaired && !accordRepaired) || 
minEpoch.isAfter(Epoch.EMPTY), "Epoch should not be empty if Paxos and regular 
repairs were performed");
+
+        if (deadNodesExcluded) return INELIGIBLE;
+        if (paxosRepaired && accordRepaired) return new 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType.either, minEpoch);
+        if (paxosRepaired) return new 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType.paxos, minEpoch);
+        if (accordRepaired) return new 
ConsensusMigrationRepairResult(ConsensusMigrationRepairType.accord, minEpoch);
+        return INELIGIBLE;
+    }
+
+    public static ConsensusMigrationRepairResult fromPaxosOnlyRepair(Epoch 
minEpoch, boolean deadNodesExcluded)
+    {
+        return fromRepair(minEpoch, true, false, deadNodesExcluded);

Review Comment:
   I think this is mixing up how Paxos repair needs to work unless I am wrong. 
Paxos repair doesn't finish at `ALL` and the requirement for Accord for 
migration is that it be done at `ALL` so that eventually Accord can safely 
switch to single replica reads.
   
   That is why previously the Paxos only repair was not marked as migration 
eligible.
   
   If Paxos repair actually applies everything at all then this would work, but 
I recall that it doesn't because that wouldn't be highly available.



##########
src/java/org/apache/cassandra/tools/nodetool/Repair.java:
##########
@@ -105,6 +105,9 @@ public class Repair extends NodeToolCmd
     @Option(title = "paxos-only", name = {"-paxos-only", "--paxos-only"}, 
description = "If the --paxos-only flag is included, no table data is repaired, 
only paxos operations..")
     private boolean paxosOnly = false;
 
+    @Option(title = "accord-only", name = {"-accord-only", "--accord-only"}, 
description = "If the --accord-only flag is included, no table data is 
repaired, only accord operations..")
+    private boolean accordOnly = false;

Review Comment:
   Last we discussed this you actually pushed to not expose this to users in 
repair and I still agree with that. An accord-only repair isn't something 
useful to end users and we added a separate finish migration step to the JMX 
interface to handle invoking the Accord only repair.
   
   We definitely should as part of repair do the Accord repairs to help 
maintain the contract people rely on where the repaired set is safe to read at 
CL=1.



##########
src/java/org/apache/cassandra/repair/RepairJob.java:
##########
@@ -163,19 +196,59 @@ public void onFailure(Throwable t)
             return;
         }
 
+        Future<Void> accordRepair;
+        if (doAccordRepair)
+        {
+            accordRepair = paxosRepair.flatMap(unused -> {
+                logger.info("{} {}.{} starting accord repair", 
session.previewKind.logPrefix(session.getId()), desc.keyspace, 
desc.columnFamily);
+                IPartitioner partitioner = 
desc.ranges.iterator().next().left.getPartitioner();
+                AccordRepair repair = new AccordRepair(null, partitioner, 
desc.keyspace, desc.ranges);
+                return repair.repair(taskExecutor);
+            }, taskExecutor);
+        }
+        else
+        {
+            accordRepair = paxosRepair.flatMap(unused -> {
+                logger.info("{} {}.{} not running accord repair", 
session.previewKind.logPrefix(session.getId()), desc.keyspace, 
desc.columnFamily);
+                return ImmediateFuture.success(null);
+            });
+        }
+
+        if (session.accordOnly)
+        {
+            accordRepair.addCallback(new FutureCallback<Void>()
+            {
+                public void onSuccess(Void ignored)
+                {
+                    logger.info("{} {}.{} accord repair completed", 
session.previewKind.logPrefix(session.getId()), desc.keyspace, 
desc.columnFamily);
+                    trySuccess(new RepairResult(desc, Collections.emptyList(), 
ConsensusMigrationRepairResult.fromAccordOnlyRepair(repairStartingEpoch, 
session.excludedDeadNodes)));
+                }
+
+                /**
+                 * Snapshot, validation and sync failures are all handled here

Review Comment:
   Comment is wrong I think



##########
test/distributed/org/apache/cassandra/distributed/test/accord/AccordMigrationTest.java:
##########
@@ -570,7 +570,7 @@ public void testAccordToPaxos() throws Exception
                  assertTargetPaxosWrite(runCasNoApply, 1, 
paxosMigratingKeys.next(), 2, 1, 1, 1, 1);
 
                  // Repair the currently migrating range from when targets 
were switched, but it's not an Accord repair, this is to make sure the wrong 
repair type doesn't trigger progress
-                 nodetool(coordinator, "repair", "-st", 
upperMidToken.toString(), "-et", maxAlignedWithLocalRanges.toString());
+                 nodetool(coordinator, "repair", "-st", 
upperMidToken.toString(), "-et", maxAlignedWithLocalRanges.toString(), 
"--paxos-only");

Review Comment:
   Paxos only is not sufficient here, we need to repair both Paxos and the base 
table for migration. We are also trying to address partially applied mutations 
that aren't from Paxos.



##########
src/java/org/apache/cassandra/repair/RepairJob.java:
##########
@@ -163,19 +196,59 @@ public void onFailure(Throwable t)
             return;
         }
 
+        Future<Void> accordRepair;
+        if (doAccordRepair)
+        {
+            accordRepair = paxosRepair.flatMap(unused -> {
+                logger.info("{} {}.{} starting accord repair", 
session.previewKind.logPrefix(session.getId()), desc.keyspace, 
desc.columnFamily);
+                IPartitioner partitioner = 
desc.ranges.iterator().next().left.getPartitioner();
+                AccordRepair repair = new AccordRepair(null, partitioner, 
desc.keyspace, desc.ranges);
+                return repair.repair(taskExecutor);

Review Comment:
   To confirm my understand, the Accord repair will complete before the regular 
repair so everything it puts in memtables will be included in the regular 
repair?



##########
src/java/org/apache/cassandra/repair/messages/RepairOption.java:
##########
@@ -199,21 +198,16 @@ public static RepairOption parse(Map<String, String> 
options, IPartitioner parti
         boolean ignoreUnreplicatedKeyspaces = 
Boolean.parseBoolean(options.get(IGNORE_UNREPLICATED_KS));
         boolean repairPaxos = 
Boolean.parseBoolean(options.get(REPAIR_PAXOS_KEY));
         boolean paxosOnly = Boolean.parseBoolean(options.get(PAXOS_ONLY_KEY));
-        boolean accordRepair = 
Boolean.parseBoolean(options.get(ACCORD_REPAIR_KEY));
+        boolean accordOnly = 
Boolean.parseBoolean(options.get(ACCORD_ONLY_KEY));
+
+        if (paxosOnly && accordOnly)
+            throw new IllegalArgumentException("Cannot repair paxos and repair 
only");
 
         if (previewKind != PreviewKind.NONE)
         {
             Preconditions.checkArgument(!repairPaxos, "repairPaxos must be set 
to false for preview repairs");
             Preconditions.checkArgument(!paxosOnly, "paxosOnly must be set to 
false for preview repairs");
-            Preconditions.checkArgument(!accordRepair, "accordRepair must be 
set to false for preview repairs");
-        }
-
-        if (accordRepair)
-        {
-            Preconditions.checkArgument(!paxosOnly, "paxosOnly must be set to 
false for Accord repairs");
-            Preconditions.checkArgument(previewKind == PreviewKind.NONE, 
"Can't perform preview repair with an Accord repair");
-            Preconditions.checkArgument(!force, "Accord repair only requires a 
quorum to work so force is not supported");

Review Comment:
   Is the force check done anywhere else?
   
   We should still be checking for dead nodes so technically yes we can run 
with force and only fail if it actually does something, but that might be less 
useful then just failing immediately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-19472 - CEP-15 (C*) Integrate accord with repair [cassandra]

Reply via email to