Re: [PR] CEP-46 witnesses [cassandra]

via GitHub Fri, 19 Sep 2025 18:29:33 -0700


aratno commented on code in PR #4378:
URL: https://github.com/apache/cassandra/pull/4378#discussion_r2364903990



##########
src/java/org/apache/cassandra/db/AbstractReadCommandVerbHandler.java:
##########
@@ -107,6 +108,7 @@ else if 
(localComparisonEpoch.isAfter(readCommand.serializedAtEpoch()))
 
     private ClusterMetadata checkTokenOwnership(ClusterMetadata metadata, 
Message<T> message)
     {
+        boolean acceptsTransient = message.verb() != Verb.READ_REQ;

Review Comment:
   Confirming this will include RANGE_REQ later on



##########
src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java:
##########
@@ -1388,7 +1371,7 @@ public CleanupSummary 
releaseRepairData(Collection<TimeUUID> sessions)
         readLock.lock();
         try
         {
-            for (PendingRepairManager prm : 
Iterables.concat(pendingRepairs.getManagers(), transientRepairs.getManagers()))
+            for (PendingRepairManager prm : 
Iterables.concat(pendingRepairs.getManagers()))

Review Comment:
   No need to concat anymore



##########
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
##########
@@ -422,8 +422,9 @@ private void printSStableMetadata(File file, boolean scan) 
throws IOException
             field("ClusteringTypes", clusteringTypes.toString());
             field("StaticColumns", FBUtilities.toString(statics));
             field("RegularColumns", FBUtilities.toString(regulars));
-            if (stats != null)
-                field("IsTransient", stats.isTransient);
+            // TODO (review): This should be removed right? Nothing depends on 
it?
+//            if (stats != null)
+//                field("IsTransient", stats.isTransient);

Review Comment:
   I support removal 👍 



##########
src/java/org/apache/cassandra/db/ReadCommand.java:
##########
@@ -1430,7 +1393,7 @@ private void serializeHeader(ReadCommand command, 
DataOutputPlus out, int versio
             out.writeByte(
             digestFlag(command.isDigestQuery())
             | indexFlag(null != command.indexQueryPlan())
-            | acceptsTransientFlag(command.acceptsTransient())
+            // | acceptsTransientFlag(false) Deprecated flag, could be reused?
             | 
needsReconciliationFlag(command.rowFilter().needsReconciliation())

Review Comment:
   `needsReconciliationFlag` was added for non-strict filtering, should 
probably rename for clarity now that read reconciliation is its own concept



##########
src/java/org/apache/cassandra/cql3/statements/schema/CreateTableStatement.java:
##########
@@ -167,11 +166,12 @@ public Keyspaces apply(ClusterMetadata metadata)
         TableMetadata table = builder.build();
         table.validate();
 
-        if (keyspace.replicationStrategy.hasTransientReplicas()
-            && table.params.readRepair != ReadRepairStrategy.NONE)
-        {
-            throw ire("read_repair must be set to 'NONE' for transiently 
replicated keyspaces");
-        }
+        // TODO (review): This can be removed right? ReadRepair is effectively 
not done anymore so the setting doesn't matter
+//        if (keyspace.replicationStrategy.hasTransientReplicas()
+//            && table.params.readRepair != ReadRepairStrategy.NONE)
+//        {
+//            throw ire("read_repair must be set to 'NONE' for transiently 
replicated keyspaces");
+//        }

Review Comment:
   I think we should require BLOCKING for tracked keyspaces for now. Users 
could previously switch to NONE if they preferred write atomicity over read 
monotonicity and wanted slightly better performance, but with mutation tracking 
there wouldn't be a semantics tradeoff, and the performance of read 
reconciliation should be even better than read repair anyway.
   
   The reason I think we should defer support for NONE is that we may decide to 
do an equivalent of full repair with range reads, and there isn't a strong 
justification to support both modes yet.



##########
src/java/org/apache/cassandra/db/Keyspace.java:
##########
@@ -629,8 +634,42 @@ private Future<?> applyInternalTracked(Mutation mutation, 
Promise<?> future)
                         continue;
                     }
 
+                    // If this range is only witnessed then don't apply the 
update to the underlying column family store
+                    // We still want the mutation tracking log to see the 
update so that it can witness it
+                    // and participate in reconciliation of the mutation
+                    AbstractReplicationStrategy replicationStrategy = 
cfs.keyspace.getReplicationStrategy();
+                    if (replicationStrategy.hasTransientReplicas())
+                    {
+                        RangesAtEndpoint localRanges = 
replicationStrategy.getLocalRanges(cm);
+                        Token token = upd.partitionKey().getToken();
+                        boolean foundMatch = false;
+                        // TODO (review): Races and this being an out of range 
mutation, it's silently accepting the mutation to the log
+                        // but not applying to the CF. This was already 
checked, but in a race what happens? Double check ack handling
+                        // and whether the voting group is checked to match 
safely
+                        for (Range<Token> r : localRanges.onlyFull().ranges())

Review Comment:
   If we're losing the range (coordinator is behind and believes replica owns 
the range), we may write something to an SSTable that's not read, and will be 
purged on next cleanup.
   
   If we're gaining the range (coordinator is ahead and believes replica owns 
the range), TCM should try to catch up before we process the message and get 
here.



##########
src/java/org/apache/cassandra/locator/ReplicaPlans.java:
##########
@@ -632,6 +632,9 @@ E select(ConsistencyLevel consistencyLevel, L liveAndDown, 
L live)
         }
     };
 
+    // TODO (review): Cheap quorums are not a goal for now, would really 
require speculation for writes
+    // and would trigger reconciliation work later, potentially quite a lot so 
I don't think it makes sense.
+    // So just remove selector?

Review Comment:
   I support removing support for cheap quorums. I _could_ imagine a world 
where we deprioritize delivery of new writes to witnesses if reconciliation is 
behind, but even a change like that would require more justification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CEP-46 witnesses [cassandra]

Reply via email to