aweisberg commented on code in PR #4508:
URL: https://github.com/apache/cassandra/pull/4508#discussion_r2628779083


##########
src/java/org/apache/cassandra/db/AbstractMutationVerbHandler.java:
##########
@@ -193,6 +196,49 @@ else if (message.epoch().isBefore(metadata.schema.lastModified()))
         return metadata;
     }
 
+    /**
+     * Confirm that the presence/absence of a mutation id matches our expectations for the given keyspace/table/token. If
+     * it doesn't, then we're not on the same epoch as the coordinator, or there's a bug.
+     */
+    private ClusterMetadata checkReplicationMigration(ClusterMetadata metadata, Message<T> message, InetAddressAndPort respondTo)
+    {
+        IMutation mutation = message.payload;
+        MutationRouting expected = mutation.id().isNone() ? MutationRouting.UNTRACKED : MutationRouting.TRACKED;
+        if (expected == MigrationRouter.getMutationRouting(metadata, mutation))
+            return metadata;
+
+        if (message.epoch().isAfter(metadata.epoch))
+        {
+            // coordinator is ahead, fetch log and recheck
+            metadata = ClusterMetadataService.instance().fetchLogFromPeerOrCMS(metadata, respondTo, message.epoch());
+            if (expected != MigrationRouter.getMutationRouting(metadata, mutation))
+                throw new IllegalStateException(String.format("Inconsistent mutation routing after fetching log for epoch = %s. Keyspace: %s key: %s ",
+                                                              metadata.epoch,
+                                                              mutation.getKeyspaceName(),
+                                                              mutation.key()));
+        }
+        else if (message.epoch().isBefore(metadata.epoch))
+        {
+            TCMMetrics.instance.coordinatorBehindReplication.mark();
+            throw new CoordinatorBehindException(String.format("Replication type / migration mismatch for keyspace: %s token %s, coordinator: %s is behind, our epoch = %s, their epoch = %s",
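
[Editor's note: the three-way epoch comparison in the quoted hunk can be condensed into a small decision sketch. Everything below is a hypothetical stand-in for illustration only: `Routing`, `resolve`, and the string outcomes do not exist in the patch; the real code compares `Epoch` objects and throws exceptions rather than returning values.]

```java
// Hedged sketch of the decision structure in checkReplicationMigration.
// Epochs are modelled as longs; outcomes as strings. All names hypothetical.
public class EpochCheckSketch
{
    enum Routing { TRACKED, UNTRACKED }

    static String resolve(long messageEpoch, long localEpoch,
                          Routing expected, Routing actual)
    {
        if (expected == actual)
            return "proceed";                 // routing agrees, apply the mutation
        if (messageEpoch > localEpoch)
            return "fetch-log-and-recheck";   // coordinator ahead: catch up, then re-verify
        if (messageEpoch < localEpoch)
            return "coordinator-behind";      // reject so the coordinator retries on a newer epoch
        return "mismatch-at-same-epoch";      // handling not shown in the quoted hunk
    }
}
```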

Review Comment:
   The repair waits for in-flight writes to complete after having brought the local cluster metadata state up to date.
   
   So a racing in-flight write that hasn't yet entered the write op order could pass this check and conclude it is OK to proceed. The repair then completes its barrier, believing all in-flight writes are now included in the repair, but the write still goes ahead, gets routed according to the old metadata, and the repair never actually includes it.
   
   Does that make sense? For the write op order to act as a barrier, the check against cluster metadata has to be done inside the op order.
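
   [Editor's note: the race described above can be sketched with a simplified model. This is an illustration only, not Cassandra code: it stands in for Cassandra's `OpOrder` with a `ReentrantReadWriteLock` (writers hold the read lock; the repair barrier takes the write lock to drain in-flight writes), which is a blocking simplification of the real non-blocking primitive.]

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged model of the op-order barrier argument. All names hypothetical.
public class OpOrderSketch
{
    static final ReentrantReadWriteLock opOrder = new ReentrantReadWriteLock();
    static final AtomicLong epoch = new AtomicLong(1);

    // WRONG: routing check done before entering the op group. A barrier can
    // complete between the check and the write, so the write proceeds under
    // metadata the repair believes is no longer in use.
    static long applyWriteRacy()
    {
        long observed = epoch.get();          // check happens here...
        opOrder.readLock().lock();            // ...but the op group starts here
        try { return observed; }              // write applied with possibly stale routing
        finally { opOrder.readLock().unlock(); }
    }

    // RIGHT: routing check done inside the op group. The barrier cannot
    // complete until this op finishes, so the epoch observed is the epoch
    // the write is actually applied under.
    static long applyWriteSafe()
    {
        opOrder.readLock().lock();
        try { return epoch.get(); }
        finally { opOrder.readLock().unlock(); }
    }

    // Repair side: advance metadata, then wait for in-flight writes to drain.
    static void barrier()
    {
        epoch.incrementAndGet();
        opOrder.writeLock().lock();           // blocks until current "ops" finish
        opOrder.writeLock().unlock();
    }
}
```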



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

