aweisberg commented on code in PR #4508:
URL: https://github.com/apache/cassandra/pull/4508#discussion_r2628779083
##########
src/java/org/apache/cassandra/db/AbstractMutationVerbHandler.java:
##########
@@ -193,6 +196,49 @@ else if (message.epoch().isBefore(metadata.schema.lastModified()))
         return metadata;
     }
+    /**
+     * Confirm that the presence/absence of a mutation id matches our expectations for the given keyspace/table/token. If
+     * it doesn't, then we're not on the same epoch as the coordinator, or there's a bug.
+     */
+    private ClusterMetadata checkReplicationMigration(ClusterMetadata metadata, Message<T> message, InetAddressAndPort respondTo)
+    {
+        IMutation mutation = message.payload;
+        MutationRouting expected = mutation.id().isNone() ? MutationRouting.UNTRACKED : MutationRouting.TRACKED;
+        if (expected == MigrationRouter.getMutationRouting(metadata, mutation))
+            return metadata;
+
+        if (message.epoch().isAfter(metadata.epoch))
+        {
+            // coordinator is ahead, fetch log and recheck
+            metadata = ClusterMetadataService.instance().fetchLogFromPeerOrCMS(metadata, respondTo, message.epoch());
+            if (expected != MigrationRouter.getMutationRouting(metadata, mutation))
+                throw new IllegalStateException(String.format("Inconsistent mutation routing after fetching log for epoch = %s. Keyspace: %s key: %s ",
+                                                              metadata.epoch,
+                                                              mutation.getKeyspaceName(),
+                                                              mutation.key()));
+        }
+        else if (message.epoch().isBefore(metadata.epoch))
+        {
+            TCMMetrics.instance.coordinatorBehindReplication.mark();
+            throw new CoordinatorBehindException(String.format("Replication type / migration mismatch for keyspace: %s token %s, coordinator: %s is behind, our epoch = %s, their epoch = %s",
Review Comment:
The repair waits for in-flight writes to complete after it has brought the
local cluster metadata state up to date.
So a racing in-flight write that hasn't yet entered the write op order could
run this check and conclude that it is OK to proceed. The repair then completes
its barrier, believing every write is now included in the repair, but this
write goes ahead afterwards using the old routing, so the repair never
actually covers it.
Does that make sense? For the write op order to act as a barrier, the check
against cluster metadata has to be done inside the op order.
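
To make the interleaving concrete, here is a minimal single-threaded sketch of
the race. It does not use the real Cassandra classes: ToyOpOrder, Replica, and
every method on them are hypothetical stand-ins invented for illustration, and
the "repair" is reduced to a metadata update followed by a barrier and a
snapshot of applied writes. It only aims to show why the routing check needs to
happen after the write has entered the op order.

```java
import java.util.ArrayList;
import java.util.List;

public class OpOrderCheckSketch
{
    enum Routing { UNTRACKED, TRACKED }

    /** Hypothetical toy op order: counts operations that have started but not finished. */
    static final class ToyOpOrder
    {
        private int inFlight;
        void start() { inFlight++; }
        void finish() { inFlight--; }
        /** A barrier can complete only when no started operation is still running. */
        boolean barrierCanComplete() { return inFlight == 0; }
    }

    /** Hypothetical toy replica: current routing, applied writes, and what the repair covered. */
    static final class Replica
    {
        Routing routing = Routing.UNTRACKED;
        final ToyOpOrder writeOrder = new ToyOpOrder();
        final List<String> applied = new ArrayList<>();
        final List<String> repaired = new ArrayList<>();

        boolean routingMatches(Routing expected) { return expected == routing; }

        /** Repair step 1: bring local metadata up to date (routing changes). */
        void advanceMetadata() { routing = Routing.TRACKED; }

        /** Repair step 2: complete the barrier if possible, then snapshot applied writes. */
        boolean tryCompleteBarrier()
        {
            if (!writeOrder.barrierCanComplete())
                return false; // must wait for in-flight writes
            repaired.clear();
            repaired.addAll(applied);
            return true;
        }
    }

    /** Check done BEFORE entering the op order: the barrier cannot see the write. */
    static boolean writeMissedWhenCheckedOutside()
    {
        Replica r = new Replica();
        boolean ok = r.routingMatches(Routing.UNTRACKED); // passes against old metadata
        r.advanceMetadata();
        r.tryCompleteBarrier();                           // nothing in flight, completes at once
        r.writeOrder.start();
        if (ok)
            r.applied.add("w1");                          // applied with the old routing
        r.writeOrder.finish();
        return r.applied.contains("w1") && !r.repaired.contains("w1");
    }

    /** Check done INSIDE the op order: the barrier has to wait for the write. */
    static boolean writeMissedWhenCheckedInside()
    {
        Replica r = new Replica();
        r.writeOrder.start();
        boolean ok = r.routingMatches(Routing.UNTRACKED); // passes, but the op is now visible
        r.advanceMetadata();
        boolean completedEarly = r.tryCompleteBarrier();  // false: write is in flight
        if (ok)
            r.applied.add("w1");
        r.writeOrder.finish();
        r.tryCompleteBarrier();                           // now completes and covers w1
        return completedEarly || (r.applied.contains("w1") && !r.repaired.contains("w1"));
    }

    public static void main(String[] args)
    {
        System.out.println("missed (check outside op order): " + writeMissedWhenCheckedOutside()); // true
        System.out.println("missed (check inside op order):  " + writeMissedWhenCheckedInside());  // false
    }
}
```

In the second ordering, a write that enters the op order only after the
metadata update would simply fail the check and be rejected, so either way the
repair does not silently miss it.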
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]