adelapena commented on a change in pull request #1168:
URL: https://github.com/apache/cassandra/pull/1168#discussion_r788690244
##########
File path: src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java
##########
@@ -436,16 +455,23 @@ public void handleMutation(Mutation m, int size, int
entryLocation, CommitLogDes
sawCDCMutation = true;
pendingMutationBytes += size;
+
+ boolean isSchemaMutation =
SchemaConstants.isSchemaKeyspace(m.getKeyspaceName());
+
+ if (isSchemaMutation)
+ writeOrder.awaitNewBarrier();
Review comment:
I don't understand how counting schema mutations would help us to verify
that we are reproducing the problematic case, since we are not sending more
mutations of this kind and the problem is due to the order in which mutations
are replayed. Also, regular node starts include multiple schema mutations (~20)
besides the ones that we introduce with the test, and it's not easy to identify
which ones are the relevant ones with byteman.
I think that the induced 2 seconds delay is a really long time for replaying
mutations, and it seems enough to guarantee that not-schema mutations are
replayed before schema mutations. Indeed, the test passes [1000 iterations
against the patched
branch](https://app.circleci.com/pipelines/github/adelapena/cassandra/1246/workflows/9ab6f30b-8e87-486f-a5a7-2d1e74873ea5),
while these other [1000 runs against unpatched
trunk](https://app.circleci.com/pipelines/github/adelapena/cassandra/1248/workflows/03e75af8-985a-42ea-84f5-cf5f7480d6a7)
fail every single time.
However, running the test against trunk fails after the first replay, which
doesn't even need the byteman rule. I we slightly modify the test to skip the
first replay and go straight to the second replay with the byteman rule, [this
way](https://github.com/adelapena/cassandra-dtest/commit/d54cc4878d674d3736bd71c88ae192f08aa9a622),
the modified tests fails [1000
runs](https://app.circleci.com/pipelines/github/adelapena/cassandra/1250/workflows/75ee8eba-c7a7-4e7c-96bb-30e82d2f4520)
against trunk, while it succeeds another [1000
runs](https://app.circleci.com/pipelines/github/adelapena/cassandra/1249/workflows/9182133c-aee0-4b58-bf1b-8082ff3a3cbf)
against the patched branch.
I thinks these runs show the stability of the test. We could try to add some
locking with a couple of byteman rules to be able to somehow produce the
desired ordering of replayed mutations, more or less reverting the action of
the introduced `writeOrder`, but I think that would introduce unneeded
complexity, making the rules more sensitive to future changes in
`CommitLogReplayer`. wdyt?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]