adelapena commented on a change in pull request #1168:
URL: https://github.com/apache/cassandra/pull/1168#discussion_r788690244



##########
File path: src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java
##########
@@ -436,16 +455,23 @@ public void handleMutation(Mutation m, int size, int 
entryLocation, CommitLogDes
             sawCDCMutation = true;
 
         pendingMutationBytes += size;
+
+        boolean isSchemaMutation = 
SchemaConstants.isSchemaKeyspace(m.getKeyspaceName());
+
+        if (isSchemaMutation)
+            writeOrder.awaitNewBarrier();

Review comment:
       I don't understand how counting schema mutations would help us to verify 
that we are reproducing the problematic case, since we are not sending more 
mutations of this kind and the problem is due to the order in which mutations 
are replayed. Also, regular node starts include multiple schema mutations (~20) 
besides the ones that we introduce with the test, and it's not easy to identify 
which ones are the relevant ones with byteman.
   
   I think that the induced 2 seconds delay is a really long time for replaying 
mutations, and it seems enough to guarantee that not-schema mutations are 
replayed before schema mutations. Indeed, the test passes [1000 iterations 
against the patched 
branch](https://app.circleci.com/pipelines/github/adelapena/cassandra/1246/workflows/9ab6f30b-8e87-486f-a5a7-2d1e74873ea5),
 while these other [1000 runs against unpatched 
trunk](https://app.circleci.com/pipelines/github/adelapena/cassandra/1248/workflows/03e75af8-985a-42ea-84f5-cf5f7480d6a7)
 fail every single time. 
   
   However, running the test against trunk fails after the first replay, which 
doesn't even need the byteman rule. I we slightly modify the test to skip the 
first replay and go straight to the second replay with the byteman rule, [this 
way](https://github.com/adelapena/cassandra-dtest/commit/d54cc4878d674d3736bd71c88ae192f08aa9a622),
 the modified tests fails [1000 
runs](https://app.circleci.com/pipelines/github/adelapena/cassandra/1250/workflows/75ee8eba-c7a7-4e7c-96bb-30e82d2f4520)
 against trunk, while it succeeds another [1000 
runs](https://app.circleci.com/pipelines/github/adelapena/cassandra/1249/workflows/9182133c-aee0-4b58-bf1b-8082ff3a3cbf)
 against the patched branch. 
   
   I thinks these runs show the stability of the test. We could try to add some 
locking with a couple of byteman rules to be able to somehow produce the 
desired ordering of replayed mutations, more or less reverting the action of 
the introduced `writeOrder`, but I think that would introduce unneeded 
complexity, making the rules more sensitive to future changes in 
`CommitLogReplayer`. wdyt?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to