Re: [PR] CASSANDRA-19744 Accord migration and interop correctness [cassandra]

via GitHub Wed, 07 Aug 2024 15:27:32 -0700


aweisberg commented on code in PR #3395:
URL: https://github.com/apache/cassandra/pull/3395#discussion_r1708022010



##########
src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java:
##########
@@ -294,8 +329,20 @@ public void onFailure(InetAddressAndPort from, RequestFailure failure)
         if (blockFor() + n > candidateReplicaCount())
             signal();
 
-        if (hintOnFailure != null && StorageProxy.shouldHint(replicaPlan.lookup(from)))
-            StorageProxy.submitHint(hintOnFailure.get(), replicaPlan.lookup(from), null);
+        // If the failure was RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM then we only want to hint once

Review Comment:
   TBH I am having trouble remembering why I went to the trouble of doing this 
here. The purpose of this code was mostly to reduce the number of redundant 
Accord transactions created by the coordinator, not to change whether a hint 
is created.
   
   This doesn't mean the entire request failed with 
RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM. It just means we got one such response. 
The request could still succeed or fail at other nodes for other reasons, 
succeed overall, or write regular hints for nodes that return other error 
responses.
   
   Part of the reason hints are written here is that the callback may return 
before timeouts/responses arrive from all nodes, so you can't rely on callback 
completion to figure out which hints to write. I believe this was the race 
condition I ran into.
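   
   To make the "only hint once" idea from the diff comment concrete, here is a 
minimal sketch of writing hints from inside the per-response failure callback. 
Everything in it is hypothetical: `FailureReason`, `HintOnceSketch`, and the 
string "hints" are stand-ins for illustration, not Cassandra's real types or 
the code in this PR. It assumes a compare-and-set flag is an acceptable way to 
deduplicate the retry hint.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the per-replica failure reason.
enum FailureReason { TIMEOUT, UNKNOWN, RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM }

// Hypothetical handler: hints are decided per failure response, not at completion.
class HintOnceSketch
{
    // Flips to true the first time a retry-on-different-system failure arrives.
    private final AtomicBoolean retryHintSubmitted = new AtomicBoolean(false);

    // Records what would have been hinted so the behavior can be inspected.
    final List<String> submittedHints = new CopyOnWriteArrayList<>();

    // Called once per failed replica response, possibly from several threads.
    void onFailure(String replica, FailureReason reason)
    {
        if (reason == FailureReason.RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM)
        {
            // Only the first such failure wins the compare-and-set and hints.
            if (retryHintSubmitted.compareAndSet(false, true))
                submittedHints.add("retry hint (first seen from " + replica + ")");
        }
        else
        {
            // Any other failure still gets a regular per-replica hint.
            submittedHints.add("regular hint for " + replica);
        }
    }
}
```

   Because the decision is made as each failure arrives, nothing here depends 
on the callback having heard from every node.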
   
   The direction of the migration here isn't guaranteed. The write could have 
been rejected during a migration away from Accord because the node we asked to 
mutate thought the write should have gone through Accord. If we don't hint, 
then something like ANY doesn't do the right thing.
   
   With Accord, ANY isn't really honored at the moment, so you get an error 
instead of a hint written at the coordinator.
   
   That's a very long way of saying I didn't think too hard about proving it 
was safe to not hint, since it should be a rare occurrence and I figured that 
hinting was better than not hinting.
   
   It's possible to optimize a little more by writing the Accord hint only if 
the overall outcome is retry on different system, but I didn't think this was 
worth optimizing much, since it is only an issue during a very brief window 
where nodes disagree on the correct system to use and there is a concurrent 
write to that range.
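   
   For illustration only, the deferred variant could look roughly like the 
sketch below. The class and method names are hypothetical, and "overall 
outcome is retry on different system" is interpreted here as an assumption: 
at least one replica answered with the retry failure and fewer than `blockFor` 
replicas acknowledged the write.

```java
// Hypothetical tracker: decide on the retry hint from the overall outcome.
class DeferredRetryHintSketch
{
    private final int blockFor;                 // acks required for success
    private int successes;                      // acks received so far
    private boolean sawRetryOnDifferentSystem;  // any retry failure seen

    DeferredRetryHintSketch(int blockFor)
    {
        this.blockFor = blockFor;
    }

    synchronized void onSuccess()
    {
        successes++;
    }

    synchronized void onRetryOnDifferentSystem()
    {
        sawRetryOnDifferentSystem = true;
    }

    // Assumed rule: hint only if the request did not reach blockFor acks
    // and at least one replica asked for the other transaction system.
    synchronized boolean shouldWriteRetryHint()
    {
        return sawRetryOnDifferentSystem && successes < blockFor;
    }
}
```

   The catch is the race mentioned earlier: the callback may return before all 
responses or timeouts arrive, so a real version would need to pick a safe 
point at which to evaluate `shouldWriteRetryHint()`.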
   
   



