aweisberg commented on code in PR #3395:
URL: https://github.com/apache/cassandra/pull/3395#discussion_r1708022010
##########
src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java:
##########
@@ -294,8 +329,20 @@ public void onFailure(InetAddressAndPort from, RequestFailure failure)
         if (blockFor() + n > candidateReplicaCount())
             signal();
-        if (hintOnFailure != null && StorageProxy.shouldHint(replicaPlan.lookup(from)))
-            StorageProxy.submitHint(hintOnFailure.get(), replicaPlan.lookup(from), null);
+        // If the failure was RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM then we only want to hint once
Review Comment:
TBH I am having trouble remembering why I went to the trouble of doing this
here. The purpose of this code was mostly to reduce the number of redundant
Accord transactions created by the coordinator, not to change whether a hint
is created.
This doesn't mean the entire request failed with
RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM; it just means we got one such response.
The request could still succeed or fail at other nodes for other reasons,
succeed overall, or write regular hints for nodes that return other error
responses.
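
To make the "hint once" behavior concrete, here is a minimal sketch, not the code in this PR. `FailureReason`, `HintOnceSketch`, and the string-based hint records are hypothetical stand-ins for the real types; the only point is that a compare-and-set guard lets the first retry response trigger the Accord hint while every other failure still produces a regular per-replica hint.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the failure reasons a replica can report.
enum FailureReason { TIMEOUT, UNKNOWN, RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM }

class HintOnceSketch
{
    // Flips to true the first time a retry-on-different-system failure is seen.
    private final AtomicBoolean retryHintSubmitted = new AtomicBoolean(false);

    // Records what would have been submitted; a real handler would submit hints instead.
    final List<String> submittedHints = new CopyOnWriteArrayList<>();

    // Called once per failed replica response, possibly from several threads.
    void onFailure(String replica, FailureReason reason)
    {
        if (reason == FailureReason.RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM)
        {
            // Only the first such response submits the (assumed) Accord hint.
            if (retryHintSubmitted.compareAndSet(false, true))
                submittedHints.add("accord-hint");
        }
        else
        {
            // Any other failure still gets a regular hint for that replica.
            submittedHints.add("hint:" + replica);
        }
    }
}
```

With two retry responses and one timeout, this records one Accord hint and one regular hint, so a single retry response does not suppress hinting for the other replicas.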
Part of the reason hints are written here is that the callback may return
before timeouts/responses arrive from all nodes, so you can't rely on callback
completion to figure out which hints to write. I believe this was the race
condition I ran into.
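
A rough illustration of that race, using a hypothetical `EarlyCompletionSketch` class rather than the real handler: the coordinator is released as soon as `blockFor` acks arrive, so a failure that shows up afterwards can only be hinted from inside `onFailure` itself.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

class EarlyCompletionSketch
{
    // Counts down once per successful replica response; reaches zero at blockFor acks.
    private final CountDownLatch acks;

    // Replicas that would have received a hint; a real handler would submit hints instead.
    final List<String> hintedReplicas = new CopyOnWriteArrayList<>();

    EarlyCompletionSketch(int blockFor)
    {
        this.acks = new CountDownLatch(blockFor);
    }

    void onResponse(String replica)
    {
        acks.countDown();
    }

    // The hint is written here because the coordinator may already have moved on.
    void onFailure(String replica)
    {
        hintedReplicas.add(replica);
    }

    // True once enough acks arrived for the coordinator to stop waiting.
    boolean isComplete()
    {
        return acks.getCount() == 0;
    }
}
```

With `blockFor` of 2 and three replicas, two acks complete the request while `hintedReplicas` is still empty; the third replica's failure arrives later and is only captured because `onFailure` records it directly.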
The direction of the migration here isn't guaranteed. The write could have
been rejected during a migration away from Accord because the node we asked to
mutate thought it should have been on Accord. If we don't hint, then something
like ANY doesn't do the right thing.
With Accord, ANY isn't really honored at the moment, so you get an error
instead of a hint written at the coordinator.
That's a very long way of saying I didn't think too hard about proving it was
safe not to hint, since it should be a rare occurrence and I figured that
hinting was better than not hinting.
It's possible to optimize this a little more by writing the Accord hint only
if the overall outcome is retry on a different system, but I didn't think this
was worth optimizing much, since it is only an issue during a very brief window
where nodes disagree on the correct system to use and there is a concurrent
write to that range.