[GitHub] [cassandra-accord] dcapwell commented on a diff in pull request #56: Remote exceptions

via GitHub Thu, 17 Aug 2023 14:35:23 -0700


dcapwell commented on code in PR #56:
URL: https://github.com/apache/cassandra-accord/pull/56#discussion_r1297775035



##########
accord-core/src/main/java/accord/messages/ReadData.java:
##########
@@ -149,14 +149,16 @@ void read(SafeCommandStore safeStore, Timestamp 
executeAt, PartialTxn txn)
         Ranges unavailable = safeStore.ranges().unsafeToReadAt(executeAt);
 
         txn.read(safeStore, executeAt).begin((next, throwable) -> {
-            if (throwable != null)
+            // TODO (expected, exceptions): should send exception to client, 
and consistency handle/propagate locally
+            logger.trace("{}: read failed for {}: {}", txnId, unsafeStore, 
throwable);
+            synchronized (ReadData.this)
             {
-                // TODO (expected, exceptions): should send exception to 
client, and consistency handle/propagate locally
-                logger.trace("{}: read failed for {}: {}", txnId, unsafeStore, 
throwable);
-                node.reply(replyTo, replyContext, ReadNack.Error);
+                if (fail == null)
+                    fail = throwable;
+                else
+                    fail.addSuppressed(throwable);
             }
-            else
-                readComplete(unsafeStore, next, unavailable);
+            readComplete(unsafeStore, next, unavailable);

Review Comment:
   before this patch this is what I see
   
   * accord.messages.ReadData#read would reply `Error` on the first exception 
and not call `readComplete`.  This would not update `waitingOn`, but more 
importantly it would not call `ack`; which means `if (-1 == --waitingOnCount)` 
will never be true!
   * `accord.impl.AbstractFetchCoordinator.FetchRequest#readComplete` makes the 
read 2 operations:`txn.read()` and `store.maxAppliedFor(...).onSuccess(a -> 
{this.maxApplied = a; super.readComplete(...)})` (I overly simplified).  Even 
if a fetch had 2 reads (C* only has 1 atm), then we would still not see the ACK 
so would never complete the operation....
   * `accord.messages.ReadTxnData#readComplete` documents that it should 
unregister its listener... but because we call `cancel` we will schedule 
removal of the listener anyways... so thats prob fine.
   
   > I am not married to that decision though. Sending back the first error is 
fine.
   
   I guess I would ask, what will the coordinator do differently with a list of 
failures?  If a single one exists we will do the same as if 100 existed...  so 
by only sending the first we have a smaller msg payload and can start handling 
the exception faster!  We could detect this case and just log locally that 
these exceptions are getting ignored....
   
   So, in short, I am in favor of failing fast and not pushing handling 
exceptions properly to each impl.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [cassandra-accord] dcapwell commented on a diff in pull request #56: Remote exceptions

Reply via email to