dcapwell commented on code in PR #57:
URL: https://github.com/apache/cassandra-accord/pull/57#discussion_r1301993877


##########
accord-core/src/main/java/accord/impl/AbstractConfigurationService.java:
##########
@@ -269,6 +269,13 @@ public synchronized void reportTopology(Topology topology, boolean startSync)
         }
 
         long lastAcked = epochs.lastAcknowledged;
+        // TODO (now, review): lastAcked == 0, lastReceived = 2
+        // if we wait for epoch=1.acknowledge the test seems to wait forever... looks like burn test doesn't ack epoch=1
+        if (lastAcked == 0 && lastReceived > 0)

Review Comment:
   I don't remember a sha/seed; I would have to spelunk through Slack to see if I 
could find one.  It happened very often with this patch before this change was 
added, though I'm not sure whether other changes have since made it less useful.
   
   The history was as follows:
   
   ```
   T1: burn test starts with epoch=1, but does not set up the ack; this differs from C*, which would ack
   T2: epoch=2, but ack is slow due to timeouts/failures
   T3: epoch=3, ack is slightly faster than epoch=2, so it wins the race and fails
   ```
   
   I made a change to `start` so that we ack the first topology, so if you call 
`start` (which we don't in C*) then `lastAcked != 0` and you won't hit this 
issue.  In the C* case we stored the state locally, and on reboot we delay the ack 
until we can replay the events again, so it could be possible to hit this there 
if the oldest topology had not seen all replies yet.
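
To make the race concrete, here is a minimal sketch of the timeline above. It is not the `AbstractConfigurationService` code: `EpochAckSketch`, its methods, and the rule that an ack only takes effect once the previous epoch has been acked are all assumptions made for illustration.

```java
import java.util.TreeSet;

/** Hypothetical tracker used only to illustrate the race; not Accord's real class. */
final class EpochAckSketch {
    private long lastReceived = 0;
    private long lastAcknowledged = 0;
    // Acks that arrived before their predecessor epoch was acknowledged.
    private final TreeSet<Long> pending = new TreeSet<>();

    /** Mirrors the described change to start: the first topology is acked right away. */
    void start(long firstEpoch) {
        receive(firstEpoch);
        lastAcknowledged = firstEpoch;
    }

    /** Record that a topology for this epoch has been reported. */
    void receive(long epoch) {
        lastReceived = Math.max(lastReceived, epoch);
    }

    /**
     * Assumed rule: epoch N is only acknowledged once epoch N-1 has been.
     * Early acks are buffered and drained when the gap closes.
     */
    void acknowledge(long epoch) {
        pending.add(epoch);
        while (!pending.isEmpty() && pending.first() == lastAcknowledged + 1) {
            lastAcknowledged = pending.pollFirst();
        }
    }

    long lastReceived() {
        return lastReceived;
    }

    long lastAcknowledged() {
        return lastAcknowledged;
    }
}
```

Under these assumptions, if epoch 1 is received but never acked, the acks for epochs 3 and 2 stay buffered and `lastAcknowledged()` remains 0 while `lastReceived()` is 3, which is the `lastAcked == 0 && lastReceived > 0` state the diff checks for. If `start(1)` is called first, the same out-of-order acks (3, then 2) drain and `lastAcknowledged()` reaches 3.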



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

