dcapwell commented on code in PR #57:
URL: https://github.com/apache/cassandra-accord/pull/57#discussion_r1301993877
##########
accord-core/src/main/java/accord/impl/AbstractConfigurationService.java:
##########
@@ -269,6 +269,13 @@ public synchronized void reportTopology(Topology topology, boolean startSync)
}
long lastAcked = epochs.lastAcknowledged;
+ // TODO (now, review): lastAcked == 0, lastReceived = 2
+ // if we wait for epoch=1.acknowledge the test seems to wait forever... looks like burn test doesn't ack epoch=1
+ if (lastAcked == 0 && lastReceived > 0)
Review Comment:
I don't remember a SHA or seed; I would have to spelunk through Slack to see if
I could find one. Without this change, the failure happened very often with this
patch, though I'm not sure whether other changes have since made it less helpful.
The history was as follows:
```
T1: burn test starts with epoch=1 but does not set up the ack; this differs from C*, which would ack
T2: epoch=2 arrives, but its ack is slow due to timeouts/failures
T3: epoch=3 arrives; its ack is slightly faster than epoch=2's, so it wins the race and fails
```
I made a change to `start` so that we ack the first topology. If you call
`start` (which we don't in C*), then `lastAcked != 0` and you won't hit this
issue. In the C* case we stored the state locally, and on reboot we delay the
ack until we can replay the events, so it may be possible to hit this there if
the oldest topology has not yet seen all replies.
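
To make the timeline above concrete, here is a toy model of the stall. It is a sketch under assumptions, not the Accord code: the `Epochs` class, its `receive`/`acknowledge`/`neverAcked` methods, and the rule that an ack only takes effect once the previous epoch has been acked are all hypothetical stand-ins. Only the field names `lastReceived` and `lastAcknowledged` and the condition `lastAcked == 0 && lastReceived > 0` come from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class EpochAckSketch {
    /** Hypothetical bookkeeping; field names echo the diff, not the real Accord classes. */
    public static final class Epochs {
        public long lastReceived = 0;
        public long lastAcknowledged = 0;
        // Acks that arrived before their predecessor epoch was acknowledged.
        private final TreeSet<Long> earlyAcks = new TreeSet<>();
        // Order in which acks actually took effect.
        public final List<Long> applied = new ArrayList<>();

        public void receive(long epoch) {
            lastReceived = Math.max(lastReceived, epoch);
        }

        /** Assumed rule: an ack takes effect only after the previous epoch is acked. */
        public void acknowledge(long epoch) {
            earlyAcks.add(epoch);
            while (!earlyAcks.isEmpty() && earlyAcks.first() == lastAcknowledged + 1) {
                lastAcknowledged = earlyAcks.pollFirst();
                applied.add(lastAcknowledged);
            }
        }

        /** Same condition as the guard in the diff. */
        public boolean neverAcked() {
            return lastAcknowledged == 0 && lastReceived > 0;
        }
    }

    /** Replays T1..T3; ackFirstTopology models the change to start. */
    public static Epochs run(boolean ackFirstTopology) {
        Epochs epochs = new Epochs();
        epochs.receive(1);                            // T1: first topology
        if (ackFirstTopology) epochs.acknowledge(1);  // only when start acks it
        epochs.receive(2);                            // T2: ack for epoch 2 is slow
        epochs.receive(3);                            // T3
        epochs.acknowledge(3);                        // epoch 3's ack wins the race
        epochs.acknowledge(2);                        // epoch 2's ack arrives late
        return epochs;
    }

    public static void main(String[] args) {
        Epochs stalled = run(false);
        System.out.println("no initial ack:   applied=" + stalled.applied
                           + " neverAcked=" + stalled.neverAcked());
        Epochs healthy = run(true);
        System.out.println("with initial ack: applied=" + healthy.applied
                           + " neverAcked=" + healthy.neverAcked());
    }
}
```

In this model, when epoch 1 is never acked, nothing is ever applied and `neverAcked()` stays true, which is the state the new guard detects. When the first topology is acked up front, the out-of-order acks for epochs 2 and 3 drain in order.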