Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22133 )
Change subject: KUDU-3571: fix flakiness in AutoIncrementingItest.BootstrapNoWalsNoData
......................................................................


Patch Set 7: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22133/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22133/1//COMMIT_MSG@10
PS1, Line 10: not been initialized or inab
> Thank you all for reviewing!
Alright, this looks like a reliable solution to me.  Thank you!


http://gerrit.cloudera.org:8080/#/c/22133/1//COMMIT_MSG@10
PS1, Line 10: not been initialized or inab
> IIUC, leader does send updates right away but on peers, the update may take
> time upto FLAGS_raft_heartbeat_interval_ms duration.
An update is first stored in the follower replica's WAL, and only after that is it 'prepared' and 'applied'.  Only once all these phases have completed is the update visible to a Kudu client that reads data from the follower replica.

Once the operation has been stored in the WAL, the replica acks it to the leader replica in the response to the original RPC, and that's how the leader replica knows the follower has persisted the data.  It might take much longer than the Raft heartbeat interval for the acknowledgment to arrive at the leader replica, of course.  There isn't an upper limit there, except for the overall timeout for the Raft consensus RPC.

As for the 'prepare' and 'apply' phases, those are separate phases, and they might take a long time as well under certain conditions, especially if the apply queue is very long (e.g., see KUDU-1587 for anecdotal evidence of apply queue wait times).

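To illustrate the ordering described in the comment above, here is a minimal toy sketch in Python.  It is not Kudu code: the class FollowerReplica and its methods handle_update(), prepare_next(), apply_next() and read() are hypothetical names made up for this example.  The only point it tries to show is that the ack to the leader follows the WAL write, while visibility to a reader has to wait for the separate 'prepare' and 'apply' phases.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FollowerReplica:
    """Toy model of a follower replica (hypothetical, not Kudu's classes)."""
    wal: list = field(default_factory=list)            # persisted operations
    prepare_queue: deque = field(default_factory=deque)  # logged, not prepared
    apply_queue: deque = field(default_factory=deque)    # prepared, not applied
    applied: dict = field(default_factory=dict)          # state readers can see

    def handle_update(self, op_id, key, value):
        """Store the op in the WAL, then ack. The ack says nothing about
        whether the op is visible to readers yet."""
        op = (op_id, key, value)
        self.wal.append(op)
        self.prepare_queue.append(op)
        return {"op_id": op_id, "acked": True}

    def prepare_next(self):
        """'Prepare' phase: move one logged op into the apply queue."""
        if self.prepare_queue:
            self.apply_queue.append(self.prepare_queue.popleft())

    def apply_next(self):
        """'Apply' phase: may lag arbitrarily far behind the ack."""
        if self.apply_queue:
            _, key, value = self.apply_queue.popleft()
            self.applied[key] = value

    def read(self, key):
        """A client read sees only applied state."""
        return self.applied.get(key)


follower = FollowerReplica()
ack = follower.handle_update(op_id=1, key="row1", value=42)
before_apply = follower.read("row1")   # acked already, but not readable
follower.prepare_next()
after_prepare = follower.read("row1")  # still not readable
follower.apply_next()
after_apply = follower.read("row1")    # readable only after 'apply'
```

In this sketch a test that reads from the follower right after the leader has received the ack could still observe the old state, which is the kind of race a fixed wait tied to the heartbeat interval would not reliably close.
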

http://gerrit.cloudera.org:8080/#/c/22133/1//COMMIT_MSG@10
PS1, Line 10: not been initialized or inab
> It helps to catch any regressions - for example at any point leader would
> have flushed down data to disk with a higher probability than the followers,
> in such cases where we have to populate the counter where few replicas have
> flushed the data and few replicas did not, we should not expect different
> values for the counter.
Alright, but isn't it covered by many existing tests for tablet Raft consensus already?

OK, it seems CheckCluster() for ClusterVerifier should take care of this anyway as of PS7.


-- 
To view, visit http://gerrit.cloudera.org:8080/22133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bd387c82b632dbb77aa5a45f831273392ae05b4
Gerrit-Change-Number: 22133
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Thu, 19 Dec 2024 03:10:50 +0000
Gerrit-HasComments: Yes
