Todd Lipcon has posted comments on this change.

Change subject: [flaky tests] Increase the result ttl for ExactlyOnceRpcTest.TestExactlyOnceSemanticsGarbageCollectionStressTest
......................................................................
Patch Set 1:

I don't think this is exactly the problem. Instead, I think the issue is the following interleaving:

- the LongWritesThread generates sequence number 1 (from the request tracker)
- the StubbornWritesThread generates sequence number 2 (from the request tracker)
- the StubbornWritesThread manually sets first_incomplete_seq_no in its RPC to 2 (its own sequence number)
- the StubbornWritesThread sends its RPC, which raises the server-side "stale" threshold to 2
- the LongWritesThread continues running and sends sequence number 1, which is now below the threshold, so it is marked as stale and fails.

If you add a usleep(5000) at the top of CalculatorServiceRpc::Try(), you'll trigger this interleaving regularly, even with your patch.

I think the real fix is for the AddRequestId() call to use the request tracker's view of first_incomplete_seq_no rather than setting it equal to the request's own sequence number.

-- 
To view, visit http://gerrit.cloudera.org:8080/4644
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1257ddaf02dfca05829edc904b2f4ff406b933ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No
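To make the interleaving concrete, here is a minimal C++ sketch that replays it deterministically. TrackerSketch, ServerSketch and ReplayInterleaving are hypothetical stand-ins, not Kudu's actual RequestTracker/ResultTracker API. The server rule used here (reject any sequence number below the highest first_incomplete_seq_no seen so far) is an assumed simplification of the real staleness check. The sketch contrasts the StubbornWritesThread reporting its own sequence number with it reporting the tracker's oldest in-flight one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical client-side tracker: hands out sequence numbers and
// remembers which ones are still in flight.
class TrackerSketch {
 public:
  int64_t NewSeqNo() {
    int64_t seq = next_++;
    incomplete_.insert(seq);
    return seq;
  }
  // Oldest sequence number this client has not finished yet.
  int64_t FirstIncomplete() const {
    return incomplete_.empty() ? next_ : *incomplete_.begin();
  }
  void RpcCompleted(int64_t seq) {
    auto erased = incomplete_.erase(seq);
    assert(erased == 1 && "completed a seq no that was not in flight");
    (void)erased;
  }

 private:
  int64_t next_ = 1;
  std::set<int64_t> incomplete_;
};

// Hypothetical server-side state for a single client (simplified).
class ServerSketch {
 public:
  // Returns false if the request is rejected as stale; otherwise raises
  // the stale threshold to the client-reported first incomplete seq no.
  bool Handle(int64_t seq_no, int64_t first_incomplete_seq_no) {
    if (seq_no < stale_threshold_) return false;
    stale_threshold_ = std::max(stale_threshold_, first_incomplete_seq_no);
    return true;
  }

 private:
  int64_t stale_threshold_ = 0;
};

// Replays the interleaving above. Returns true if the LongWritesThread's
// request (seq no 1) is accepted by the server.
bool ReplayInterleaving(bool use_tracker_view) {
  TrackerSketch tracker;
  ServerSketch server;
  int64_t long_seq = tracker.NewSeqNo();      // LongWritesThread: 1
  int64_t stubborn_seq = tracker.NewSeqNo();  // StubbornWritesThread: 2

  // Current test behavior: report its own seq no (2).
  // Proposed fix: report the tracker's oldest in-flight seq no (1).
  int64_t first_incomplete =
      use_tracker_view ? tracker.FirstIncomplete() : stubborn_seq;
  bool stubborn_ok = server.Handle(stubborn_seq, first_incomplete);
  assert(stubborn_ok);
  (void)stubborn_ok;
  tracker.RpcCompleted(stubborn_seq);

  // LongWritesThread finally sends seq no 1.
  bool long_ok = server.Handle(long_seq, tracker.FirstIncomplete());
  tracker.RpcCompleted(long_seq);
  return long_ok;
}
```

ReplayInterleaving(false) returns false, because seq no 1 falls below the threshold of 2 and is rejected as stale. ReplayInterleaving(true) returns true, because the threshold only advances to 1 while seq no 1 is still outstanding.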
