Todd Lipcon has posted comments on this change.

Change subject: [flaky tests] Increase the result ttl for 
ExactlyOnceRpcTest.TestExactlyOnceSemanticsGarbageCollectionStressTest
......................................................................


Patch Set 1:

I don't think this is exactly the problem. Instead, I think the issue is the 
following interleaving:

- the LongWritesThread generates sequence number 1 (from the request tracker)
- the StubbornWritesThread generates sequence number 2 (from the request
tracker)
- the StubbornWritesThread manually sets first_incomplete_seq_no in its RPC to 
2 (its own sequence number)
- the StubbornWritesThread sends an RPC, which sets the server-side "stale"
threshold to 2
- the LongWritesThread continues running and sends sequence number 1, which is 
now marked as stale, and fails.
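The interleaving can be replayed deterministically on a single thread. What follows is a toy model, not Kudu code: ToyRequestTracker, ToyServer and ReplayInterleaving() are hypothetical names. It assumes the server keeps the highest first_incomplete_seq_no it has seen as its stale threshold and rejects any sequence number below that threshold.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-ins for illustration only; these are not
// Kudu's RequestTracker/ResultTracker APIs.
using SeqNo = int64_t;

// Client side: hands out sequence numbers and remembers which are in flight.
class ToyRequestTracker {
 public:
  SeqNo NewSeqNo() {
    SeqNo s = next_++;
    in_flight_.insert(s);
    return s;
  }
  // Lowest sequence number that has not completed yet.
  SeqNo FirstIncomplete() const {
    return in_flight_.empty() ? next_ : *in_flight_.begin();
  }
  void RpcCompleted(SeqNo s) { in_flight_.erase(s); }

 private:
  SeqNo next_ = 1;
  std::set<SeqNo> in_flight_;
};

// Server side (assumed behavior): the stale threshold only moves forward, and
// any sequence number below it is rejected.
class ToyServer {
 public:
  // Returns false if the RPC is rejected as stale.
  bool HandleRpc(SeqNo seq_no, SeqNo first_incomplete_seq_no) {
    if (first_incomplete_seq_no > stale_threshold_) {
      stale_threshold_ = first_incomplete_seq_no;
    }
    return seq_no >= stale_threshold_;
  }

 private:
  SeqNo stale_threshold_ = 0;
};

// Replays the interleaving step by step. Returns whether the
// LongWritesThread's RPC (sequence number 1) was accepted.
bool ReplayInterleaving() {
  ToyRequestTracker tracker;
  ToyServer server;
  SeqNo long_writes = tracker.NewSeqNo();  // LongWritesThread gets 1
  SeqNo stubborn = tracker.NewSeqNo();     // StubbornWritesThread gets 2
  assert(long_writes == 1 && stubborn == 2);

  // StubbornWritesThread stamps its own sequence number as the first
  // incomplete one, pushing the server's stale threshold to 2.
  server.HandleRpc(stubborn, /*first_incomplete_seq_no=*/stubborn);
  tracker.RpcCompleted(stubborn);

  // LongWritesThread's RPC arrives afterwards and falls below the threshold.
  return server.HandleRpc(long_writes, tracker.FirstIncomplete());
}
```

In this model ReplayInterleaving() returns false: sequence number 1 arrives after the threshold has already moved to 2, so it is treated as stale.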

If you add a usleep(5000) at the top of CalculatorServiceRpc::Try() you'll
trigger this interleaving regularly even with your patch.

I think the real fix here is that the AddRequestId() call should be using the 
request_tracker's view of the first_incomplete_seq_no rather than setting it 
equal to the request's sequence number.
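A minimal sketch of the proposed fix, using similar toy assumptions: MakeRequestId() is a hypothetical stand-in for AddRequestId(), and a std::set of in-flight sequence numbers stands in for the request tracker. The two variants differ only in where first_incomplete_seq_no comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical sketch, not Kudu code. `in_flight` plays the role of the
// client-side request tracker; `stale_threshold` plays the role of the
// server-side stale threshold.
using SeqNo = int64_t;

struct ToyRequestId {
  SeqNo seq_no;
  SeqNo first_incomplete_seq_no;
};

// Hypothetical analogue of AddRequestId(). The buggy variant stamps the RPC
// with its own sequence number; the fixed variant uses the tracker's view,
// i.e. the lowest sequence number still in flight.
ToyRequestId MakeRequestId(const std::set<SeqNo>& in_flight, SeqNo seq_no,
                           bool use_tracker_view) {
  assert(in_flight.count(seq_no) == 1);
  SeqNo first = use_tracker_view ? *in_flight.begin() : seq_no;
  return {seq_no, first};
}

// Replays the interleaving, applying the same variant to both RPCs, and
// reports whether sequence number 1 is accepted.
bool LongWriteAccepted(bool use_tracker_view) {
  std::set<SeqNo> in_flight = {1, 2};  // LongWrites took 1, StubbornWrites took 2
  SeqNo stale_threshold = 0;
  auto handle = [&](const ToyRequestId& id) {
    stale_threshold = std::max(stale_threshold, id.first_incomplete_seq_no);
    return id.seq_no >= stale_threshold;
  };
  handle(MakeRequestId(in_flight, 2, use_tracker_view));  // StubbornWrites RPC
  in_flight.erase(2);
  return handle(MakeRequestId(in_flight, 1, use_tracker_view));  // LongWrites RPC
}
```

Here LongWriteAccepted(false) returns false and LongWriteAccepted(true) returns true: when each RPC reports the tracker's lowest in-flight sequence number, the threshold cannot move past a request that is still outstanding.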

-- 
To view, visit http://gerrit.cloudera.org:8080/4644
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1257ddaf02dfca05829edc904b2f4ff406b933ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
