Hello David Ribeiro Alves,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/5358
to review the following change.
Change subject: exactly_once_rpc-test: fix gc stress test flakiness
......................................................................
exactly_once_rpc-test: fix gc stress test flakiness
This test involves two threads:
1) the 'stubborn writer' thread retries a request with the same sequence
number over and over. It expects that eventually the cached result will
go stale, and then later that the client will be entirely GCed and thus
the request will start to succeed again.
2) the 'long write' thread, which uses the normal RetriableRpc mechanism
to send requests, each with increasing sequence numbers. We expect that,
since each of these requests is a new one, and isn't retried once it's
successful, we won't see any 'stale' responses.
The test was flaky, however, because the 'stubborn writer' thread was
always sending its own sequence number as the first_incomplete sequence
number, and we also didn't ensure that it started before the 'long
write' thread. Given that, it was possible to have this interleaving:
1) start the 'long write' thread, which is assigned seq number 1
2) before the write is sent, the 'stubborn writer' thread assigns
itself seq number 2, and sends a request indicating first_incomplete=2,
which tells the server that seq number 1 is complete and may be GCed.
3) when the 'long write' thread sends its request, it immediately gets
a 'stale' response, causing a test failure.
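The interleaving above can be replayed with a toy model. This is only a
sketch: ToyTracker, IsStale and LongWriteSeesStale are hypothetical names
invented for illustration, not Kudu's real tracker API, and the model
assumes the server treats every seq number below the largest incomplete
seq number it has been told about as GCed.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical toy model of per-client request tracking on the server.
// It is NOT Kudu's implementation; it only captures the assumed rule that
// anything below the client-reported incomplete seq number is GCed.
class ToyTracker {
 public:
  // Records the incomplete seq number carried by the request and returns
  // true if the request itself falls below the GC watermark ('stale').
  bool IsStale(int64_t seq_no, int64_t incomplete_seq_no) {
    gc_watermark_ = std::max(gc_watermark_, incomplete_seq_no);
    return seq_no < gc_watermark_;
  }

 private:
  int64_t gc_watermark_ = 0;
};

// Replays the three steps of the problematic interleaving.
bool LongWriteSeesStale() {
  ToyTracker tracker;
  int64_t next_seq_no = 1;

  // 1) The 'long write' thread is assigned seq number 1 but has not sent yet.
  const int64_t long_write_seq = next_seq_no++;

  // 2) The 'stubborn writer' assigns itself seq number 2 and sends it,
  //    reporting its own seq number as the incomplete one.
  const int64_t stubborn_seq = next_seq_no++;
  tracker.IsStale(stubborn_seq, stubborn_seq);

  // 3) The 'long write' request for seq number 1 now arrives below the
  //    watermark and is rejected as stale.
  return tracker.IsStale(long_write_seq, long_write_seq);
}
```

In this model, calling LongWriteSeesStale() returns true, which corresponds
to the spurious 'stale' response that failed the test.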
One fix would have been to make the 'stubborn writer' thread send the
first_incomplete calculated by the RequestTracker. However, that would
have involved modifying a bunch of other tests to properly update the
RequestTracker.
So instead this patch takes the approach of assigning the 'stubborn
writer' thread's sequence number before starting the 'long write'
thread. This ensures that the 'stubborn writer' won't explicitly GC any
request made by the 'long write' thread.
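The effect of this ordering can be sketched with a similar toy model.
MiniTracker and StaleCountWithStubbornSeqFirst are hypothetical names, not
code from the patch, and the sketch assumes the 'long write' thread reports
its own seq number as the incomplete one because it finishes each request
before starting the next.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical toy model: every seq number below the largest incomplete
// seq number reported so far is considered GCed. Not Kudu's real tracker.
class MiniTracker {
 public:
  // Returns true if the request falls below the GC watermark ('stale').
  bool IsStale(int64_t seq_no, int64_t incomplete_seq_no) {
    gc_watermark_ = std::max(gc_watermark_, incomplete_seq_no);
    return seq_no < gc_watermark_;
  }

 private:
  int64_t gc_watermark_ = 0;
};

// Assigns the stubborn writer's seq number first, then interleaves its
// retries with 'rounds' fresh long-write requests. Returns how many
// long-write requests were answered 'stale'.
int StaleCountWithStubbornSeqFirst(int rounds) {
  MiniTracker tracker;
  int64_t next_seq_no = 1;

  // The fix: the stubborn writer takes the lowest seq number up front.
  const int64_t stubborn_seq = next_seq_no++;

  int long_write_stale = 0;
  for (int i = 0; i < rounds; ++i) {
    // Each long-write request gets a fresh, larger seq number.
    const int64_t long_write_seq = next_seq_no++;

    // The stubborn retry reports an incomplete seq number that is lower
    // than every long-write seq number, so it cannot GC any of them.
    // Its own result may be stale here; the test expects that.
    tracker.IsStale(stubborn_seq, stubborn_seq);

    if (tracker.IsStale(long_write_seq, long_write_seq)) {
      ++long_write_stale;
    }
  }
  return long_write_stale;
}
```

Under these assumptions the count stays at zero for any number of rounds,
since the stubborn writer's reported seq number is always lower than every
seq number the 'long write' thread uses.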
With the patch, I looped this test 500 times and it passed[1]. Without
the patch, it failed 64/500[2].
[1] http://dist-test.cloudera.org//job?job_id=todd.1480926593.3793
[2] http://dist-test.cloudera.org//job?job_id=todd.1480926999.4126
Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
---
M src/kudu/rpc/exactly_once_rpc-test.cc
1 file changed, 23 insertions(+), 12 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/5358/1
--
To view, visit http://gerrit.cloudera.org:8080/5358
Gerrit-MessageType: newchange
Gerrit-Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>