Alexey Serbin has posted comments on this change.
Change subject: [TestWorkload] an option to retry on read timeouts
Patch Set 3:
> I'm not 100% following this. Reads should already have internal
> retries since they're configured to be fault-tolerant, right? so
> what's the purpose of retrying on a read timeout rather than just
> setting the timeout to be longer so that the client's internal
> retries have a longer budget? Is our issue that our internal client
> doesn't do a good job? Maybe we should fix that, considering we
> rely on this behavior for fault tolerance on long-running queries.
In the context of TestWorkload, it's the same story as with write operations --
if a write operation times out, TestWorkload can ignore the timeout and retry
instead of crashing. That's how it's implemented now, right? So, the idea here
is to do the same with timeouts for scan/read operations.

While running raft_consensus_stress-itest in the TSAN configuration, I noticed
that scans just time out while waiting for the server to advance its safe time
(the servers might be paused, or there might be some furious re-election
activity). So, the idea is to avoid crashing in this case and to make
TestWorkload behave the same way as it does for write operations when
'write_timeout_allowed_' is set.

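To make the proposal concrete, here is a minimal, self-contained sketch of the
intended reader behavior. It is not the actual TestWorkload code: ScanResult,
ReaderOptions, ReaderStats, RunReader and the 'read_timeout_allowed' field are
hypothetical names chosen to mirror 'write_timeout_allowed_', and the exception
stands in for the crash on an unexpected status.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical outcome of one scan attempt (the real code uses kudu::Status).
enum class ScanResult { kOk, kTimedOut, kOtherError };

// Hypothetical reader options; 'read_timeout_allowed' is an assumed analogue
// of TestWorkload's 'write_timeout_allowed_'.
struct ReaderOptions {
  bool read_timeout_allowed = false;
  int max_attempts = 1;
};

struct ReaderStats {
  int successful_scans = 0;
  int timed_out_scans = 0;
};

// Runs 'max_attempts' scans. A timed-out scan is counted and skipped when
// timeouts are allowed; any other failure throws, standing in for a crash.
ReaderStats RunReader(const ReaderOptions& opts,
                      const std::function<ScanResult()>& scan_once) {
  assert(opts.max_attempts > 0);
  ReaderStats stats;
  for (int i = 0; i < opts.max_attempts; ++i) {
    const ScanResult r = scan_once();
    if (r == ScanResult::kOk) {
      ++stats.successful_scans;
      continue;
    }
    if (r == ScanResult::kTimedOut && opts.read_timeout_allowed) {
      ++stats.timed_out_scans;  // tolerate the timeout and move on to the next scan
      continue;
    }
    throw std::runtime_error("scan failed");
  }
  return stats;
}
```

With 'read_timeout_allowed' left at false, the first timed-out scan aborts the
run, which corresponds to the current crash-on-timeout behavior described above.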
Yes, I could set the read timeout to a huge value, but then the test would
attempt just a few scans and would be stuck on them for the rest of the time.
However, I want it to attempt scans/reads from as many servers as possible,
since that could cause some other bugs to manifest.

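As a rough illustration of that trade-off, the sketch below computes how many
scan attempts fit into a test run in the worst case, where every scan blocks
until its timeout expires. The function name and the durations mentioned
afterwards are hypothetical, not values taken from the test.

```cpp
#include <cassert>
#include <chrono>

// Worst case: each scan attempt consumes its full timeout before giving up.
// Returns how many such attempts fit into the test's total duration.
long long WorstCaseScanAttempts(std::chrono::seconds test_duration,
                                std::chrono::seconds read_timeout) {
  assert(read_timeout.count() > 0);
  return test_duration.count() / read_timeout.count();
}
```

For example, under these assumptions a 300-second run with a 300-second read
timeout gets a single attempt, while a 15-second timeout with retries gets 20.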
I'm not sure we need to address that from the client's side as is.
Another alternative would be to disable the reader thread for TestWorkload in
that stress test.
To view, visit http://gerrit.cloudera.org:8080/9295
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Tue, 13 Feb 2018 22:12:08 +0000