Hello Adar Dembo,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/3326
to review the following change.
Change subject: WIP: KUDU-1466: improve error message when writes fail at TS
WIP: KUDU-1466: improve error message when writes fail at TS
Currently, when we hit certain types of tablet server errors,
we fall back to re-requesting locations from the master. If
the timing of the errors lines up badly, the last request to the
master may have a very short timeout, in which case we
misreport the write as failing due to a timeout on the
GetTableLocations() RPC, rather than due to the actual error on the
tablet server.
Injecting a bit of latency into GetTableLocations() reproduces the
issue reliably in ClientTest.TestFailedDnsResolution, which is
already quite flaky in TSAN builds because of it.
This is a WIP patch showing one potential way to solve it: have the
location picker keep track of the "best" error seen so far. But
perhaps it's actually better for this to be done in the retriable RPC.
Posting to get some comments on the best approach.
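To make the "best error seen so far" idea concrete, here is a minimal sketch. It is hypothetical: `Status` is a simplified stand-in (not Kudu's `kudu::Status`), `BestErrorTracker` is an illustrative helper that does not exist in Kudu, and the policy "a concrete error beats a timeout; among concrete errors, the most recent wins" is an assumption about what "best" should mean.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified status type; NOT Kudu's kudu::Status.
struct Status {
  enum class Code { kOk, kTimedOut, kNetworkError };
  Code code = Code::kOk;
  std::string msg;

  static Status OK() { return {}; }
  static Status TimedOut(std::string m) { return {Code::kTimedOut, std::move(m)}; }
  static Status NetworkError(std::string m) { return {Code::kNetworkError, std::move(m)}; }

  bool ok() const { return code == Code::kOk; }
  bool IsTimedOut() const { return code == Code::kTimedOut; }

  std::string ToString() const {
    switch (code) {
      case Code::kOk: return "OK";
      case Code::kTimedOut: return "Timed out: " + msg;
      case Code::kNetworkError: return "Network error: " + msg;
    }
    return msg;
  }
};

// Hypothetical helper (not part of Kudu): remembers the most informative
// error seen across the retries of a single logical write.
class BestErrorTracker {
 public:
  // Record the outcome of one attempt (tablet server write or master lookup).
  void Record(const Status& s) {
    if (s.ok()) return;
    // A concrete error always replaces what we have; a timeout is only
    // kept if nothing has been recorded yet.
    if (!s.IsTimedOut() || best_.ok()) {
      best_ = s;
    }
  }

  // Called once the overall write deadline expires. If a concrete error was
  // seen, report it (annotated with the timeout) instead of the bare timeout
  // from whichever RPC happened to be in flight last.
  Status OnDeadlineExceeded(const Status& timeout) const {
    assert(timeout.IsTimedOut());
    if (best_.ok() || best_.IsTimedOut()) return timeout;
    return Status{best_.code, best_.msg + " (after retrying: " + timeout.msg + ")"};
  }

 private:
  Status best_;  // Defaults to OK, meaning "nothing recorded yet".
};
```

In the scenario above, the tracker would record the real tablet server error first, ignore the later GetTableLocations() timeout, and surface the real error once the deadline expires. Whether this state lives in the location picker or in the retriable RPC, the reporting policy could stay the same; the open question is only which layer sees every attempt's outcome.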
4 files changed, 31 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/3326/1
To view, visit http://gerrit.cloudera.org:8080/3326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>