Adar Dembo has posted comments on this change.
Change subject: KUDU-1542: in some cases apply will hang.
Patch Set 4:
Let me make sure I understand what's happening here.
The problem: with AUTO_FLUSH_SYNC, apply() can loop forever if each retry
throws a PleaseThrottleException. It's not quite looping forever though; it's
just that there's nothing stopping this operation from hitting "bad luck" in
every call to apply() and getting repeatedly starved by other operations. Is
If that's accurate, I don't think this is the right approach to fixing the
problem. Instead, we should prioritize this operation (the one that saw a PTE)
such that after it waits on the PTE's deferred and retries, it cannot be
starved by any operation that comes afterward.
I'm also not sure why we'd want to place an upper bound on the number of retry
attempts. Ultimately, users only care about timeouts and deadlines, as they
correspond to SLAs. Put another way, an upper bound on retries doesn't
correspond to an upper bound on wall clock time. Calculating a deadline using
getTimeoutMillis() when entering apply() and using that as a determinant for
when to break out of the apply() loop seems better, though avoiding starvation
altogether using the above is probably ideal.
Nit: Since it's a primitive, initialize applyRetryTimes inline with the
private int applyRetryTimes = -1;
Line 91: LOG.error("Previous batch apply failed with Exception", e);
If we end up here, we'll return null out of the function, right? That doesn't
seem like the right behavior; we should throw some kind of exception since this
is an error case.
This new method needs Javadoc.
To view, visit http://gerrit.cloudera.org:8080/3787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: song bruce zhang <zsyuyizh...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: song bruce zhang <zsyuyizh...@gmail.com>