Todd Lipcon has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out ......................................................................
Patch Set 14: (3 comments) didn't look in detail at the code yet, but got some higher level design questions here http://gerrit.cloudera.org:8080/#/c/7037/14//COMMIT_MSG Commit Message: PS14, Line 15: handling. Now, regardless of whether it was the RPC or the overall : operation timeout, the corresponding tablet server is marked as failed : and blacklisted. we don't currently invalidate this, right? Given that systems like Impala use a long-running client in the backend, it seems slightly dangerous that a single read timeout will "permanently" blacklist a server (at least until the other remaining replicas get blacklisted too). With writes this isn't that impactful, since we could never get "stuck" on a non-leader, and thus we'll always end up going back to the leader. But with reads I think it's quite plausible, no? http://gerrit.cloudera.org:8080/#/c/7037/14/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS14, Line 1991: // The scanner should timeout on the next batch not fetched by the Open(). : // The client should not retry until the overall timeout expires, giving up : // after the fist RPC timeout error: the only available replica is marked : // as failed by the first timeout error. I'm not really understanding this comment. Line 2021: ASSERT_LT(sw.elapsed().wall_millis(), kScanTimeoutMs); in the case that we know we won't be able to retry a scan elsewhere, shouldn't we set the timeout to be the full remaining user-specified timeout? i.e I think if the user specifies 60sec timeout, and we have a non-retriable operation like "nextbatch" on a scan, then we should use that entire 60sec timeout on the one server they gave. It's always possible that it will come back after 45sec or whatever, no? I think this is what the 'allow_time_for_failover' thing was doing before? -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
