Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14824 )
Change subject: IMPALA-9199: Add support for single query retries on cluster membership changes ...................................................................... Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/coordinator.cc@502 PS9, Line 502: RETURN_IF_ERROR(HandleFailedExecRpc(backend_state)); For this loop, do we have any expectation about the number of BackendStates that have errors? In other words, can more than one have errors? If there can be more than one error, we would call TryQueryRetry more than once, but based on my reading of that function, it can handle that. Am I right? Is there any concern that if there is more than one error, we may want to blacklist more than one Impalad before retrying? http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h File be/src/runtime/query-driver.h: http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h@116 PS9, Line 116: a : /// statestore updates I think this should be singular "a statestore update" http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h@122 PS9, Line 122: Status TryQueryRetry(ClientRequestState* client_request_state, Status* status, : bool* was_retried = nullptr) WARN_UNUSED_RESULT; >From reading through TryQueryRetry(), it looks like this is safe to call >multiple times and it will only do the retry once. Is that true? Can we add a >comment here that spells out that it can be called more than once? http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-hs2-server.cc@763 PS9, Line 763: ClientRequestState* request_state = nullptr; > So I'm wondering if its safe to no longer have any shared_ptr here and in t I had the same concern. http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.h@673 PS9, Line 673: ClientRequestState*& Stylistically, I personally don't like mutable references for out arguments. Using ** instead makes it much clearer that values are being passed between functions. Callers need to pass addresses. The function itself can't treat it like a local variable. http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.cc@1453 PS9, Line 1453: ClientRequestState::RetryState retry_state; Nit: you could move this declaration down past the first BlockOnWait() http://gerrit.cloudera.org:8080/#/c/14824/9/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/14824/9/tests/custom_cluster/test_query_retries.py@93 PS9, Line 93: Increase the statestore : heartbeat frequency so that the query actually fails during execution. It looks like we are using the default value for statestore_heartbeat_frequency_ms. Is that intentional? If I understand this correctly, this test uses the heavy shuffle query and should see an RPC failure. We are not expecting the statestore to see the Impalad as dead. Is that right? -- To view, visit http://gerrit.cloudera.org:8080/14824 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd Gerrit-Change-Number: 14824 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Mon, 27 Apr 2020 17:48:11 +0000 Gerrit-HasComments: Yes
