Hello Thomas Tauber-Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14824

to look at the new patch set (#5).

Change subject: IMPALA-9199: Add support for single query retries on cluster membership changes
......................................................................

IMPALA-9199: Add support for single query retries on cluster membership changes

Adds the core logic for transparently retrying queries that fail due to
cluster membership changes. Query retries are triggered if (1) a node has
been removed from the cluster membership by a statestore update (rather
than cancelling all queries running on the leaving node, queries are
retried), or (2) a query fails and, as a result, blacklists a node. Both
events are considered cluster membership changes as they affect what
nodes a query will be scheduled on.

Implementation:
* Query retries are driven by a dedicated threadpool
* ImpalaServer::RetryQueryFromThreadPool implements the core logic to
  actually retry a failed query.
* When a query is retried, the original query is cancelled, the new
  query is created, registered, and started, and then the original query
  is closed
* A query cannot be retried once any results from the original query
  have been fetched; this prevents users from seeing incorrect results

Features:
* Retries are transparent to the user
  * This is achieved by adding a mapping from failed query ids to the
    query id of the retried query
  * ImpalaServer uses this mapping in GetClientFacingRequestState, which
    is used to differentiate between "client facing" requests for a
    ClientRequestState vs. internal requests for a CRS
* Users can tell if a query is retried using runtime profiles and the
  Impala Web UI
  * "Impala Query Status" is a new field in runtime profiles that
    displays the ClientRequestState execution state (which includes the
    RETRYING and RETRIED states)
  * The Impala Web UI will list all retried queries as being in the
    "RETRIED" state
* Retried queries skip all fe/ planning, authorization, etc.
* This feature is configurable ('retry_failed_queries') and is off by
  default

Refactoring:
* Changes the ClientRequestState so that it can take in an existing
  TExecRequest
  * This is required when retrying queries because the TExecRequest of
    the failed query is copied and used for the ClientRequestState of
    the retried query
* ClientRequestState::ExecState is extended with three new states:
  RETRYING, RETRIED, and UNKNOWN.

Testing:
* Added integration tests in test_query_retries.py; these tests
  consistently pass when run locally
* Ran exhaustive tests
* Ran exhaustive tests with 'retry_failed_queries' set to true, no
  unexpected failures

TODO:
* There are some failed tests I am working through
* Additional refactoring / code cleanup
* Lots more documentation

Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state-map.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/service/retry-work.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A tests/custom_cluster/test_query_retries.py
16 files changed, 811 insertions(+), 166 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/14824/5
--
To view, visit http://gerrit.cloudera.org:8080/14824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
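
For readers skimming this notification, the retry flow described in the commit message can be modelled roughly as follows. This is only an illustrative sketch in Python, not the C++ code in the patch: ToyServer, QueryState, on_membership_failure, and client_facing_state are hypothetical names, and a plain string stands in for the TExecRequest. It assumes a single retry per query, that the retried query reuses the original's exec request, and that a query whose results were already fetched simply fails.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ExecState(Enum):
    RUNNING = "RUNNING"
    RETRYING = "RETRYING"
    RETRIED = "RETRIED"
    ERROR = "ERROR"


@dataclass
class QueryState:
    """Hypothetical stand-in for a ClientRequestState."""
    query_id: str
    exec_request: str  # placeholder for the TExecRequest
    state: ExecState = ExecState.RUNNING
    results_fetched: bool = False


@dataclass
class ToyServer:
    """Hypothetical model of the server-side bookkeeping, not ImpalaServer."""
    retry_failed_queries: bool = False  # off by default, as in the change
    queries: Dict[str, QueryState] = field(default_factory=dict)
    retried_to: Dict[str, str] = field(default_factory=dict)
    next_id: int = 0

    def register(self, exec_request: str) -> QueryState:
        self.next_id += 1
        query = QueryState(f"q{self.next_id}", exec_request)
        self.queries[query.query_id] = query
        return query

    def on_membership_failure(self, query_id: str) -> Optional[QueryState]:
        """Retry a query that failed because of a cluster membership change."""
        original = self.queries[query_id]
        # No retry if the feature is off or the client already saw results.
        if not self.retry_failed_queries or original.results_fetched:
            original.state = ExecState.ERROR
            return None
        original.state = ExecState.RETRYING
        # Reuse the already-planned request, so planning is skipped.
        retry = self.register(original.exec_request)
        # Record failed id -> retried id so the retry stays transparent.
        self.retried_to[query_id] = retry.query_id
        original.state = ExecState.RETRIED
        return retry

    def client_facing_state(self, query_id: str) -> QueryState:
        """Resolve a client's query id to the state it should observe."""
        return self.queries[self.retried_to.get(query_id, query_id)]


server = ToyServer(retry_failed_queries=True)
original = server.register("select count(*) from t")
retry = server.on_membership_failure(original.query_id)
```

In this model a client that keeps using the original query id is routed to the retried query, while internal callers can still look up the original entry and see it in the RETRIED state.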