Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14824 )
Change subject: IMPALA-9124 [POC][WIP]: Transparently retry queries that fail
due to cluster membership changes
......................................................................
IMPALA-9124 [POC][WIP]: Transparently retry queries that fail due to cluster
membership changes
Adds the core logic for transparently retrying queries that fail due to
cluster membership changes.
The design of this feature is described in the JIRA (IMPALA-9124), but the
TL;DR is that whenever a ClientRequestState receives an error Status for a
query, it checks whether the Status is "Retryable". If it is, it schedules a
retry of the query using a "query retry" threadpool.
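A minimal sketch of that decision, assuming simplified stand-ins for the real types: `SketchStatus`, `RetryQueue` and `MaybeScheduleRetry` are hypothetical names, not Impala's API. Retryability is a plain bool here (the patch uses TStatusProperties), and the queue only collects tasks for the caller to run with `Drain()`, so it stands in for the "query retry" threadpool without spawning threads.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for impala::Status. The patch records retryability
// in TStatusProperties; a plain bool models it here.
struct SketchStatus {
  bool ok = true;
  bool retryable = false;
  std::string msg;
};

// Hypothetical stand-in for the "query retry" threadpool. A real pool runs
// tasks on worker threads; this sketch only queues them so the caller can
// run them deterministically with Drain().
class RetryQueue {
 public:
  void Offer(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Runs every queued retry in FIFO order and returns how many ran.
  int Drain() {
    int ran = 0;
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
      ++ran;
    }
    return ran;
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical analogue of the check in ClientRequestState::UpdateQueryStatus.
// Returns true if a retry was scheduled; false means the error is reported to
// the client as before. 'retry_failed_queries' mirrors the query option,
// which is off by default.
bool MaybeScheduleRetry(const SketchStatus& status, bool retry_failed_queries,
                        RetryQueue& pool, std::function<void()> retry) {
  if (status.ok) return false;              // nothing failed
  if (!retry_failed_queries) return false;  // feature disabled
  if (!status.retryable) return false;      // error not marked "Retryable"
  pool.Offer(std::move(retry));
  return true;
}
```

The caller would fall back to its existing error reporting whenever `MaybeScheduleRetry` returns false.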
This feature touches several parts of the query lifecycle and the
Coordinator code:
* The feature is configurable ('retry_failed_queries') and is off by
default.
* Status objects can now be classified as "Retryable" errors. The
Coordinator then retries any Status that is marked as "Retryable". This
is modelled using a new field in TStatus called TStatusProperties.
* ClientRequestState can now accept an existing TExecRequest. This is
required when retrying queries, because the TExecRequest of the failed
query is copied and used for the ClientRequestState of the retried
query.
* ClientRequestState::UpdateQueryStatus is modified such that if it
receives a "Retryable" Status, it schedules a retry of the query.
* ClientRequestState::ExecState is extended with three new states:
RETRYING, RETRIED, and UNKNOWN.
* ImpalaServer::RetryQueryFromThreadPool implements the core logic to
actually retry a failed query.
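The retry path described by the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch's code: `RequestStateSketch`, `ExecRequestSketch` and `RetryQuerySketch` are hypothetical stand-ins for ClientRequestState, TExecRequest and ImpalaServer::RetryQueryFromThreadPool. Only RETRYING, RETRIED and UNKNOWN come from this change; the other ExecState values are assumed, and the sketch assumes the failed query was moved to RETRYING when the retry was scheduled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical mirror of ClientRequestState::ExecState. RETRYING, RETRIED
// and UNKNOWN are the states this patch adds; the others are assumed.
enum class ExecState {
  INITIALIZED, PENDING, RUNNING, FINISHED, ERROR,
  RETRYING, RETRIED, UNKNOWN
};

// Hypothetical stand-in for TExecRequest.
struct ExecRequestSketch {
  std::string stmt;
};

// Hypothetical stand-in for ClientRequestState. It accepts an existing exec
// request so that a retry can reuse the one from the failed query.
struct RequestStateSketch {
  RequestStateSketch(std::string id, ExecRequestSketch req)
      : query_id(std::move(id)), exec_request(std::move(req)) {}
  std::string query_id;
  ExecRequestSketch exec_request;
  ExecState state = ExecState::INITIALIZED;
};

// Hypothetical analogue of ImpalaServer::RetryQueryFromThreadPool: builds the
// retried query from a copy of the failed query's exec request, then marks
// the failed query as RETRIED.
std::shared_ptr<RequestStateSketch> RetryQuerySketch(
    RequestStateSketch& failed, const std::string& new_query_id) {
  assert(failed.state == ExecState::RETRYING);  // set when the retry was scheduled
  auto retried =
      std::make_shared<RequestStateSketch>(new_query_id, failed.exec_request);
  failed.state = ExecState::RETRIED;
  return retried;
}
```

How the retried query is then submitted for execution is left out, since the description does not spell it out.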
Additional Notes:
* Retries are transparent to the user. This is achieved by registering the
query id of the failed query with the ClientRequestState of the retried
query. This requires modifying the ImpalaServer
client_request_state_map_ so that different query ids can correspond to
the same ClientRequestState.
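One way to let several query ids resolve to the same request state is to store shared pointers in the map, sketched here under simplifying assumptions: `RequestStateMapSketch`, `QueryStateSketch` and `RegisterRetry` are hypothetical names, and a single mutex-guarded `std::unordered_map` stands in for whatever structure client_request_state_map_ actually uses.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for ClientRequestState.
struct QueryStateSketch {
  std::string query_id;
};

// Hypothetical simplification of client_request_state_map_: several query
// ids may map to the same shared request state.
class RequestStateMapSketch {
 public:
  void Add(const std::string& id, std::shared_ptr<QueryStateSketch> state) {
    std::lock_guard<std::mutex> l(mu_);
    map_[id] = std::move(state);
  }

  // Points the failed query's id at the retried query's state and registers
  // the retried query under its own id, so a client still holding the
  // original id is transparently served the retry.
  void RegisterRetry(const std::string& failed_id,
                     const std::shared_ptr<QueryStateSketch>& retried) {
    std::lock_guard<std::mutex> l(mu_);
    map_[failed_id] = retried;
    map_[retried->query_id] = retried;
  }

  // Returns the state registered for 'id', or nullptr if there is none.
  std::shared_ptr<QueryStateSketch> Get(const std::string& id) const {
    std::lock_guard<std::mutex> l(mu_);
    auto it = map_.find(id);
    return it == map_.end() ? nullptr : it->second;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<QueryStateSketch>> map_;
};
```

Because both ids hold the same `shared_ptr`, the retried state stays alive for as long as either id is registered.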
This patch is based on three changes that are still in progress:
* https://gerrit.cloudera.org/#/c/14677/
* https://gerrit.cloudera.org/#/c/14744/
* https://gerrit.cloudera.org/#/c/14755/
Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
---
M be/src/common/status.cc
M be/src/common/status.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/service/client-request-state-map.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/service/retry-work.h
M be/src/util/container-util.h
M be/src/util/error-util.h
M common/protobuf/common.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Status.thrift
M common/thrift/generate_error_codes.py
A tests/custom_cluster/test_query_retries.py
24 files changed, 734 insertions(+), 129 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/14824/1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sahil Takiar
Gerrit-Reviewer: Thomas Tauber-Marshall