Hello Thomas Tauber-Marshall, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14824
to look at the new patch set (#4).
Change subject: IMPALA-9199: Add support for single query retries on cluster
membership changes
......................................................................
IMPALA-9199: Add support for single query retries on cluster membership changes
Adds the core logic for transparently retrying queries that fail due to
cluster membership changes.
Query retries are triggered when either (1) a node is removed from the
cluster membership by a statestore update (rather than cancelling all
queries running on the leaving node, those queries are retried), or (2) a
query fails and, as a result, blacklists a node. Both events are
considered cluster membership changes because they affect which nodes a
query will be scheduled on.
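
As a rough illustration of the two triggers, the decision can be sketched
in Python. This is not the backend code: FailureCause, MEMBERSHIP_CHANGES,
and should_retry are hypothetical names, and only the
'retry_failed_queries' option name comes from this change.

```python
from enum import Enum, auto


class FailureCause(Enum):
    """Hypothetical classification of why a query failed."""
    NODE_REMOVED_BY_STATESTORE = auto()  # trigger (1)
    NODE_BLACKLISTED = auto()            # trigger (2)
    OTHER = auto()                       # e.g. a syntax or analysis error


# Both triggers count as cluster membership changes.
MEMBERSHIP_CHANGES = {
    FailureCause.NODE_REMOVED_BY_STATESTORE,
    FailureCause.NODE_BLACKLISTED,
}


def should_retry(cause, retry_failed_queries=False):
    """Return True if the failure is one that would trigger a retry.

    retry_failed_queries mirrors the query option, which is off by default.
    """
    if not retry_failed_queries:
        return False
    return cause in MEMBERSHIP_CHANGES
```

With the option left at its default, should_retry returns False for every
cause, which matches the feature being off unless explicitly enabled.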
Implementation:
* Query retries are driven by a dedicated thread pool.
* ImpalaServer::RetryQueryFromThreadPool implements the core logic that
  actually retries a failed query.
* When a query is retried, the original query is cancelled, the new
  query is created, registered, and started, and then the original query
  is closed.
* A query cannot be retried once any results from the original query
  have been fetched. This prevents users from seeing incorrect results.
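
The ordering of the steps above can be sketched as a small Python model.
It is a simplification under stated assumptions: Query, Server,
submit_query, retry_query, and schedule_retry are hypothetical names, and
a one-worker ThreadPoolExecutor stands in for the dedicated retry pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count


class Query:
    def __init__(self, query_id, exec_request):
        self.query_id = query_id
        self.exec_request = exec_request
        self.results_fetched = False


class Server:
    def __init__(self):
        self.registered = {}   # query id -> Query
        self.events = []       # ordered log of lifecycle steps
        self._ids = count(1)
        # Stand-in for the dedicated retry thread pool.
        self.retry_pool = ThreadPoolExecutor(max_workers=1)

    def submit_query(self, exec_request):
        """Create, register, and start a query."""
        query = Query(next(self._ids), exec_request)
        self.registered[query.query_id] = query
        self.events.append(("start", query.query_id))
        return query

    def retry_query(self, original):
        """Cancel the original, start the new query, then close the original."""
        if original.results_fetched:
            # The client may already have seen rows; retrying could
            # expose incorrect results, so refuse.
            return None
        self.events.append(("cancel", original.query_id))
        retried = self.submit_query(original.exec_request)
        del self.registered[original.query_id]
        self.events.append(("close", original.query_id))
        return retried

    def schedule_retry(self, original):
        """Hand the retry to the pool instead of running it inline."""
        return self.retry_pool.submit(self.retry_query, original)


server = Server()
first = server.submit_query("select 1")
retried = server.schedule_retry(first).result()

second = server.submit_query("select 2")
second.results_fetched = True
not_retried = server.schedule_retry(second).result()
server.retry_pool.shutdown()
```

In this model the second query is never retried because its results were
already fetched, so it stays registered under its original id.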
Features:
* Retries are transparent to the user
  * This is achieved by adding a mapping from failed query ids to the
    query id of the retried query
  * ImpalaServer uses this mapping in GetClientFacingRequestState,
    which differentiates between "client facing" requests for a
    ClientRequestState (CRS) and internal requests for a CRS
* Users can tell if a query was retried using runtime profiles and the
  Impala Web UI
  * "Impala Query Status" is a new field in runtime profiles that
    displays the ClientRequestState execution state (which includes
    the RETRYING and RETRIED states)
  * The Impala Web UI lists all retried queries as being in the
    "RETRIED" state
* Retried queries skip all fe/ planning, authorization, etc.
* This feature is configurable ('retry_failed_queries') and is off by
  default
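
The id mapping behind transparent retries can be sketched as follows.
This is an illustrative Python model, not the ImpalaServer code:
RequestStateMap and its method names are hypothetical, loosely mirroring
GetClientFacingRequestState, and the lookup follows a single hop only.

```python
class RequestStateMap:
    """Hypothetical model of request-state lookup with retry redirection."""

    def __init__(self):
        self._states = {}        # query id -> request state
        self._retried_ids = {}   # failed query id -> retried query id

    def add(self, query_id, state):
        self._states[query_id] = state

    def mark_retried(self, failed_id, retried_id):
        self._retried_ids[failed_id] = retried_id

    def get_request_state(self, query_id):
        """Internal lookup: always returns the state for the exact id."""
        return self._states.get(query_id)

    def get_client_facing_request_state(self, query_id):
        """Client lookup: a failed id is redirected to its retried query."""
        target_id = self._retried_ids.get(query_id, query_id)
        return self._states.get(target_id)


states = RequestStateMap()
states.add("q1", "state of original query")
states.add("q2", "state of retried query")
states.mark_retried("q1", "q2")
```

A client that keeps using the id "q1" is served by the retried query,
while internal code can still reach the original state under "q1".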
Refactoring:
* Changes the ClientRequestState so that it can take in an existing
  TExecRequest
  * This is required when retrying queries because the
    TExecRequest of the failed query is copied and used for the
    ClientRequestState of the retried query
* ClientRequestState::ExecState is extended with three new states:
  RETRYING, RETRIED, and UNKNOWN.
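
A minimal Python sketch of why accepting an existing TExecRequest lets a
retried query skip planning. Everything here is illustrative: plan() is a
hypothetical stand-in for frontend planning, RUNNING and ERROR are
placeholders for pre-existing states, and the RETRYING to RETRIED
transition shown for the original query is an assumption.

```python
import copy
from enum import Enum, auto


class ExecState(Enum):
    # Placeholders for states that existed before this change (assumed).
    RUNNING = auto()
    ERROR = auto()
    # The three states added by this change.
    RETRYING = auto()
    RETRIED = auto()
    UNKNOWN = auto()


def plan(sql):
    """Hypothetical stand-in for frontend planning and authorization."""
    plan.calls += 1
    return {"sql": sql, "plan": ["scan", "aggregate"]}


plan.calls = 0  # counts how often planning actually runs


class ClientRequestState:
    def __init__(self, sql, exec_request=None):
        self.sql = sql
        self.exec_state = ExecState.RUNNING
        if exec_request is None:
            # Normal path: plan the query from scratch.
            self.exec_request = plan(sql)
        else:
            # Retry path: reuse a copy of the failed query's request.
            self.exec_request = copy.deepcopy(exec_request)


def retry(original):
    """Build the retried query from the original's exec request."""
    original.exec_state = ExecState.RETRYING
    retried = ClientRequestState(original.sql,
                                 exec_request=original.exec_request)
    original.exec_state = ExecState.RETRIED
    return retried


original = ClientRequestState("select count(*) from t")
retried = retry(original)
```

Planning runs once, for the original query only; the retried query holds
an equal but independent copy of the request.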
Testing:
* Added integration tests in test_query_retries.py; these tests
  consistently pass when run locally
* Ran exhaustive tests
* Ran exhaustive tests with 'retry_failed_queries' set to true; no
  unexpected failures
TODO:
* There are some failed tests I am working through
* Additional refactoring / code cleanup
* Lots more documentation
Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state-map.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/service/retry-work.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A tests/custom_cluster/test_query_retries.py
16 files changed, 811 insertions(+), 166 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/14824/4
--
To view, visit http://gerrit.cloudera.org:8080/14824
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>