Hello Thomas Tauber-Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14824

to look at the new patch set (#4).

Change subject: IMPALA-9199: Add support for single query retries on cluster 
membership changes
......................................................................

IMPALA-9199: Add support for single query retries on cluster membership changes

Adds the core logic for transparently retrying queries that fail due to
cluster membership changes.

Query retries are triggered when either (1) a node is removed from the
cluster membership by a statestore update (rather than cancelling all
queries running on the leaving node, those queries are retried), or (2) a
query fails and, as a result, blacklists a node. Both events are
considered cluster membership changes as they affect what nodes a query
will be scheduled on.
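
As a rough illustration of the two triggers described above, the sketch
below models the retry decision in Python. It is not the Impala C++ code:
the QueryEvent type, its field names, and should_retry are hypothetical
names made up for this example. Only the 'retry_failed_queries' option and
its off-by-default behaviour come from this change description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    """Hypothetical description of what happened to a running query."""
    node_removed_by_statestore: bool = False  # trigger (1)
    node_blacklisted_on_failure: bool = False  # trigger (2)


def is_membership_change(event: QueryEvent) -> bool:
    # Either event changes which nodes a query may be scheduled on.
    return event.node_removed_by_statestore or event.node_blacklisted_on_failure


def should_retry(event: QueryEvent, retry_failed_queries: bool = False) -> bool:
    # The option defaults to False, mirroring "off by default".
    return retry_failed_queries and is_membership_change(event)
```

With the option left at its default, should_retry returns False for every
event, so behaviour is unchanged unless retries are explicitly enabled.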

Implementation:
* Query retries are driven by a dedicated threadpool
    * ImpalaServer::RetryQueryFromThreadPool implements the core logic to
      actually retry a failed query.
* When a query is retried, the original query is cancelled, the new
  query is created, registered, and started, and then the original query
  is closed
* A query cannot be retried once any results from the original query
  have been fetched; this prevents users from seeing incorrect results
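
The ordering of the retry steps listed above can be sketched as follows.
This is a simplified Python model under stated assumptions, not the body of
ImpalaServer::RetryQueryFromThreadPool: the Query class, retry_query, and
the log argument are hypothetical, and the thread pool that drives retries
is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Query:
    """Hypothetical stand-in for a ClientRequestState."""
    query_id: str
    exec_request: str  # stands in for the TExecRequest
    rows_fetched: int = 0


def retry_query(original: Query, new_id: str,
                log: List[Tuple[str, str]]) -> Optional[Query]:
    """Retry `original` under `new_id`; return None if a retry is not allowed."""
    # Once any results have been fetched, a retry is refused so the
    # client never sees rows from two different executions.
    if original.rows_fetched > 0:
        return None
    log.append(("cancel", original.query_id))
    # The retried query reuses the exec request of the failed query.
    retried = Query(query_id=new_id, exec_request=original.exec_request)
    log.append(("create", new_id))
    log.append(("register", new_id))
    log.append(("start", new_id))
    # The original query is closed only after the new one has started.
    log.append(("close", original.query_id))
    return retried
```

The log records each step with the query id it applies to, which makes the
cancel, create, register, start, close ordering easy to check in a test.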

Features:
* Retries are transparent to the user
    * This is achieved by adding a mapping from failed query ids to the
      query id of the retried query
    * ImpalaServer uses this mapping in GetClientFacingRequestState,
      which differentiates between "client facing" requests for a
      ClientRequestState (CRS) and internal requests for a CRS
* Users can tell if a query is retried using runtime profiles and the
  Impala Web UI
    * "Impala Query Status" is a new field in runtime profiles that
      displays the ClientRequestState execution state (which includes
      the RETRYING and RETRIED states)
    * The Impala Web UI will list all retried queries as being in the
      "RETRIED" state
* Retried queries skip all fe/ planning, authorization, etc.
* This feature is configurable ('retry_failed_queries') and is off by
  default
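
The id mapping that makes retries transparent can be illustrated with a
small Python model. It is a sketch only: RequestStateRegistry and its
methods are hypothetical names, strings stand in for ClientRequestState
objects, and the real lookup in GetClientFacingRequestState may differ.

```python
from typing import Dict, Optional


class RequestStateRegistry:
    """Hypothetical registry; not the Impala data structure."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}      # query id -> request state
        self._retried_as: Dict[str, str] = {}  # failed query id -> retried query id

    def register(self, query_id: str, state: str) -> None:
        self._states[query_id] = state

    def record_retry(self, failed_id: str, retried_id: str) -> None:
        self._retried_as[failed_id] = retried_id

    def get(self, query_id: str, client_facing: bool) -> Optional[str]:
        # Client-facing lookups follow the mapping, so a client holding the
        # original query id is served by the retried query. Internal lookups
        # return the state registered under the id that was asked for.
        if client_facing:
            query_id = self._retried_as.get(query_id, query_id)
        return self._states.get(query_id)
```

In this model the client keeps using the id of the failed query, and only
the client-facing path is redirected to the retried query.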

Refactoring:
* Changes the ClientRequestState so that it can take in an existing
  TExecRequest
    * This is required when retrying queries because the
      TExecRequest of the failed query is copied and used for the
      ClientRequestState of the retried query
* ClientRequestState::ExecState is extended with three new states:
  RETRYING, RETRIED, and UNKNOWN.

Testing:
* Added integration tests in test_query_retries.py; these tests
  consistently pass when run locally
* Ran exhaustive tests
* Ran exhaustive tests with 'retry_failed_queries' set to true; no
  unexpected failures

TODO:
* There are some failed tests I am working through
* Additional refactoring / code cleanup
* Lots more documentation

Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state-map.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/service/retry-work.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
A tests/custom_cluster/test_query_retries.py
16 files changed, 811 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/14824/4
--
To view, visit http://gerrit.cloudera.org:8080/14824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
