Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19959 )

Change subject: IMPALA-12150: Use protocol version to isolate cluster components
......................................................................

IMPALA-12150: Use protocol version to isolate cluster components

Some Thrift request/response structs in CatalogService were changed by
adding new fields in the middle, which caused a cross-version
incompatibility in CatalogService.

Impala cluster membership is managed by the statestore. During a
rolling upgrade, where Impala daemons are upgraded one at a time, the
upgraded daemons may use message formats that are incompatible with
those of the daemons that have not been upgraded yet. Although protocol
version numbers are already defined for the Statestore and Catalog
services, they are not used. The statestore and catalog server
currently do not check the protocol version in requests, so
incompatible Impala daemons can join the same cluster. This causes
unexpected query failures during rolling upgrades.

We need a way to detect this and enforce the following rules:
 - Statestore refuses registration requests from incompatible
   subscribers.
 - Catalog server refuses requests from incompatible clients.
 - Scheduler assigns tasks to a group of compatible executors.
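
The rules above can be illustrated with a minimal Python sketch. It is
not the code in this patch: the names RegisterRequest, handle_register
and group_by_version are hypothetical, the version constants only stand
in for StatestoreServiceVersion, and treating "compatible" as "exactly
equal version" is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-ins for StatestoreServiceVersion values; the real
# enum is defined in StatestoreService.thrift.
V1, V2 = 1, 2
SERVER_VERSION = V2


@dataclass
class RegisterRequest:
    subscriber_id: str
    protocol_version: int


def handle_register(req, server_version=SERVER_VERSION):
    """Return (ok, message). Assumption: only an exact version match is
    treated as compatible; any other version is refused."""
    if req.protocol_version != server_version:
        return False, "incompatible protocol version %d (expected %d)" % (
            req.protocol_version, server_version)
    return True, "registered " + req.subscriber_id


def group_by_version(executors):
    """Group executor ids by protocol version, so that a scheduler could
    pick all executors for a query from one compatible group."""
    groups = defaultdict(list)
    for executor_id, version in executors:
        groups[version].append(executor_id)
    return dict(groups)
```

With these definitions, a V1 subscriber is refused by a V2 server while
a V2 subscriber is accepted, and executors of mixed versions end up in
separate groups.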

This patch isolates Impala daemons into separate clusters based on the
protocol version of the Statestore service, so that incompatible Impala
daemons do not communicate with each other. It covers the Thrift RPC
communication between catalogd and coordinators, and between the
statestore and its subscribers (executors, coordinators, catalogd and
admissiond). This change should also work for future upgrades.

The following changes were made:
 - Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for
   all requests of the Statestore and Catalog services.
 - Update the request and response structs in CatalogService to ensure
   each Thrift request struct has a protocol version and each Thrift
   response struct has a returned status.
 - Update the request and response structs in StatestoreService to
   ensure each Thrift request struct has a protocol version and each
   Thrift response struct has a returned status.
 - Add a subscriber type so that the statestore can distinguish
   different types of subscribers.
 - Statestore checks the protocol version of registration requests
   from subscribers. It refuses requests with an incompatible version.
 - Catalog server checks the protocol version for Catalog service APIs
   and returns an error for requests with an incompatible version.
 - Catalog daemon sends its address and the protocol version of the
   Catalog service when it registers with the statestore. The
   statestore forwards the address and the protocol version of the
   Catalog service to all subscribers during registration.
 - Add an UpdateCatalogd API to the Statestore service so that
   coordinators can receive the address and the protocol version of
   the Catalog service from the statestore if they register with the
   statestore before the catalog daemon does.
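
The registration ordering handled by the last two items can be sketched
as a toy Python model. This is only an illustration under stated
assumptions: ToyStatestore and its methods are hypothetical, a plain
callback stands in for the UpdateCatalogd RPC, and the address
"catalogd-host:26000" is an example value.

```python
class ToyStatestore:
    """Toy model of how catalogd info reaches coordinators."""

    def __init__(self):
        # (address, protocol_version) once catalogd has registered.
        self.catalogd = None
        # coordinator id -> callback standing in for UpdateCatalogd.
        self.coordinators = {}

    def register_coordinator(self, coord_id, on_update_catalogd):
        """Register a coordinator. The return value models the
        registration response: catalogd info if known, else None."""
        self.coordinators[coord_id] = on_update_catalogd
        return self.catalogd

    def register_catalogd(self, address, protocol_version):
        """Record catalogd and notify coordinators that registered
        earlier and therefore have not seen its address yet."""
        self.catalogd = (address, protocol_version)
        for notify in self.coordinators.values():
            notify(address, protocol_version)


# A coordinator that registers first learns about catalogd later via
# the callback; one that registers afterwards gets it in the response.
statestore = ToyStatestore()
seen = []
early = statestore.register_coordinator(
    "coord-1", lambda addr, ver: seen.append((addr, ver)))
statestore.register_catalogd("catalogd-host:26000", 2)
late = statestore.register_coordinator("coord-2", lambda addr, ver: None)
```

In this model, early is None and the callback supplies the catalogd
address afterwards, while late already contains the address and
version.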

CatalogServiceVersion is defined in CatalogService.thrift. In the
future, if we make backward-incompatible changes to the request or
response structures of CatalogService APIs, we need to bump the
protocol version of the Catalog service.
StatestoreServiceVersion is defined in StatestoreService.thrift.
Similarly, if we make backward-incompatible changes to the request or
response structures of StatestoreService APIs, we need to bump the
protocol version of the Statestore service.

Message formats for KRPC communication between coordinators and
executors, and between admissiond and coordinators, are defined in
proto files under common/protobuf. If we make backward-incompatible
changes to these structures, we need to bump the protocol version of
the Statestore service.

Testing:
 - Added end-to-end unit tests.
 - Passed the core tests.
 - Ran manual tests to verify that old-version executors cannot
   register with a new-version statestore, and that new-version
   executors cannot register with an old-version statestore.

Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admissiond-env.cc
M be/src/service/client-request-state.cc
A be/src/statestore/statestore-subscriber-catalog.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogService.thrift
M common/thrift/StatestoreService.thrift
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/catalog/ImpaladTableUsageTracker.java
M tests/catalog_service/test_catalog_service_client.py
M tests/custom_cluster/test_custom_statestore.py
M tests/statestore/test_statestore.py
22 files changed, 1,307 insertions(+), 352 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/19959/7
--
To view, visit http://gerrit.cloudera.org:8080/19959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc
Gerrit-Change-Number: 19959
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
