Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19959 )
Change subject: IMPALA-12150: Use protocol version to isolate cluster components ...................................................................... IMPALA-12150: Use protocol version to isolate cluster components Some Thrift request/response structs in CatalogService were changed to add new variables in the middle, which caused cross version incompatibility issue for CatalogService. Impala cluster membership is managed by the statestore. During upgrade scenarios where different versions of Impala daemons are upgraded one at a time, the upgraded daemons have incompatible message formats. Even protocol versions numbers were already defined for Statestore and Catalog Services, but they are not used. The Statestore and Catalog server don't check the protocol version in the requests now so that incompatible Impala daemons could join one cluster. This causes unexpected query failures during rolling upgrade. We need a way to detect this and enforce that some rules are followed: - Statestore refuses the registration requests from incompatible subscrubers. - Catalog server refuses the requests from from incompatible clients. - Scheduler assigns tasks to a group of compatible executors. This patch isolate Impala daemons into separate clusters based on protocol versions of Statestore service to avoid incompatible Impala daemons to communicate each other. It covers the Thrift RPC communications between catalogd and coordinators, and communication between statestore and its subscribers (executor, coordinators, catalogd and admissiond). This change should work for future upgrade. Following changes were made: - Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for all requests of Statestore and Catalog services. - Update the request and response structs in CatalogService to ensure each Thrift request struct has protocol version and each Thrift response struct has returned status. - Update the request and response struct in StatestoreService to ensure each Thrift request struct has protocol version and each Thrift response struct has returned status. - Add subscriber type so that statestore could distinguish different types of subscribers. - Statestore checks protocol version for registration requests from subscribers. It refuses the requests with incompatible version. - Catalog server checks protocol version for Catalog service APIs, and returns error for requests with incompatible version. - Catalog daemon sends its address and the protocol version of Catalog service when it registers to statestore, statestore forwards the address and the protocol version of Catalog service to all subscribers during registration. - Add UpdateCatalogd API for Statestore service so that the coordinators could receive the address and the protocol version of Catalog service from statestore if the coordinators register to statestore before catalog daemon. CatalogServiceVersion is defined in CatalogService.thrift. In future, if we make non back version compatible changes in the request or response structures for CatalogService APIs, we need to bump the protocol version of Catalog service. StatestoreServiceVersion is defined in StatestoreService.thrift. Similarly if we make non back version compatible changes in the request or response structures for StatestoreService APIs, we need to bump the protocol version of Statestore service. Message formats for KRPC communications between coordinators and executors, and between admissiond and coordinators are defined in proto files under common/protobuf. If we make non back version compatible changes in these structures, we need to bump the protocol version of Statestore service. Testing: - Added end-to-end unit tests. - Passed the core tests. - Ran manual test to verify old version of executors cannot register with new version of statestore, and new version of executors cannot register with old version of statestore. Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/exec/catalog-op-executor.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/admissiond-env.cc M be/src/service/client-request-state.cc A be/src/statestore/statestore-subscriber-catalog.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogService.thrift M common/thrift/StatestoreService.thrift M common/thrift/generate_error_codes.py M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/catalog/ImpaladTableUsageTracker.java M tests/catalog_service/test_catalog_service_client.py M tests/custom_cluster/test_custom_statestore.py M tests/statestore/test_statestore.py 22 files changed, 1,307 insertions(+), 352 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/19959/7 -- To view, visit http://gerrit.cloudera.org:8080/19959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc Gerrit-Change-Number: 19959 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
