Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20372 )
Change subject: IMPALA-12156: Support High Availability for Statestore ...................................................................... Patch Set 13: (19 comments) http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore-subscriber.cc File be/src/statestore/statestore-subscriber.cc: http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore-subscriber.cc@178 PS11, Line 178: "statestored (protocol version $0), the request is ignored.", > added log messages Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore-subscriber.cc@326 PS11, Line 326: return statestore_->Start(); > If FLAGS_tolerate_statestore_startup_delay is enabled, subscriber enter rec Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore-subscriber.cc@358 PS11, Line 358: IpAddr ip_address; > statestore_id is received in registration response. Due to race, subscriber Ack http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore-subscriber.cc@410 PS11, Line 410: } else { > done Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.h@137 PS11, Line 137: /// the subscribers. Active statestored also sends heartbeats to standby statestored. > done Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.h@140 PS11, Line 140: /// heartbeats from active statestored. > Will write warning message on standby statestored in this case. Ack http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.h@782 PS11, Line 782: > Updated comment to the purpose of this variable. Ack http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@654 PS11, Line 654: DCHECK(metrics != NULL); > Thanks to catch this. Least significant 64 bits of statestore_id is not use Oh, I actually missed the UUIDToTUniqueId above. But it still wasn't clear what setting lower bits was used for. http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1248 PS11, Line 1248: } else if (active_conn_states_[subscriber->id()] == TStatestoreConnState::FAILED) { > added DCHECK Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1250 PS11, Line 1250: || conn_state == TStatestoreConnState::UNKNOWN); > added DCHECK Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1251 PS11, Line 1251: --failed_conn_state_count_; > added DCHECK Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1573 PS11, Line 1573: // to all subscribers. > timeout is used for re-sending HA updates if there are subscribers which fa But it holds a unique_lock on ha_lock_. Won't that prevent MonitorStatestoredHaHeartbeat from doing anything while WaitFor is waiting? http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1654 PS11, Line 1654: request.__set_catalogd_version(active_catalogd_version); > updated as suggested Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1792 PS11, Line 1792: // Negotiate role for HA with peer statestore instance on startup. > Both designate themselves as active, and start to send heartbeats to each o Done http://gerrit.cloudera.org:8080/#/c/20372/11/be/src/statestore/statestore.cc@1963 PS11, Line 1963: // Check if standby statestored is reachable. > Don't have good plan to handle split-brain in network. Updated comments. An You added a warning below. I don't have any better ideas at the moment. http://gerrit.cloudera.org:8080/#/c/20372/11/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: http://gerrit.cloudera.org:8080/#/c/20372/11/common/thrift/StatestoreService.thrift@499 PS11, Line 499: // statestore should set itself as inactive if dst_statestore_active is set as true. > done Done http://gerrit.cloudera.org:8080/#/c/20372/11/common/thrift/StatestoreService.thrift@500 PS11, Line 500: // Otherwise, source statestore sets itself as active. > done Done http://gerrit.cloudera.org:8080/#/c/20372/11/tests/custom_cluster/test_statestored_ha.py File tests/custom_cluster/test_statestored_ha.py: http://gerrit.cloudera.org:8080/#/c/20372/11/tests/custom_cluster/test_statestored_ha.py@42 PS11, Line 42: > done Done http://gerrit.cloudera.org:8080/#/c/20372/11/tests/custom_cluster/test_statestored_ha.py@48 PS11, Line 48: request = Subscriber.TSetStatestoreDebugActionRequest( > removed Done -- To view, visit http://gerrit.cloudera.org:8080/20372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87 Gerrit-Change-Number: 20372 Gerrit-PatchSet: 13 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 27 Sep 2023 18:59:19 +0000 Gerrit-HasComments: Yes
