Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20689 )

Change subject: IMPALA-12550: Fix flaky test test_statestored_auto_failover_with_disabling_network
......................................................................

IMPALA-12550: Fix flaky test test_statestored_auto_failover_with_disabling_network

Test test_statestored_auto_failover_with_disabling_network failed
occasionally because the HA handshake or HA heartbeat RPCs between the
two statestore instances were delayed. Sometimes the active statestore
took a few minutes to respond to handshake requests from the standby
statestore.

This patch fixes the issue by not holding the mutex ha_lock_ while
sending HA handshake and HA heartbeat RPCs. Redundant HA heartbeats are
handled on the receiver side. Redundant HA handshakes are harmless.

Testing:
 - Repeatedly ran test_statestored_auto_failover_with_disabling_network
   on Jenkins hundreds of times without failure.
 - Repeatedly ran test_statestored_auto_failover_with_disabling_network
   on a local machine a thousand times without failure.
 - Repeatedly ran all tests in test_statestored_ha.py on Jenkins for
   over 12 hours without failure.
 - Passed core tests.

Change-Id: I515bbaaddfb4bf9bd2a39414cd6e3e4590dfbfb1
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M tests/custom_cluster/test_statestored_ha.py
4 files changed, 71 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/20689/2
--
To view, visit http://gerrit.cloudera.org:8080/20689

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I515bbaaddfb4bf9bd2a39414cd6e3e4590dfbfb1
Gerrit-Change-Number: 20689
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <[email protected]>
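
For illustration only, here is a minimal C++ sketch of the locking pattern the commit message describes: snapshot the needed state under the lock, release the lock before the (possibly slow) RPC, then re-acquire it only to record the result. The classes HaPeerState, HaSender and HaReceiver and the send_rpc callback are hypothetical and do not reflect the actual code in statestore.cc. The sequence-number check on the receiver is just one assumed way of discarding redundant heartbeats; the change itself does not say how the receiver handles them.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// HYPOTHETICAL sketch: these names are illustrative, not Impala's APIs.

// State guarded by the mutex (stands in for what ha_lock_ protects).
struct HaPeerState {
  int64_t next_seq = 0;         // sequence number for the next heartbeat
  bool peer_reachable = false;  // result of the last heartbeat RPC
};

class HaSender {
 public:
  // send_rpc simulates a possibly slow network call; returns true on success.
  explicit HaSender(std::function<bool(int64_t)> send_rpc)
      : send_rpc_(std::move(send_rpc)) {}

  bool SendHeartbeat() {
    int64_t seq;
    {
      // 1. Copy what the RPC needs while holding the lock.
      std::lock_guard<std::mutex> l(ha_lock_);
      seq = state_.next_seq++;
    }
    // 2. Issue the RPC without holding ha_lock_, so a slow peer cannot
    //    block other threads that need the lock.
    bool ok = send_rpc_(seq);
    {
      // 3. Re-acquire the lock only to record the outcome.
      std::lock_guard<std::mutex> l(ha_lock_);
      state_.peer_reachable = ok;
    }
    return ok;
  }

  bool peer_reachable() {
    std::lock_guard<std::mutex> l(ha_lock_);
    return state_.peer_reachable;
  }

 private:
  std::mutex ha_lock_;
  HaPeerState state_;
  std::function<bool(int64_t)> send_rpc_;
};

// Receiver side: with RPCs no longer serialized by one lock on the sender,
// duplicate or out-of-order heartbeats may arrive; stale ones are dropped.
class HaReceiver {
 public:
  // Returns true if the heartbeat was applied, false if it was redundant.
  bool OnHeartbeat(int64_t seq) {
    std::lock_guard<std::mutex> l(lock_);
    if (seq <= last_seq_) return false;
    last_seq_ = seq;
    return true;
  }

 private:
  std::mutex lock_;
  int64_t last_seq_ = -1;
};
```

Because the RPC runs outside ha_lock_, the send_rpc callback (or any other thread) can call peer_reachable() while a heartbeat is in flight without blocking. The trade-off is that concurrent senders may deliver redundant or reordered heartbeats, which is why the receiver needs its own check.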
