Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20657 )

Change subject: IMPALA-12525: Fix flaky test test_statestored_manual_failover
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20657/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20657/4//COMMIT_MSG@14
PS4, Line 14: all subscribers re-registered
nit: some subscribers to not receive


http://gerrit.cloudera.org:8080/#/c/20657/4/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/20657/4/be/src/statestore/statestore-subscriber.cc@387
PS4, Line 387: *active_statestore_conn_state = 
statestore2_->GetStatestoreConnState();
This patch remove the condition that the heartbeating statestored must not be 
an active one if request_active_conn_state is True.

Should it log WARNING if the heartbeating statestored is still known as an 
active one?


http://gerrit.cloudera.org:8080/#/c/20657/4/be/src/statestore/statestore-subscriber.cc@377
PS4, Line 377:   if (statestore_->IsMatchingStatestoreId(statestore_id) || 
statestore2_ == nullptr) {
             :     // Due to race, subscriber could receive heartbeat before 
receiving registration
             :     // response. "statestore2_ == nullptr" means HA is not 
enabled. In that case,
             :     // we still handle the heartbeat request from statestore 
even registration response
             :     // is not received.
             :     statestore_->Heartbeat(registration_id);
             :     // Report connection state with active statestore instance 
for the request from
             :     // standby statestore. It's possible that the notification 
of active statestored
             :     // change has not been received.
             :     if (request_active_conn_state && statestore2_ != nullptr) {
             :       *active_statestore_conn_state = 
statestore2_->GetStatestoreConnState();
             :     }
             :   } else if (statestore2_ != nullptr
             :       && statestore2_->IsMatchingStatestoreId(statestore_id)) {
             :     statestore2_->Heartbeat(registration_id);
             :     // Report connection state with active statestore instance 
for the request from
             :     // standby statestore. It's possible that the notification 
of active statestored
             :     // change has not been received.
             :     if (request_active_conn_state) {
             :       *active_statestore_conn_state = 
statestore_->GetStatestoreConnState();
             :     }
nit: This part of code confuse me a bit. I think it will be clearer to have 
local pointer variable that clarify the heartbeating statestored vs the other 
one (maybe, heartbeating_statestored & other_statestored pointer).


http://gerrit.cloudera.org:8080/#/c/20657/4/be/src/statestore/statestore-subscriber.cc@445
PS4, Line 445: active_statestore = GetActiveStatestore();
             :     active_statestore->SetStatestoreActive(is_active, 
active_statestored_version);
             :     standby_statestore = GetStandbyStatestore();
             :     standby_statestore->SetStatestoreActive(!is_active, 
active_statestored_version);
Why this active/standby set/switch does not hold statestore_ha_lock_? Are they 
need to be atomic or not?


http://gerrit.cloudera.org:8080/#/c/20657/4/be/src/statestore/statestore-subscriber.cc@461
PS4, Line 461:         // Active/standby statestored are not set. This could 
happen if statestoreds were
             :         // started after subscribers' registration attemption.
Add WARNING log for this case.


http://gerrit.cloudera.org:8080/#/c/20657/4/tests/custom_cluster/test_statestored_ha.py
File tests/custom_cluster/test_statestored_ha.py:

http://gerrit.cloudera.org:8080/#/c/20657/4/tests/custom_cluster/test_statestored_ha.py@681
PS4, Line 681: execute_query
nit: Maybe use execute_query_expect_success() here?



--
To view, visit http://gerrit.cloudera.org:8080/20657
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If03bf09d22a2875d2c1eec8a4f62eeefc5d855dc
Gerrit-Change-Number: 20657
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 07 Nov 2023 23:37:52 +0000
Gerrit-HasComments: Yes

Reply via email to