Yida Wu has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/23094 )
Change subject: IMPALA-12057: Track removed coordinators to reject queued queries early
......................................................................

IMPALA-12057: Track removed coordinators to reject queued queries early

Queries in global admission control can remain queued for a long time
if they are assigned to a coordinator that has already left the
cluster. Admissiond can't distinguish between a coordinator that
hasn't yet been propagated via the statestore and one that has
already been removed, resulting in unnecessary waiting until timeout.
This timeout is determined by either FLAGS_queue_wait_timeout_ms or
the queue_timeout_ms in the pool config. By default,
FLAGS_queue_wait_timeout_ms is 1 minute, but in production it's
normally configured to 10 to 15 minutes.

This change tracks recently removed coordinators and rejects such
queued queries immediately using REASON_COORDINATOR_REMOVED.
To ensure the removed coordinator list remains simple and bounded,
it avoids duplicate entries and enforces FIFO eviction at
MAX_REMOVED_COORD_SIZE (1000).

It's possible that a coordinator marked as removed comes back with
the same backend id. In that case, admissiond will see it in
current_backends and won't need to check the removed list. Even if
a coordinator briefly flaps and a request is rejected, it's not
critical; the coordinator can retry. So to keep the design simple
and safe, we keep the removed coord entry as-is.

Tests:
Passed exhaustive tests.
Added unit tests to verify the eviction logic and the duplicate case.
Added regression test test_coord_not_registered_in_ac.

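For readers skimming this notification, the bounded removed-coordinator list described above can be sketched roughly as follows. This is only an illustrative Python model, not the C++ code in cluster-membership-mgr.cc: the class name RemovedCoordinatorTracker, the function admission_decision, and the "PROCEED" / "KEEP_QUEUED" result strings are hypothetical. Only MAX_REMOVED_COORD_SIZE, REASON_COORDINATOR_REMOVED, and current_backends are names taken from the commit message.

```python
from collections import OrderedDict

# Cap stated in the commit message.
MAX_REMOVED_COORD_SIZE = 1000


class RemovedCoordinatorTracker:
    """Hypothetical model of a bounded, duplicate-free, FIFO-evicted list."""

    def __init__(self, max_size=MAX_REMOVED_COORD_SIZE):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        # OrderedDict keeps insertion order, so the oldest entry is first.
        self._removed = OrderedDict()

    def add_removed(self, backend_id):
        # Duplicate entries are ignored; the original position is kept.
        if backend_id in self._removed:
            return
        # FIFO eviction: drop the oldest entry once the cap is reached.
        if len(self._removed) >= self._max_size:
            self._removed.popitem(last=False)
        self._removed[backend_id] = None

    def is_removed(self, backend_id):
        return backend_id in self._removed


def admission_decision(coord_id, current_backends, tracker):
    """Hypothetical check order: live membership first, then the removed list."""
    if coord_id in current_backends:
        # A coordinator that came back with the same backend id is treated
        # as live; its stale entry in the removed list is never consulted.
        return "PROCEED"
    if tracker.is_removed(coord_id):
        return "REJECT: REASON_COORDINATOR_REMOVED"
    # Unknown coordinator: it may not have been propagated yet, so keep waiting.
    return "KEEP_QUEUED"
```

Checking current_backends before the removed list is what lets a stale entry stay in place without causing false rejections for a coordinator that has rejoined.
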
Change-Id: I1e0f270299f8c20975d7895c17f4e2791c3360e0
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
M tests/custom_cluster/test_admission_controller.py
5 files changed, 228 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/23094/9
--
To view, visit http://gerrit.cloudera.org:8080/23094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1e0f270299f8c20975d7895c17f4e2791c3360e0
Gerrit-Change-Number: 23094
Gerrit-PatchSet: 9
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
