Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/23094 )
Change subject: IMPALA-12057: Track removed coordinators to reject queued queries early ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/23094/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23094/5//COMMIT_MSG@13 PS5, Line 13: resulting in unnecessary waiting until timeout. > What is the default admission queue timeout? Please mention in commit messa The timeout is determined by either FLAGS_queue_wait_timeout_ms or the queue_timeout_ms in the pool config. By default, FLAGS_queue_wait_timeout_ms is 1 minute, but in production it's normally configured to 10 to 15 minutes. Added this to the commit message. A TTL-based approach makes sense in theory, but in cases like IMPALA-10866, a coordinator can still send requests to admissiond even after being removed from the statestore. So we can’t guarantee the TTL will outlast such requests. This patch trusts the statestore and uses a fixed-size list as a simple, bounded solution. TTL adds complexity and potential edge case issues, which may not be worth it here. Making the size configurable could be a future improvement. http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc@263 PS5, Line 263: FIX = "The coordinator no longer > Add coordinator identifier or host/port in this message? It will make assoc Done http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc@2547 PS5, Line 2547: stats->agg_num_queued(), > The call path to detect Coordinator removal is from FindGroupToAdmitOrRejec Changed to allow FindGroupToAdmitOrReject to return false in this case. http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc File be/src/scheduling/cluster-membership-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc@329 PS5, Line 329: > Can you increase the loop? Maybe 10? Done http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc@332 PS5, Line 332: ble(rng_); > Can you create another test like this but with changing backend_id in each Done http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py@2170 PS5, Line 2170: > Can you loop the test body, say, 3x to ensure consistent behavior between c Done http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py@2178 PS5, Line 2178: .execute_serially > Use self.client.execute_async so it consistent with L2185. Done -- To view, visit http://gerrit.cloudera.org:8080/23094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e0f270299f8c20975d7895c17f4e2791c3360e0 Gerrit-Change-Number: 23094 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 10 Jul 2025 06:20:30 +0000 Gerrit-HasComments: Yes
