Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23094 )

Change subject: IMPALA-12057: Track removed coordinators to reject queued 
queries early
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23094/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23094/5//COMMIT_MSG@13
PS5, Line 13: resulting in unnecessary waiting until timeout.
> What is the default admission queue timeout? Please mention in commit messa
The timeout is determined by either FLAGS_queue_wait_timeout_ms or the 
queue_timeout_ms in the pool config. By default, FLAGS_queue_wait_timeout_ms is 
1 minute, but in production it's normally configured to 10 to 15 minutes. Added 
this to the commit message.
A TTL-based approach makes sense in theory, but in cases like IMPALA-10866, a 
coordinator can still send requests to admissiond even after being removed from 
the statestore. So we can’t guarantee the TTL will outlast such requests. This 
patch trusts the statestore and uses a fixed-size list as a simple, bounded 
solution. TTL adds complexity and potential edge case issues, which may not be 
worth it here. Making the size configurable could be a future improvement.


http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc@263
PS5, Line 263: FIX = "The coordinator no longer
> Add coordinator identifier or host/port in this message? It will make assoc
Done


http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/admission-controller.cc@2547
PS5, Line 2547: stats->agg_num_queued(),
> The call path to detect Coordinator removal is from FindGroupToAdmitOrRejec
Changed to allow FindGroupToAdmitOrReject to return false in this case.


http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc
File be/src/scheduling/cluster-membership-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc@329
PS5, Line 329:
> Can you increase the loop? Maybe 10?
Done


http://gerrit.cloudera.org:8080/#/c/23094/5/be/src/scheduling/cluster-membership-mgr-test.cc@332
PS5, Line 332: ble(rng_);
> Can you create another test like this but with changing backend_id in each
Done


http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py@2170
PS5, Line 2170:
> Can you loop the test body, say, 3x to ensure consistent behavior between c
Done


http://gerrit.cloudera.org:8080/#/c/23094/5/tests/custom_cluster/test_admission_controller.py@2178
PS5, Line 2178: .execute_serially
> Use self.client.execute_async so it consistent with L2185.
Done



--
To view, visit http://gerrit.cloudera.org:8080/23094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e0f270299f8c20975d7895c17f4e2791c3360e0
Gerrit-Change-Number: 23094
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 10 Jul 2025 06:20:30 +0000
Gerrit-HasComments: Yes

Reply via email to