Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )
Change subject: IMPALA-10577: Add retrying of AdmitQuery ...................................................................... Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc File be/src/scheduling/remote-admission-control-client.cc: http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40 PS1, Line 40: admission_max_retry_time_s > I am wondering if it's good to set maximum number of times for retrying RPC Given the use of back off and jitter, it's hard to relate a maximum number of retries with how long that number of retries will take, so I think its easier for users to think about a maximum amount of time to retry instead. Its a good point that I choose this number (60 seconds) basically arbitrarily. It might be good to set it experimentally, but the right number is potentially going to depend a lot on the configuration of the system that's monitoring the admissiond and restarting it (eg. Kubernetes, CM, etc.) and there may not be a single value that's always appropriate. I would also argue that if its going to take a very long time to get the admissiond restarted, it may be the right thing to just let some queries fail rather than having them all retry for long enough, since in the time it takes the admissiond to come up you could have a ton of queries sitting around retrying which could cause other problems (eg. the coordinator runs out of threads, the new admissiond gets overwhelmed by all the requests and falls over again, etc.), so 60 seconds seems reasonable to me. Happy to file a JIRA to examine this more if you want. http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369 PS1, Line 1369: e > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385 PS1, Line 1385: result = self.client.fetch(query, after_kill_handle) > Could you add another test case for which sleep more than admission_max_ret Done -- To view, visit http://gerrit.cloudera.org:8080/17188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a Gerrit-Change-Number: 17188 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Tue, 16 Mar 2021 18:05:58 +0000 Gerrit-HasComments: Yes
