Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40
PS1, Line 40: admission_max_retry_time_s
> I am wondering if it's good to set maximum number of times for retrying RPC
Given the use of back off and jitter, it's hard to relate a maximum number of 
retries with how long that number of retries will take, so I think its easier 
for users to think about a maximum amount of time to retry instead.

Its a good point that I choose this number (60 seconds) basically arbitrarily. 
It might be good to set it experimentally, but the right number is potentially 
going to depend a lot on the configuration of the system that's monitoring the 
admissiond and restarting it (eg. Kubernetes, CM, etc.) and there may not be a 
single value that's always appropriate.

I would also argue that if its going to take a very long time to get the 
admissiond restarted, it may be the right thing to just let some queries fail 
rather than having them all retry for long enough, since in the time it takes 
the admissiond to come up you could have a ton of queries sitting around 
retrying which could cause other problems (eg. the coordinator runs out of 
threads, the new admissiond gets overwhelmed by all the requests and falls over 
again, etc.), so 60 seconds seems reasonable to me.

Happy to file a JIRA to examine this more if you want.


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369
PS1, Line 1369: e
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385
PS1, Line 1385:     result = self.client.fetch(query, after_kill_handle)
> Could you add another test case for which sleep more than admission_max_ret
Done



--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:05:58 +0000
Gerrit-HasComments: Yes

Reply via email to