Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/15378 )
Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol ...................................................................... Patch Set 8: (10 comments) Responding to the latest set of comments. Code changes will follow soonish. http://gerrit.cloudera.org:8080/#/c/15378/8/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/15378/8/shell/impala_client.py@947 PS8, Line 947: Retries, if enabled, are attempted : for all exceptions other than QueryCancelledByShellException : and MissingThriftMethodException > My understanding of TApplicationExceptions is that they are thrown by the T I totally agree with not retrying all exceptions. Only the ones which arise from the transport layer. I have made TApplicationExceptions as not retriable. http://gerrit.cloudera.org:8080/#/c/15378/7/tests/custom_cluster/test_hs2_fault_injection.py File tests/custom_cluster/test_hs2_fault_injection.py: http://gerrit.cloudera.org:8080/#/c/15378/7/tests/custom_cluster/test_hs2_fault_injection.py@123 PS7, Line 123: @pytest.mark.execute_serially > For what it's worth, it seems that all of the custom cluster are serial, be Yeah the cluster gets restarted for every test. I am also not sure if we have the capability to run tests in parallel for the custom cluster. I see many-many custom cluster tests use the '@pytest.mark.execute_serially' tag and many which don't. I originally started writing these tests in an ImpalaTestSuite class. But, realized that the 'test_connect' leaves an idle session since we don't get back a session handle (to close). I think the 'test_execute_query' also leaves behind and orphan query handle/fragments since we don't close the query (again because we don't have the query handle). In ImpalaTestSuite tests there is a metrics_verifier which is run at the end (after executing all tests in a class) and it fails because some of the metrics such as session_handle and fragments etc are non zero. I could still split the tests, and use CustomClusterTestSuite for the ones which don't close properly and use ImpalaTestSuite for other, but I prefer having all tests in one common place. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py File tests/custom_cluster/test_hs2_fault_injection.py: http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@30 PS8, Line 30: """Class for injecting faults in the THttpClient.""" > should include an explanation of how faults are injected (e.g. they are inj Done. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@40 PS8, Line 40: fault > nit: enable_fault would sound clearer Done http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@81 PS8, Line 81: """Fault injecting ImpalaHS2Client class using http as transport""" > should mention the relation between this class and FaultInjectingHttpClient Done http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@85 PS8, Line 85: """Creates a transport with HTTP as the base.""" > Or possibly moved above the call to super(...) as a proper docstring. Good point. I'll move it as a proper docstring. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@101 PS8, Line 101: CustomClusterTestSuite > why does this need to be a custom cluster test? you are just injecting faul The test_connect test leaves an idle session since we don't get back a session handle. I think the test_execute_query also leaves behind and orphan query handle/fragments since we don't close the query (again because we don't have the query handle). In ImpalaTestSuite tests there is a metrics_verifier which is run at the end (after executing all tests in a class) and it fails because some of the metrics such as session_handle and fragments etc are non zero. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@106 PS8, Line 106: kerberos_host_fqdn = None > why is this needed, just pass None as a parameter? Done. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@128 PS8, Line 128: self.transport.fault(502, "Injected Fault", 0.20) > whats the point in adding a frequency to these tests? why not always inject I think the point of having a frequency is so that we have some faults and we are able to recover from them after retries. The retries only help if faults don't happen 100% of the time, else the retries will also fail. http://gerrit.cloudera.org:8080/#/c/15378/8/tests/custom_cluster/test_hs2_fault_injection.py@129 PS8, Line 129: self.connect() > is there a way to actually verify that operations were retried? I think we might be able to get it from stderr. -- To view, visit http://gerrit.cloudera.org:8080/15378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5 Gerrit-Change-Number: 15378 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Thu, 19 Mar 2020 20:55:03 +0000 Gerrit-HasComments: Yes
