Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14641 )
Change subject: IMPALA-8138: Reintroduce rpc debugging options ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/14641/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14641/3//COMMIT_MSG@26 PS3, Line 26: RPC_SERVICE_POOL IMPALA_SERVICE_POOL? http://gerrit.cloudera.org:8080/#/c/14641/3/be/src/rpc/rpc-mgr.cc File be/src/rpc/rpc-mgr.cc: http://gerrit.cloudera.org:8080/#/c/14641/3/be/src/rpc/rpc-mgr.cc@193 PS3, Line 193: DCHECK(IsResolvedAddress(address_)); move this to Init? since that is when address is passed in? http://gerrit.cloudera.org:8080/#/c/14641/3/be/src/util/debug-util.cc File be/src/util/debug-util.cc: http://gerrit.cloudera.org:8080/#/c/14641/3/be/src/util/debug-util.cc@400 PS3, Line 400: if (ImpaladMetrics::DEBUG_ACTION_NUM_FAIL != nullptr) { this should never be nullptr right? add a DCHECK asserting it is never a nullptr? http://gerrit.cloudera.org:8080/#/c/14641/3/tests/custom_cluster/test_rpc_exception.py File tests/custom_cluster/test_rpc_exception.py: http://gerrit.cloudera.org:8080/#/c/14641/3/tests/custom_cluster/test_rpc_exception.py@63 PS3, Line 63: must_fail=True is this used anywhere? http://gerrit.cloudera.org:8080/#/c/14641/3/tests/custom_cluster/test_rpc_exception.py@63 PS3, Line 63: execute_test_query it seems like this while loop in this method is necessary because there is only a certain probability that a failure is actually injected. the reason the probability is necessary is that we don't want all RPC attempts to fail. would a better way to do this be something similar to https://github.com/apache/impala/commit/19cb8dc1c1c2247e91adc4bf62cab27a7c1e4381#diff-ab4af79ee4df02bf95d708a1d207f79aR189-R201 maybe there is a generic way to create a FAIL_FIRST debug action? http://gerrit.cloudera.org:8080/#/c/14641/3/tests/custom_cluster/test_rpc_exception.py@67 PS3, Line 67: while self._get_num_fails(impalad) == 0: should there be a limit or a timeout to the number of times a query is attempted? otherwise if this test breaks this loop may never exit http://gerrit.cloudera.org:8080/#/c/14641/3/tests/custom_cluster/test_rpc_exception.py@76 PS3, Line 76: assert self._get_num_fails(impalad) > 0 is this necessary? if the while loop exits this should always be true, right? -- To view, visit http://gerrit.cloudera.org:8080/14641 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c047ebce6d32c5ae461f70279391fa2df4c2029 Gerrit-Change-Number: 14641 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Wed, 13 Nov 2019 01:56:42 +0000 Gerrit-HasComments: Yes