Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16369 )
Change subject: IMPALA-9636: Don't run retried query on the blacklisted nodes ...................................................................... Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc File be/src/runtime/query-driver.cc: http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc@128 PS8, Line 128: : // Set blacklisted_executor_addresses for client_request_state of the original query. : if (blacklisted_executor_addresses != nullptr : && !blacklisted_executor_addresses->empty()) { : client_request_state_->SetBlacklistedExecutorAddresses( : *blacklisted_executor_addresses); : } > instead of passing blacklisted_executor_addresses into this method, it seem Good suggestion. It reduces code change. Fixed it as suggested. http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc@1572 PS4, Line 1572: reated) delete filtered_executor_group; > right now, if a query is retried and all executors are blacklisted, the que Yes, we can fix it by changing the returned status. http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc@1120 PS8, Line 1120: if (!request.blacklisted_executor_addresses.empty()) { : // This debug_action is to allow delay to be injected for the admission of a retried : // query, which is triggered by blacklisted node, before getting membership snapshot. : DebugActionNoFail(FLAGS_debug_actions, "ADMISSION_DELAY_FOR_RETRIED_QUERY"); : } > can you use CRS_BEFORE_ADMISSION instead? No, it does not work for adding new cases which need to trigger blacklist timeout for retried query. CRS_BEFORE_ADMISSION apply all queries. If we use it, the original attempt of query will be failed due to timeout, instead of RPC failure for the new test cases. We need a debug action which only apply to retried query. http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc@163 PS8, Line 163: // If exec_at_coord equals false and number of executors in the executor group : // equals 0, the overloaded ComputeScanRangeAssignment(), which will be called : // below, will return error. > not sure I fully understand why the DCHECK needs to be removed, could you e If all executor are blacklisted, the number of executors in filtered executor could be 0. This could happen and is already handled in overloaded ComputeScanRangeAssignment() so we should remove DCHECK here. Actually the new test case test_query_retries.py::TestQueryRetries::test_retry_query_failure_no_executor_available will hit this DCHECK failure if we don't remove it. http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py@287 PS8, Line 287: _get_rpc_fail_action(FAILED_KRPC_PORT_2) > nice! I think this actually solves IMPALA-9227 as well. Maybe we need to add more test cases for IMPALA-9227. -- To view, visit http://gerrit.cloudera.org:8080/16369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105 Gerrit-Change-Number: 16369 Gerrit-PatchSet: 8 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Wed, 09 Sep 2020 03:21:56 +0000 Gerrit-HasComments: Yes
