Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16369 )
Change subject: IMPALA-9636: Don't run retried query on the blacklisted nodes ...................................................................... Patch Set 13: (4 comments) looking better, still some concerns about the logic in admission-controller.cc http://gerrit.cloudera.org:8080/#/c/16369/13/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16369/13/be/src/scheduling/admission-controller.cc@1555 PS13, Line 1555: const ExecutorGroup* filtered_executor_group = its probably cleaner to make this a unique_ptr so that you don't have to manually call 'delete' whenever the method returns. that actually probably removes the need for the 'created' variable as well. in general, we try to prefer smart pointers over calling 'delete'. http://gerrit.cloudera.org:8080/#/c/16369/13/be/src/scheduling/admission-controller.cc@1562 PS13, Line 1562: if (created && num_executors == 0) { : // All executors are blacklisted for retried query. : delete filtered_executor_group; : continue; : } I'm not sure if this is correct; before this patch, 'executor_group' could by the empty executor group returned by 'GetEmptyExecutorGroup' which is used for scheduling coordinator-only queries. I would expect 'GetEmptyExecutorGroup()->NumExecutors()' would return 0. so it seems like adding this logic would break coordinator-only queries. I think you probably want to skip this check if the 'executor_group' == GetEmptyExecutorGroup()'. http://gerrit.cloudera.org:8080/#/c/16369/13/be/src/scheduling/admission-controller.cc@1568 PS13, Line 1568: cluster_membership_mgr_->GetEmptyExecutorGroup() == filtered_executor_group I don't think this is a valid comparison using '=='. IIUC, the '==' is not overloaded for the ExecutorGroup object, so this comparison is comparing if the objects themselves are the same, which doesn't seem possible given that GetFilteredExecutorGroup creates a new ExecutorGroup. http://gerrit.cloudera.org:8080/#/c/16369/13/be/src/scheduling/admission-controller.cc@1579 PS13, Line 1579: if (output_schedules->empty()) { : // Retried query could not be scheduled since all executors are blacklisted. : // To keep consistent with the other blacklisting logic, set not_admitted_reason as : // REASON_NO_EXECUTOR_GROUPS. : queue_node->not_admitted_reason = REASON_NO_EXECUTOR_GROUPS; : LOG(WARNING) << queue_node->not_admitted_reason; : } can this be moved before the call to 'Scheduler()'. it should be possible, although you might have to re-factor the for loop a bit. -- To view, visit http://gerrit.cloudera.org:8080/16369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105 Gerrit-Change-Number: 16369 Gerrit-PatchSet: 13 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Fri, 11 Sep 2020 18:16:34 +0000 Gerrit-HasComments: Yes
