Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13550 )
Change subject: WIP IMPALA-8484: Run queries on disjoint executor groups
......................................................................


Patch Set 8:

(15 comments)

First round of feedback. I really like the overall direction that it's going. I think generally my main feedback if I did a full pass would be:

* The usual code readability stuff
* More documentation in code comments and the commit message of the expected behaviour
* Clarifications of the various edge cases when things are in not-fully-healthy states.

http://gerrit.cloudera.org:8080/#/c/13550/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13550/8//COMMIT_MSG@19
PS8, Line 19: - When using executor groups, only a single coordinator and a single AC
What happens if you have multiple coordinators?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@370
PS8, Line 370: /// The per host number of queries admitted only for the queries admitted locally.
It seems weird to me to have all these maps with one value each instead of one map with a struct in it. Not the biggest deal since you're following the existing pattern.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@589
PS8, Line 589: /// Map from executor group to schedule
There are a few members here that are set in various states. Might be worth documenting them in terms of a state machine and which members are set in which state.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@696
PS8, Line 696: /// number of slots per executors does not change with the group or cluster size and
Per offline discussion, I lean towards the alternate option of # slots being something that each executor determines (based on # cores or configuration).
I don't think it is critical though - you can achieve the same behaviour either way, as long as you don't want to have heterogeneous executors.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@707
PS8, Line 707: void UpdateHostNumAdmitted(const QuerySchedule& schedule, int64_t num);
I feel like these two functions should really be combined - there's no reason to ever call one but not the other, right?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@710
PS8, Line 710: /// Rejection happens in several stages
This really helps clarify, thanks!


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@725
PS8, Line 725: /// - Thread reservation limit (thread_reservation_limit,
When we added the largest_min_mem_reservation and thread_reservation_limit checks, we didn't think about what might happen if the schedule changed. Maybe if we had thought about that, we would have done it in a schedule-agnostic way, e.g. assume that all fragments will run on a host, which somewhat overestimates the resource requirements. Anyway, I guess there are plenty of other checks in this bucket, so we wouldn't be simplifying things.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@549
PS8, Line 549: // All other executor groups are limited by the number of running queries per
It's a little counter-intuitive that the slots mechanism doesn't apply to the default executor group. This is just to preserve the previous behaviour, right?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@888
PS8, Line 888: schedule_result->reset(queue_node.admitted_schedule.release());
This is OK, but it would probably be cleaner just to convert the output to a unique_ptr.
http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1088
PS8, Line 1088: if (executor_group->NumExecutors() == 0) continue;
Maybe we should factor this out into a helper like executor_group->IsHealthy(), just as a placeholder for a more sophisticated check. I think in the future we'll probably want to avoid scheduling on executor groups that don't have enough members, or similar. I'm wondering if we could use a fairly simple heuristic like "only schedule on executor groups that have 75% of the members of the largest group". That would avoid two kinds of pathological behaviour: never running queries when there are available groups, because no groups are considered healthy, and running queries on unhealthy groups when healthy groups are available (and maybe just busy).


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1154
PS8, Line 1154: void AdmissionController::DequeueLoop() {
This function is getting pretty unwieldy.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1182
PS8, Line 1182: GetMaxToDequeue
Maybe we want to rename this to indicate that it's only applicable to the default group.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1231
PS8, Line 1231: // rejection message gets lost
Does this mean that the query status doesn't include the reason for rejection? That seems like a problem.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1336
PS8, Line 1336: for (auto& it: queue_node->per_group_schedules) {
I do wonder if we're going to end up wanting to define some kind of ordering over the schedules. This might help pack queries into a smaller number of executor groups. We might also want to schedule preferentially on executor groups with more executors (i.e. try to drain less-healthy groups instead of arbitrary groups).
http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1363
PS8, Line 1363: if (check_for_rejection && RejectForSchedule(
This is because we're assuming that if it doesn't fit on a single group, it won't fit on any, right? This seems to make it fairly critical that we don't schedule on "unhealthy" groups with fewer executors than necessary. E.g. an executor group with a single executor coming online could maybe lead to spurious rejections?


--
To view, visit http://gerrit.cloudera.org:8080/13550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Gerrit-Change-Number: 13550
Gerrit-PatchSet: 8
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 29 Jun 2019 00:19:10 +0000
Gerrit-HasComments: Yes
