Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13550 )
Change subject: WIP IMPALA-8484: Run queries on disjoint executor groups
......................................................................


Patch Set 8:

(15 comments)

First round of feedback. I really like the overall direction that it's going. I think generally my main feedback if I did a full pass would be:

* The usual code readability stuff
* More documentation in code comments and the commit message of the expected behaviour
* Clarifications of the various edge cases when things are in not-fully-healthy states.

http://gerrit.cloudera.org:8080/#/c/13550/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13550/8//COMMIT_MSG@19
PS8, Line 19: - When using executor groups, only a single coordinator and a single AC
What happens if you have multiple coordinators?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@370
PS8, Line 370: /// The per host number of queries admitted only for the queries admitted locally.
It seems weird to me to have all these maps with one value each instead of one map with a struct in it. Not the biggest deal since you're following the existing pattern.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@589
PS8, Line 589: /// Map from executor group to schedule
There are a few members here that are set in various states. Might be worth documenting them in terms of a state machine and which members are set in which state.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@696
PS8, Line 696: /// number of slots per executors does not change with the group or cluster size and
Per offline discussion, I lean towards the alternate option of # slots being something that each executor determines (based on # cores or configuration).
I don't think it is critical though - you can achieve the same behaviour either way, as long as you don't want to have heterogeneous executors.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@707
PS8, Line 707: void UpdateHostNumAdmitted(const QuerySchedule& schedule, int64_t num);
I feel like these two functions should really be combined - there's no reason to ever call one but not the other, right?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@710
PS8, Line 710: /// Rejection happens in several stages
This really helps clarify, thanks!


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.h@725
PS8, Line 725: /// - Thread reservation limit (thread_reservation_limit,
When we added the largest_min_mem_reservation and thread_reservation_limit checks, we didn't think about what might happen if the schedule changed. Maybe if we had thought about that, we would have done it in a schedule-agnostic way, e.g. assume that all fragments will run on a host, which somewhat overestimates the resource requirements. Anyway, I guess there are plenty of other checks in this bucket, so we wouldn't be simplifying things.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@549
PS8, Line 549: // All other executor groups are limited by the number of running queries per
It's a little counter-intuitive that the slots mechanism doesn't apply to the default executor group. This is just to preserve the previous behaviour, right?


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@888
PS8, Line 888: schedule_result->reset(queue_node.admitted_schedule.release());
This is OK, but it would probably be cleaner just to convert the output to a unique_ptr.
http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1088
PS8, Line 1088: if (executor_group->NumExecutors() == 0) continue;
Maybe we should factor this out into a helper like executor_group->IsHealthy(), just as a placeholder for a more sophisticated check. I think in the future we'll probably want to avoid scheduling on executor groups that don't have enough members, or similar. I'm wondering if we could use a fairly simple heuristic like "only schedule on executor groups that have 75% of the members of the largest group". That would avoid two kinds of pathological behaviour: never running queries when there are available groups, because no groups are considered healthy, and running queries on unhealthy groups when healthy groups are available (and maybe just busy).


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1154
PS8, Line 1154: void AdmissionController::DequeueLoop() {
This function is getting pretty unwieldy.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1182
PS8, Line 1182: GetMaxToDequeue
Maybe we want to rename this to indicate that it's only applicable to the default group.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1231
PS8, Line 1231: // rejection message gets lost
Does this mean that the query status doesn't include the reason for rejection? That seems like a problem.


http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1336
PS8, Line 1336: for (auto& it: queue_node->per_group_schedules) {
I do wonder if we're going to end up wanting to define some kind of ordering over the schedules. This might help pack queries into a smaller number of executor groups. We might also want to schedule preferentially on executor groups with more executors (i.e. try to drain less-healthy groups instead of arbitrary groups).
http://gerrit.cloudera.org:8080/#/c/13550/8/be/src/scheduling/admission-controller.cc@1363
PS8, Line 1363: if (check_for_rejection && RejectForSchedule(
This is because we're assuming that if it doesn't fit on a single group, it won't fit on any, right? This seems to make it fairly critical that we don't schedule on "unhealthy" groups with fewer executors than necessary. E.g. an executor group with a single executor coming online could maybe lead to spurious rejections?


--
To view, visit http://gerrit.cloudera.org:8080/13550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Gerrit-Change-Number: 13550
Gerrit-PatchSet: 8
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 29 Jun 2019 00:19:10 +0000
Gerrit-HasComments: Yes
