Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13868

to look at the new patch set (#9).

Change subject: IMPALA-8339: Add local executor blacklist to coordinators
......................................................................

IMPALA-8339: Add local executor blacklist to coordinators

This patch adds the concept of a blacklist of executors to the
coordinator, which removes executors from consideration for query
scheduling. Blacklisting decisions are local to a given coordinator
and are not included in statestore updates.

The intention is to allow coordinators to be more aggressive about
deciding that an executor is unhealthy or unavailable, to minimize
failed queries in environments where cluster membership may be more
variable, rather than having to wait on the statestore heartbeat
mechanism to decide that the executor is down.

For the first patch, executors will only be blacklisted if the KRPC
status for Exec() is an error. Followup work will add blacklisting of
executors in more complex scenarios, e.g. if an executor appears to be
a straggler.
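
As a rough illustration of the behaviour described above (not the code in
this patch, which is C++ under be/src/scheduling), the idea can be sketched
in Python. Every name below is hypothetical, and ConnectionError merely
stands in for an error KRPC status returned from Exec():

```python
from typing import Callable, Iterable, List, Set


class LocalExecutorBlacklist:
    """Hypothetical coordinator-local blacklist; it is never sent to the statestore."""

    def __init__(self) -> None:
        self._hosts: Set[str] = set()

    def blacklist(self, host: str) -> None:
        # Record the executor so later scheduling decisions skip it.
        self._hosts.add(host)

    def is_blacklisted(self, host: str) -> bool:
        return host in self._hosts

    def schedulable(self, executors: Iterable[str]) -> List[str]:
        # Drop blacklisted executors while preserving the input order.
        return [h for h in executors if h not in self._hosts]


def start_fragments(host: str,
                    exec_rpc: Callable[[str], None],
                    blacklist: LocalExecutorBlacklist) -> bool:
    """Issue a stand-in Exec() RPC; blacklist the host if the RPC itself fails."""
    try:
        exec_rpc(host)
    except ConnectionError:
        # Assumption: an RPC-layer failure is the only trigger, as in this first patch.
        blacklist.blacklist(host)
        return False
    return True
```

In this sketch a failed RPC to one executor removes only that executor from
the list returned by schedulable() on the same coordinator; other
coordinators would keep their own, independent blacklists.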

When a query is scheduled and there are currently some blacklisted
executors, a new line 'Blacklisted Executors:' is added to the profile
listing the hostnames of all such executors.
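
A minimal sketch of how such a profile line could be produced. The helper
name, the sorting, and the comma separator are assumptions made for
illustration; only the 'Blacklisted Executors:' label comes from this
change description:

```python
from typing import Iterable, Optional


def blacklisted_executors_line(hostnames: Iterable[str]) -> Optional[str]:
    """Return the profile line, or None when nothing is blacklisted."""
    hosts = sorted(hostnames)
    if not hosts:
        # No line is added to the profile when the blacklist is empty.
        return None
    # Assumed format: label followed by comma-separated hostnames.
    return "Blacklisted Executors: " + ", ".join(hosts)
```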

Testing:
- Added a case to the cluster mgr BE unit test that uses blacklisting.
- Added e2e test cases for killing and restarting an impalad.
- Manual randomized testing locally with iptables.
TODO:
- Add an e2e test case where an impalad becomes briefly unreachable.
- Manual/stress tests on a real cluster.

Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/cluster-membership-mgr-test.cc
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
A be/src/scheduling/executor-blacklist.cc
A be/src/scheduling/executor-blacklist.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
A tests/custom_cluster/test_blacklist.py
15 files changed, 735 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/13868/9
--
To view, visit http://gerrit.cloudera.org:8080/13868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
Gerrit-Change-Number: 13868
Gerrit-PatchSet: 9
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
