Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13868 )


Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators
......................................................................

[WIP] IMPALA-8339: Add local node blacklist to coordinators

This patch adds the concept of a blacklist of executors to the
coordinator, which removes nodes from consideration for query
scheduling. Blacklisting decisions are local to a given coordinator
and are not included in statestore updates.
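A minimal C++ sketch of the idea, under stated assumptions: `NodeBlacklist`,
`BackendId` and `SchedulableExecutors` are hypothetical names chosen for
illustration and do not reflect the actual API in node-blacklist.h. Backends
are identified by plain strings here for simplicity.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a backend identifier; the real patch uses its
// own descriptor types.
using BackendId = std::string;

// Coordinator-local blacklist. Nothing here is published to the statestore:
// each coordinator keeps its own instance in memory.
class NodeBlacklist {
 public:
  // Marks a backend as unusable for scheduling by this coordinator.
  void Blacklist(const BackendId& id) { blacklisted_.insert(id); }

  // Returns true if this coordinator has blacklisted the backend.
  bool IsBlacklisted(const BackendId& id) const {
    return blacklisted_.count(id) > 0;
  }

 private:
  std::unordered_set<BackendId> blacklisted_;
};

// Filters the statestore-provided membership down to the executors this
// coordinator is still willing to schedule query fragments on.
std::vector<BackendId> SchedulableExecutors(
    const std::vector<BackendId>& membership, const NodeBlacklist& blacklist) {
  std::vector<BackendId> result;
  for (const BackendId& id : membership) {
    if (!blacklist.IsBlacklisted(id)) result.push_back(id);
  }
  return result;
}
```

Because the blacklist lives only in one coordinator's memory, two
coordinators may temporarily disagree about which executors are usable;
that is the consequence of keeping blacklisting out of statestore updates.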

The intention is to let coordinators decide more aggressively that a
node is unhealthy or unavailable, rather than waiting for the statestore
heartbeat mechanism to declare it down. This should minimize failed
queries in environments where cluster membership is more variable.

In this first patch, nodes are only blacklisted if the KRPC status
for Exec() is an error. Follow-up work will add blacklisting of nodes
in more complex scenarios, e.g. if a node appears to be a straggler.
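The blacklisting criterion can be sketched as follows. This is an
illustration, not the patch's code: `Status`, `ExecRpcResult` and
`HandleExecResult` are hypothetical, and separating the transport-level
KRPC status from an error status returned inside the response is an
assumption about what "the KRPC status for Exec()" refers to.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical simplified status: ok == true means no error.
struct Status {
  bool ok;
  std::string msg;
};

// Hypothetical outcome of one Exec() call to a backend (assumed split).
struct ExecRpcResult {
  Status rpc_status;       // transport-level (KRPC) outcome of the call
  Status response_status;  // error, if any, reported by the backend itself
};

// Applies the first-patch rule: blacklist only when the RPC itself failed.
// Returns true if the backend was added to the blacklist.
bool HandleExecResult(const std::string& backend, const ExecRpcResult& result,
                      std::unordered_set<std::string>* blacklist) {
  if (!result.rpc_status.ok) {
    // The call did not complete (e.g. the node was unreachable), so stop
    // scheduling on it without waiting for the statestore to notice.
    blacklist->insert(backend);
    return true;
  }
  // The RPC reached the backend; an error it reports is treated as a
  // query-level failure and, in this sketch, does not blacklist the node.
  return false;
}
```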

TODO:
- Metrics/logs around blacklisting decisions
- Mechanism for manual override of blacklisting decisions
- Automated testing
- Manual testing at scale/stress

Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
A be/src/scheduling/node-blacklist.cc
A be/src/scheduling/node-blacklist.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M tests/common/impala_cluster.py
13 files changed, 321 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/13868/1
--
To view, visit http://gerrit.cloudera.org:8080/13868

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
Gerrit-Change-Number: 13868
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
