Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13868 )

Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators
......................................................................

[WIP] IMPALA-8339: Add local node blacklist to coordinators

This patch adds the concept of a blacklist of executors to the
coordinator, which removes nodes from consideration for query
scheduling. Blacklisting decisions are local to a given coordinator
and are not included in statestore updates.

The intention is to allow coordinators to be more aggressive about
deciding that a node is unhealthy or unavailable, to minimize failed
queries in environments where cluster membership may be more
variable, rather than having to wait on the statestore heartbeat
mechanism to decide that the node is down.

For the first patch, nodes will only be blacklisted if the KRPC
status for Exec() is an error. Followup work will add blacklisting of
nodes in more complex scenarios, e.g. if a node appears to be a
straggler.

TODO:
- Metrics/logs around blacklisting decisions
- Mechanism for manual override of blacklisting decisions
- Automated testing
- Manual testing at scale/stress

Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/scheduling/cluster-membership-mgr.h
A be/src/scheduling/node-blacklist.cc
A be/src/scheduling/node-blacklist.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M tests/common/impala_cluster.py
13 files changed, 321 insertions(+), 17 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/13868/1
--
To view, visit http://gerrit.cloudera.org:8080/13868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
Gerrit-Change-Number: 13868
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
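
Illustrative note (not part of the patch): the behaviour described in the commit message can be sketched roughly as below. This is a minimal Python model, not the C++ code in be/src/scheduling/node-blacklist.cc; the class name LocalNodeBlacklist, its methods, and the host:port strings are all hypothetical and chosen only for illustration. It assumes the first-patch rule stated above: a node is blacklisted only when the Exec() RPC to it returns an error, and the blacklist is kept per coordinator rather than shared through the statestore.

```python
from typing import Iterable, List, Set


class LocalNodeBlacklist:
    """Hypothetical model of a coordinator-local executor blacklist."""

    def __init__(self) -> None:
        # Held only in this coordinator's memory; never sent to the statestore.
        self._blacklisted: Set[str] = set()

    def report_exec_status(self, node: str, rpc_ok: bool) -> None:
        # First-patch rule from the commit message: blacklist a node only
        # when the Exec() RPC to it came back with an error status.
        if not rpc_ok:
            self._blacklisted.add(node)

    def is_blacklisted(self, node: str) -> bool:
        return node in self._blacklisted

    def filter_candidates(self, membership: Iterable[str]) -> List[str]:
        # Scheduling considers nodes that are in the cluster membership
        # and not on this coordinator's local blacklist.
        return [n for n in membership if n not in self._blacklisted]


# Example: exec-2 fails its Exec() RPC and is skipped by this coordinator.
membership = ["exec-1:22000", "exec-2:22000", "exec-3:22000"]
blacklist = LocalNodeBlacklist()
blacklist.report_exec_status("exec-2:22000", rpc_ok=False)
blacklist.report_exec_status("exec-3:22000", rpc_ok=True)
candidates = blacklist.filter_candidates(membership)

# A second coordinator keeps its own, independent blacklist.
other_coordinator = LocalNodeBlacklist()
```

In this sketch a second coordinator still sees all three executors, which mirrors the statement that blacklisting decisions are local. How entries are removed again (timeouts, manual override) is left out because the commit message lists it as open TODO work.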
