Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13868 )
Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators
......................................................................

Patch Set 1:

(16 comments)

Some comments on the first pass. It may be helpful to document the interaction between blacklisted nodes and cluster membership changes.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h@284
PS1, Line 284: Exec()
ExecQueryFInstance()

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc@394
PS1, Line 394: // The Exec() rpc failed, so blacklist the node.
            : const TBackendDescriptor& be_desc = backend_state->exec_params()->be_desc;
            : ExecEnv::GetInstance()->cluster_membership_mgr()->Blacklist(be_desc);
One interesting case is what happens if the coordinator cannot RPC to itself. I suppose the assumption we make here is that blacklisting certain nodes will prevent the scheduler from scheduling scans, and thus the rest of the fragments, on them. The best practice is to have dedicated coordinators and executors, so a coordinator usually doesn't take the role of an executor. However, for certain queries there may always be a root fragment which will run on the coordinator. This may get better as we move the root fragments off the coordinator.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@96
PS1, Line 96: Ohly
typo

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206: boost::mutex update_membership_lock_;
Does this also need the mutable modifier like the other two locks?
Also, the membership update seems to touch quite a number of fields in this class. Maybe it's better to clearly document the thread safety of each field.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206: update_membership_lock_;
Is there any chance this mutex may be held at the same time as various other mutexes in this class? If so, we had better document the lock ordering.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@119
PS1, Line 119: // Get any previously blacklisted nodes that should be unblacklisted.
            : std::list<TBackendDescriptor> unblacklisted = node_blacklist_.Maintenance();
I suppose this relies on the StateStore being alive and continuing to send periodic cluster membership updates. It's a fair assumption that the StateStore should recover within a reasonable amount of time.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@168
PS1, Line 168: // Deleted item
How would this work if the deleted item is in the unblacklisted list?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@175
PS1, Line 175: !blacklisted) {
How would this work for a node which has already been removed from the executor group due to blacklisting? Should we remove it from the blacklist altogether if the cluster membership update indicates it has already been deleted? Otherwise, won't it eventually get into the probation state and get added back with a stale BE descriptor? In theory, a blacklisted node can be removed from the cluster and restart with the same or a different IP address.
http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@265
PS1, Line 265: for (const TBackendDescriptor& be_desc : unblacklisted) {
How would this work if a past cluster membership update indicated that this node had already been removed from the cluster? This probably warrants some more detailed documentation.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h
File be/src/scheduling/node-blacklist.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@30
PS1, Line 30: Class to maintain a local blacklist of backends.
Can you please describe the thread safety of this class? Is there any assumption about the locks held by the caller when calling into functions of this class?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@38
PS1, Line 38: come
typo

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@48
PS1, Line 48: class NodeBlacklist {
Is this class called only from the ClusterMembershipMgr class? If so, maybe it's worth moving this entire file and its implementation into cluster-membership-mgr.cc?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@78
PS1, Line 78: /// If false, this node is on probation, meaning that it was previously on the
            : /// blacklist but was removed due to the timeout. We still keep the entry for awhile
            : /// in case it gets blacklisted again.
            : bool blacklisted;
IMHO, this may be clearer if we use an enum to represent the state instead of relying on a single boolean. It seems a bit error prone to interpret the return value of certain functions (e.g. Remove() returning false can mean either that the node is on probation but not blacklisted, or that it was not in the blacklist to begin with).
  enum {
    NOT_BLACKLISTED,
    BLACKLISTED,
    ON_PROBATION,
  };

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc
File be/src/scheduling/node-blacklist.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@25
PS1, Line 25: blacklist_timeout_padding
What's the reasoning behind having this timeout padding? Maybe worth documenting it.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@42
PS1, Line 42: // If this node exists in the list, it must be on probation. Re-blacklisk it now.
            : DCHECK(!it->blacklisted);
Is it possible for multiple queries to attempt to blacklist the same backend at the same time, so it's not necessarily on probation, right?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@83
PS1, Line 83: 10
It may be helpful to define this as a constant with a meaningful name.

--
To view, visit http://gerrit.cloudera.org:8080/13868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
Gerrit-Change-Number: 13868
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Comment-Date: Fri, 19 Jul 2019 00:49:08 +0000
Gerrit-HasComments: Yes
