Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13868 )
Change subject: [WIP] IMPALA-8339: Add local node blacklist to coordinators
......................................................................

Patch Set 1:

(16 comments)

Some comments on the first pass. It may be helpful to document the interaction between blacklisted nodes and cluster membership changes.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator-backend-state.h@284
PS1, Line 284: Exec()
ExecQueryFInstance()

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/runtime/coordinator.cc@394
PS1, Line 394: // The Exec() rpc failed, so blacklist the node.
            : const TBackendDescriptor& be_desc = backend_state->exec_params()->be_desc;
            : ExecEnv::GetInstance()->cluster_membership_mgr()->Blacklist(be_desc);
One interesting case is what happens if the coordinator cannot RPC to itself. I suppose the assumption we make here is that blacklisting certain nodes will prevent the scheduler from scheduling scans, and thus the rest of the fragments, on them. The best practice is to have dedicated coordinators and executors, so a coordinator usually doesn't take the role of an executor. However, for certain queries there may always be a root fragment which will run on the coordinator. This may get better as we move the root fragments off the coordinator.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@96
PS1, Line 96: Ohly
typo

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206: boost::mutex update_membership_lock_;
Does this also need the mutable modifier like the other two locks?
Also, the membership update seems to touch quite a number of fields in this class. Maybe it's better to clearly document the thread safety of each field.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.h@206
PS1, Line 206: update_membership_lock_;
Is there any chance this mutex may be held at the same time as various other mutexes in this class? If so, we had better document the lock ordering.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@119
PS1, Line 119: // Get any previously blacklisted nodes that should be unblacklisted.
            : std::list<TBackendDescriptor> unblacklisted = node_blacklist_.Maintenance();
I suppose this relies on the StateStore being alive and continuing to send periodic cluster membership updates. It's a fair assumption that the StateStore should recover within a reasonable amount of time.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@168
PS1, Line 168: // Deleted item
How would this work if the deleted item is in the unblacklisted list?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@175
PS1, Line 175: !blacklisted) {
How would this work for a node which has already been removed from the executor group due to blacklisting? Should we remove it from the blacklist altogether if the cluster membership update indicates it has already been deleted? Otherwise, won't it eventually get into the probation state and get added back with a stale BE descriptor? In theory, a blacklisted node can be removed from the cluster and restart with the same or a different IP address.
http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/cluster-membership-mgr.cc@265
PS1, Line 265: for (const TBackendDescriptor& be_desc : unblacklisted) {
How would this work if a past cluster membership update indicated that this node had already been removed from the cluster? This probably warrants some more detailed documentation.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h
File be/src/scheduling/node-blacklist.h:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@30
PS1, Line 30: Class to maintain a local blacklist of backends.
Can you please describe the thread safety of this class? Is there any assumption about the locks held by the caller when calling into functions of this class?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@38
PS1, Line 38: come
typo

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@48
PS1, Line 48: class NodeBlacklist {
Is this class called only from the ClusterMembershipMgr class? If so, maybe it's worth moving this entire file and its implementation into cluster-membership-mgr.cc?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.h@78
PS1, Line 78: /// If false, this node is on probation, meaning that it was previously on the
            : /// blacklist but was removed due to the timeout. We still keep the entry for awhile
            : /// in case it gets blacklisted again.
            : bool blacklisted;
IMHO, this may be clearer if we use an enum to represent the state instead of relying on a single boolean. It seems a bit error prone to interpret the return value of certain functions (e.g. Remove() returning false can mean either that the node is on probation but not blacklisted, or that it was not in the blacklist to begin with).
  enum {
    NOT_BLACKLISTED,
    BLACKLISTED,
    ON_PROBATION,
  };

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc
File be/src/scheduling/node-blacklist.cc:

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@25
PS1, Line 25: blacklist_timeout_padding
What's the reasoning behind having this timeout padding? Maybe worth documenting it.

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@42
PS1, Line 42: // If this node exists in the list, it must be on probation. Re-blacklisk it now.
            : DCHECK(!it->blacklisted);
Is it possible for multiple queries to attempt to blacklist the same backend at the same time, so it's not necessarily on probation, right?

http://gerrit.cloudera.org:8080/#/c/13868/1/be/src/scheduling/node-blacklist.cc@83
PS1, Line 83: 10
It may be helpful to define this as a constant with a meaningful name.

--
To view, visit http://gerrit.cloudera.org:8080/13868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb6e73b84042c33cd475b82470a975d04ee9b74
Gerrit-Change-Number: 13868
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Comment-Date: Fri, 19 Jul 2019 00:49:08 +0000
Gerrit-HasComments: Yes
