Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management ...................................................................... Patch Set 5: (20 comments) http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc@454 PS5, Line 454: // Register the ImpalaServer with the cluster membership manager : cluster_membership_mgr_->SetLocalBeDescFn([server]() { : return server->GetLocalBackendDescriptor(); : }); : cluster_membership_mgr_->SetUpdateLocalServerFn( : [server](const ClusterMembershipMgr::BackendAddressSet& current_backends) { : server->CancelQueriesOnFailedBackends(current_backends); : }); just thinking out aloud: should we reset the callback functions during teardown (in ~ExecEnv)? http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@54 PS5, Line 54: /// Clients can also register callbacks to receive notifications of changes to the cluster : /// membership. this makes it sound like there is a generic way of registering callbacks like we have for statestore can you update this based on the specific callback registrations we have http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@68 PS5, Line 68: pool nit: executor pool. Is there a description of "executor pools" anywhere? http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@102 PS5, Line 102: then nit: when http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@112 PS5, Line 112: subscription nit: subscription. http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@117 PS5, Line 117: /// Registers a callback to provide the local backend descriptor. : void SetLocalBeDescFn(BackendDescriptorPtrFn fn); : : /// Registers a callback to notify the local ImpalaServer of changes in the cluster : /// membership. This callback will only be called when backends are deleted from the : /// membership. : void SetUpdateLocalServerFn(UpdateLocalServerFn fn); : : /// Registers a callback to notify the local Frontend of changes in the cluster : /// membership. : void SetUpdateFrontendFn(UpdateFrontendFn fn); I know we discussed this offline, but it might be worth documenting in this gerrit review as to why you went with specific callback registration vs having a generic event bus or using the execEnv to call frontend/impalaServer directly them directly http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@165 PS5, Line 165: in nit: extra word http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@167 PS5, Line 167: / occur in 'current_backends' nit: outdated comment? http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@184 PS5, Line 184: May be NULL if the set of : /// backends is fixed. maybe mention that this is only true in tests http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@88 PS5, Line 88: update.is_delta && update.topic_entries.empty() just curious, when can we receive an empty delta http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@127 PS5, Line 127: const vector<string>& groups = DEFAULT_EXECUTOR_GROUPS; : for (const string& group : groups) { : (*new_executor_groups)[group].RemoveExecutor(be_desc); : } This kinda implies that an executor can be added to multiple groups. I think in the subsequent patch that introduces the concept of exec groups, every executor will have the group name in their backendDescriptor. maybe add a comment somewhere describing why we are iterating right now, and how this will change http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@255 PS5, Line 255: auto nit: const auto& http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@263 PS5, Line 263: const BackendIdMap::value_type& nit: const auto& http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@297 PS5, Line 297: auto nit: const auto& http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h File be/src/scheduling/executor-group.h: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@41 PS5, Line 41: The caller must make sure that the : /// executor group does not get updated while they hold the reference. would it make sense to document what is the expected usage pattern for this class? This line is a bit confusing since this is true for any mutable object shared across threads. having some context would be helpful to understand why we are highlighting it here. http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@49 PS5, Line 49: Note that during tests this can include multiple : /// executors per ip address. any reason we are specifying this here? the class description already implicitly mentions that there can be multiple executors per IP address. http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@a334 PS5, Line 334: we can add something like this back when we introduce exec groups. http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@145 PS5, Line 145: the global executor nit: maybe call this an "executor group"? http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@439 PS5, Line 439: // TODO(lars) marking this in case you missed it http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@654 PS5, Line 654: if (membership_snapshot->local_be_desc.get() == nullptr) { : return Status("Local backend has not yet started"); : } can this happen? as you can only submit queries once the impala server is running and accepting new connections, right? Maybe just replace this with a DCheck? -- To view, visit http://gerrit.cloudera.org:8080/13207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3 Gerrit-Change-Number: 13207 Gerrit-PatchSet: 5 Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Sat, 04 May 2019 00:50:34 +0000 Gerrit-HasComments: Yes
