Todd Lipcon has posted comments on this change. Change subject: disk failure: coordinate error handling ......................................................................
Patch Set 15: (6 comments) http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: Line 49: typedef Callback<void(const std::set<std::string>&)> ErrorNotificationCallback; did you consider a design where the callback just notifies the failure by UUID, and then the implementer of the callback has to call fs_manager->GetTabletsOnDiskId(uuid) to get the tablet list? It seems slightly more straight-forward to me and also looser coupling, since it moves more of the behavior of what to do on disk failures out of the fs/ layer and into the tserver/ or tablet/ layers. I think this could also remove the need for ErrorManager to point back into DataDirManager, eliminating a possible cycle. I haven't drawn out a collaboration diagram, but seems like: FsManager owns a DataDirManager FsManager owns an ErrorManager ErrorManager references a DataDirManager so we have a sort of triangle here which is a little smelly. Curious for Adar's opinion on this too PS15, Line 125: SetNotificationCallback think this would be better SetErrorNotificationCallback to match the type http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS15, Line 164: fs_manager_->SetNotificationCallback(Bind(&TSTabletManager::ShutdownTabletReplicasAsync, : Unretained(this))); it seems a bit messy to have this being "wired" from the constructor. usually it's less error-prone to have the components wired by whoever creates them, rather than "wiring themselves" during construction. Put another way, I wouldn't expect that creating a TSTabletManager on top of a given FSManager has side effects on the FSManager PS15, Line 175: // Since the FsManager is an external entity, revoke its access. : fs_manager_->SetNotificationCallback(Callback<void(const set<string>&)>()); same, seems a little not-quite-right here http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: PS15, Line 132: // Transitions the tablet state specified by 'tablet_id' with the specified : // 'reason'. : Status TransitionTabletState(const string& tablet_id, : const string& reason, : scoped_refptr<tablet::TabletReplica>* replica, : scoped_refptr<TransitionInProgressDeleter>* deleter, : boost::optional<TabletServerErrorPB::Code>* error_code); should this be private? PS15, Line 197: a tablet replica the prototype indicates it operates on a bunch of replicas at once -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
