Todd Lipcon has posted comments on this change.

Change subject: disk failure: coordinate error handling
......................................................................


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 49: typedef Callback<void(const std::set<std::string>&)> 
ErrorNotificationCallback;
did you consider a design where the callback just notifies the failure by UUID, 
and then the implementer of the callback has to call 
fs_manager->GetTabletsOnDiskId(uuid) to get the tablet list?

It seems slightly more straight-forward to me and also looser coupling, since 
it moves more of the behavior of what to do on disk failures out of the fs/ 
layer and into the tserver/ or tablet/ layers.

I think this could also remove the need for ErrorManager to point back into 
DataDirManager, eliminating a possible cycle. I haven't drawn out a 
collaboration diagram, but seems like:

FsManager owns a DataDirManager
FsManager owns an ErrorManager
ErrorManager references a DataDirManager

so we have a sort of triangle here which is a little smelly.

Curious for Adar's opinion on this too


PS15, Line 125: SetNotificationCallback
think this would be better SetErrorNotificationCallback to match the type


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS15, Line 164:   
fs_manager_->SetNotificationCallback(Bind(&TSTabletManager::ShutdownTabletReplicasAsync,
              :                                             Unretained(this)));
it seems a bit messy to have this being "wired" from the constructor. usually 
it's less error-prone to have the components wired by whoever creates them, 
rather than "wiring themselves" during construction.

Put another way, I wouldn't expect that creating a TSTabletManager on top of a 
given FSManager has side effects on the FSManager


PS15, Line 175:   // Since the FsManager is an external entity, revoke its 
access.
              :   fs_manager_->SetNotificationCallback(Callback<void(const 
set<string>&)>());
same, seems a little not-quite-right here


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS15, Line 132:   // Transitions the tablet state specified by 'tablet_id' with 
the specified
              :   // 'reason'.
              :   Status TransitionTabletState(const string& tablet_id,
              :                                const string& reason,
              :                                
scoped_refptr<tablet::TabletReplica>* replica,
              :                                
scoped_refptr<TransitionInProgressDeleter>* deleter,
              :                                
boost::optional<TabletServerErrorPB::Code>* error_code);
should this be private?


PS15, Line 197:  a tablet replica
the prototype indicates it operates on a bunch of replicas at once


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to