Andrew Wong has posted comments on this change.

Change subject: disk failure: coordinate error handling
......................................................................


Patch Set 16:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 199: 
> Since this is a raw pointer, you should say something about lifespans here.
Done


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

Line 32: typedef Callback<void(const std::string&)> ErrorNotificationCb;
> Hmm, why must this be defined here and in fs_manager.h? Surely we could def
Done, keeping it here and putting error_manager.h in fs_manager.h.


Line 36: // multiple layers, many of which we prefer to keep separate. The 
FsErrorManager
> These macros probably don't belong in this patch; I don't see them used at 
Done


PS15, Line 62: 
> What does "blocks" mean here? If this is ReadableBlock/WritableBlock, there
Done


Line 66: }  // namespace kudu
> Nit: separate on each line.
Done


Line 82
> This is a pretty meaty method; why not move it to a .cc file?
Moved all the logic out of the error manager.


Line 100
> Should this be a CHECK/DCHECK? Under what circumstances would you expect th
You're right that this should never happen. Done


PS15, Line 106: 
              : 
              : 
> This is too much cross-layer mingling. Reword it so it's scoped to only how
Done


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

Line 388:   FileReadableBlock(const FileBlockManager* block_manager, BlockId 
block_id,
> Why did you have to change this? And below?
For a future patch, it was needed to set the DataDirManager in the 
FsErrorManager. This isn't a requirement anymore so I'll revert it.


Line 544:                                            "file_block_manager",
> I don't particularly care for this. Why not hoist the DataDirManager out of
I've restructured this a bit so the error manager doesn't need to know about 
any external state other than the callbacks.

I do intend on moving the DataDirManager out of the BlockManager (and also 
likely renaming it to DirectoryManager or something), but in another patch.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 106
> FWIW, this is a style that I employ, reminiscent of Javadoc, where the logi
I see, I've seen this style around, just not with such short sections. Reverting


Line 98: 
> Declare this virtual in BlockManager and override it here. Same for log_blo
Done


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 49: 
> I've only finished reviewing this patch (and not the rest) so I don't even 
The key, I think, was pushing all of the logic into the clearly-defined layers 
(e.g. fail the tablets in the TSTabletManager instead of here).

Doing so also opens it up for a bit more flexibility (using a string instead of 
set<string> as an input to this cb).


Line 49: 
> did you consider a design where the callback just notifies the failure by U
Done


Line 124:   // This callback will be run when block operations run into disk 
failures.
> Should also explain what this means. When will this callback be invoked? Wh
Done


PS15, Line 125: e input to the callback
> think this would be better SetErrorNotificationCallback to match the type
Done


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 365: 
> Add a little comment here.
Done


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS15, Line 164:   // Search for tablets in the metadata dir.
              :   vector<string> tablet_ids;
> it seems a bit messy to have this being "wired" from the constructor. usual
Done, moved to the wiring to TabletServer Start()/Shutdown()


PS15, Line 175:   int loaded_count = 0;
              :   for (const string& tablet_id : tablet_ids) {
> same, seems a little not-quite-right here
Done


Line 611:   TabletDataState data_state = 
replica->tablet_metadata()->tablet_data_state();
> Did you intend for this decomposition to be in this patch? Since there's st
Hrm, yeah originally there was a usage that was very specific to disk failure 
handling that's in a separate patch. Will revert.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS15, Line 132:                     scoped_refptr<tablet::TabletReplica>* 
replica) const;
              : 
              :   // Same as LookupTablet but doesn't acquired the shared lock.
              :   bool LookupTabletUnlocked(const std::string& tablet_id,
              :                             
scoped_refptr<tablet::TabletReplica>* replica) const;
              : 
              :   virtual Status GetTabletReplica(const std::string& tablet_id,
> should this be private?
Done


PS15, Line 197: 
> the prototype indicates it operates on a bunch of replicas at once
Restructured the calls.


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to