Adar Dembo has posted comments on this change.

Change subject: disk failure: coordinate error handling
......................................................................


Patch Set 15:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 199:   // Manager for coordinating error-handling.
Since this is a raw pointer, you should say something about lifespans here. I 
bet the FsErrorManager has to outlive the BlockManager, right?

Actually, since this isn't optional, I'd add it as a standalone argument to the 
FBM/LBM constructors instead.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

Line 32: typedef Callback<void(const std::set<std::string>&)> 
ErrorNotificationCallback;
Hmm, why must this be defined here and in fs_manager.h? Surely we could define 
it in the "dependent" header and have the "depending" header include it?


Line 36: // Evaluates the expression and handles it if it results in an error.
These macros probably don't belong in this patch; I don't see them used at all.


PS15, Line 62: blocks
What does "blocks" mean here? If this is ReadableBlock/WritableBlock, there's 
not enough context to understand why they'd need to call the callback.


Line 66:   FsErrorManager() : dd_manager_(nullptr), 
notify_cb_(Bind(&DoNothingErrorNotification)) {}
Nit: separate on each line.


Line 82:   void FailTabletsInDataDir(DataDir* dir) {
This is a pretty meaty method; why not move it to a .cc file?


Line 100:     LOG(ERROR) << strings::Substitute("Dir $0 not tracked by 
DataDirManager", dir->dir());
Should this be a CHECK/DCHECK? Under what circumstances would you expect this 
to fire in production?


PS15, Line 106:   // Callback that fails the TSTabletManager's tablet peers.
              :   // The referenced TSTabletManager may be deleted before the 
FsErrorManager,
              :   // so this may be null.
This is too much cross-layer mingling. Reword it so it's scoped to only how 
FsErrorManager interacts with it, and how registrants should behave when it's 
called.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

Line 388:   FileReadableBlock(FileBlockManager* block_manager, BlockId block_id,
Why did you have to change this? And below?


Line 544:   error_manager_->SetDataDirManager(&dd_manager_);
I don't particularly care for this. Why not hoist the DataDirManager out of the 
block managers and let it be owned by the FsManager? At that point the 
FsManager creates the BlockManager, DataDirManager, and FsErrorManager, and it 
can order construction so that everyone gets the right pointer. Probably 
DataDirManager first, then FsErrorManager, then BlockManager?


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.h
File src/kudu/fs/file_block_manager.h:

Line 106
FWIW, this is a style that I employ, reminiscent of Javadoc, where the logical 
sections of the function's description are separated by empty lines to 
emphasize their differences.


Line 98:   FsErrorManager* error_manager() { return error_manager_; }
Declare this virtual in BlockManager and override it here. Same for 
log_block_manager.h.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

Line 49: typedef Callback<void(const std::set<std::string>&)> 
ErrorNotificationCallback;
> did you consider a design where the callback just notifies the failure by U
I've only finished reviewing this patch (and not the rest) so I don't even know 
what set<string> means (could use a comment!). But I agree with Todd that fewer 
cross-references between these classes would probably lead to a clearer 
implementation.


Line 124:   // Registers a callback with the FsErrorManager.
Should also explain what this means. When will this callback be invoked? When 
invoked, what's the significance of its argument? Is it important to unregister 
it?


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 365:   FsErrorManager* error_manager_;
Add a little comment here.


http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 611: Status TSTabletManager::TransitionTabletState(
Did you intend for this decomposition to be in this patch? Since there's still 
just one call into TransitionTabletState(), it's not clear why bother to 
decompose here. Todd was sort of getting at the same thing when he asked why 
this is public; presumably its for a different patch?


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to