Adar Dembo has posted comments on this change. Change subject: disk failure: coordinate error handling ......................................................................
Patch Set 15: (16 comments) http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 199: // Manager for coordinating error-handling. Since this is a raw pointer, you should say something about lifespans here. I bet the FsErrorManager has to outlive the BlockManager, right? Actually, since this isn't optional, I'd add it as a standalone argument to the FBM/LBM constructors instead. http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/error_manager.h File src/kudu/fs/error_manager.h: Line 32: typedef Callback<void(const std::set<std::string>&)> ErrorNotificationCallback; Hmm, why must this be defined here and in fs_manager.h? Surely we could define it in the "dependent" header and have the "depending" header include it? Line 36: // Evaluates the expression and handles it if it results in an error. These macros probably don't belong in this patch; I don't see them used at all. PS15, Line 62: blocks What does "blocks" mean here? If this is ReadableBlock/WritableBlock, there's not enough context to understand why they'd need to call the callback. Line 66: FsErrorManager() : dd_manager_(nullptr), notify_cb_(Bind(&DoNothingErrorNotification)) {} Nit: separate on each line. Line 82: void FailTabletsInDataDir(DataDir* dir) { This is a pretty meaty method; why not move it to a .cc file? Line 100: LOG(ERROR) << strings::Substitute("Dir $0 not tracked by DataDirManager", dir->dir()); Should this be a CHECK/DCHECK? Under what circumstances would you expect this to fire in production? PS15, Line 106: // Callback that fails the TSTabletManager's tablet peers. : // The referenced TSTabletManager may be deleted before the FsErrorManager, : // so this may be null. This is too much cross-layer mingling. Reword it so it's scoped to only how FsErrorManager interacts with it, and how registrants should behave when it's called. http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 388: FileReadableBlock(FileBlockManager* block_manager, BlockId block_id, Why did you have to change this? And below? Line 544: error_manager_->SetDataDirManager(&dd_manager_); I don't particularly care for this. Why not hoist the DataDirManager out of the block managers and let it be owned by the FsManager? At that point the FsManager creates the BlockManager, DataDirManager, and FsErrorManager, and it can order construction so that everyone gets the right pointer. Probably DataDirManager first, then FsErrorManager, then BlockManager? http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/file_block_manager.h File src/kudu/fs/file_block_manager.h: Line 106 FWIW, this is a style that I employ, reminiscent of Javadoc, where the logical sections of the function's description are separated by empty lines to emphasize their differences. Line 98: FsErrorManager* error_manager() { return error_manager_; } Declare this virtual in BlockManager and override it here. Same for log_block_manager.h. http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: Line 49: typedef Callback<void(const std::set<std::string>&)> ErrorNotificationCallback; > did you consider a design where the callback just notifies the failure by U I've only finished reviewing this patch (and not the rest) so I don't even know what set<string> means (could use a comment!). But I agree with Todd that fewer cross-references between these classes would probably lead to a clearer implementation. Line 124: // Registers a callback with the FsErrorManager. Should also explain what this means. When will this callback be invoked? When invoked, what's the significance of its argument? Is it important to unregister it? http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: Line 365: FsErrorManager* error_manager_; Add a little comment here. http://gerrit.cloudera.org:8080/#/c/7029/15/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 611: Status TSTabletManager::TransitionTabletState( Did you intend for this decomposition to be in this patch? Since there's still just one call into TransitionTabletState(), it's not clear why bother to decompose here. Todd was sort of getting at the same thing when he asked why this is public; presumably its for a different patch? -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
