Todd Lipcon has posted comments on this change. Change subject: disk failure: coordinate error handling ......................................................................
Patch Set 8: (12 comments) http://gerrit.cloudera.org:8080/#/c/7029/8/src/kudu/fs/error_manager.h File src/kudu/fs/error_manager.h: PS8, Line 60: case EIO: // Fallthrough intended : case ENODEV: // Fallthrough intended : case ENOSPC: // Fallthrough intended : case ENXIO: // Fallthrough intended : case EROFS: // Fallthrough intended hm, I dont think "fallthrough intended" is really necessary here. Usually such a comment would be used when there is actual code inside of a 'case' label and then falls through to the below one, like: switch (foo) { case A: DoSomething(); FALLTHROUGH_INTENDED; case B: DoSomethingElse(); break; } (see gutil/macros.h for the above macro and example usage -- it actually has some effect on clang warnings as well) PS8, Line 74: that blocks known to the : // BlockManager can call. not sure what this means PS8, Line 80: Callback<void(const std::set<std::string>&)>* cb maybe use a typedef for this? Line 81: shutdown_replicas_cb_ = cb; std::move(cb)? Line 125: Callback<void(const std::set<std::string>&)>* shutdown_replicas_cb_; nit: I think it's better to keep the specifics of what the callback _does_ out of this class. i.e from this class's perspectives, it's not necessary that it is going to shutdown replicas. Instead maybe just something like notify_cb_? http://gerrit.cloudera.org:8080/#/c/7029/8/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: PS8, Line 121: * why is this taking a pointer to a callback instead of just taking a callback? Line 214: void SetTabletFailedCallback(Callback<void(std::set<std::string>)>* cb); same http://gerrit.cloudera.org:8080/#/c/7029/8/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 684: if (delete_tablet_data) { not quite following the purpose of this parameter. In the case that delete_tablet_data is false, then how much of the rest of this function is still relevant? eg opt_last_logged_opid won't be used, the above CAS-related stuff isn't used, right? Maybe refactoring the function so the common bits can be reused, but introducing a new function like MarkFailedAndShutdown(tablet_id) would be clearer? Line 804: // Disk failures should have been handled. is there a DCHECK we could add here that the replica is already marked failed if not? http://gerrit.cloudera.org:8080/#/c/7029/8/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 131: bool delete_tablet_data = true); can you update the doc as to why you would want to DeleteTablet without deleting the data? PS8, Line 190: // Asynchronously shut down a tablet replica by deleting the tablet. : // TODO(awong): rather than deleting the tablet, don't do anything to the : // tablet metadata and just store disk failure state instead. : void FailTabletReplicas(const std::set<std::string>& tablet_ids); the docs don't seem to match the name here. Is this shutting it down or trying to delete on-disk data, etc? Line 294: Callback<void(const std::set<std::string>&)> shutdown_replicas_cb_; I think if you made the other classes take the callback by value instead of by pointer, you wouldn't need this member -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
