Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9452 )
Change subject: KUDU-2293 fix cleanup after failed copies ...................................................................... Patch Set 9: (7 comments) This is a valiant effort however due to the complexity maybe you were right all along. We should try to clean up as well as we can, no doubt about it. However maybe we should also remove the FATAL in ts_tablet_manager.cc since it's such a tricky invariant to maintain. http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h File src/kudu/tserver/tablet_copy_client.h: http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h@148 PS9, Line 148: kStarting Since we are still conditioning on meta_, I don't see the purpose of this new state. http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h@157 PS9, Line 157: kFinishing What is the purpose of this state? Do we need it? http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@405 PS9, Line 405: superblock_->set_tablet_data_state(tablet::TABLET_DATA_READY); : : boost::optional<OpId> last_logged_opid = superblock_->tombstone_last_logged_opid(); : superblock_->clear_tombstone_last_logged_opid(); : auto revert_activate_superblock = MakeScopedCleanup([&] { : // If we fail below, revert the updated state so further calls to Abort() : // can clean up appropriately. : if (last_logged_opid) { : *superblock_->mutable_tombstone_last_logged_opid() = *last_logged_opid; : } : superblock_->set_tablet_data_state(TabletDataState::TABLET_DATA_COPYING); : }); nit: this is a minor point but I think this would be a little easier to follow written like this, by setting up the revert before changing the state: LOG_WITH_PREFIX(INFO) << "Tablet Copy complete. Replacing tablet superblock."; SetStatusMessage("Replacing tablet superblock"); boost::optional<OpId> last_logged_opid = superblock_->tombstone_last_logged_opid(); auto revert_activate_superblock = MakeScopedCleanup([&] { // If we fail below, revert the updated state so further calls to Abort() // can clean up appropriately. if (last_logged_opid) { *superblock_->mutable_tombstone_last_logged_opid() = *last_logged_opid; } superblock_->set_tablet_data_state(TabletDataState::TABLET_DATA_COPYING); }); superblock_->clear_tombstone_last_logged_opid(); superblock_->set_tablet_data_state(tablet::TABLET_DATA_READY); http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@440 PS9, Line 440: CHECK_EQ(tablet::TABLET_DATA_TOMBSTONED what guarantees this? http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@448 PS9, Line 448: if (!meta_ || state_ == kInitialized || state_ == kFinished) { Looks like we can move this to the top of the function here since it will never overlap with the scoped cleanup above. http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@454 PS9, Line 454: DCHECK_EQ(tablet::TABLET_DATA_COPYING, superblock_->tablet_data_state()); Let's also put this before the SCOPED_CLEANUP. http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@455 PS9, Line 455: RETURN_NOT_OK(meta_->LoadFromSuperBlock(*superblock_)); isn't it possible to violate the CHECK_EQ in the SCOPED_CLEANUP if this fails, say because we can't load or create a data dir group? If so, it seems we should remove that CHECK_EQ since those are generally intended for programming errors. -- To view, visit http://gerrit.cloudera.org:8080/9452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04 Gerrit-Change-Number: 9452 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 06 Apr 2018 01:09:49 +0000 Gerrit-HasComments: Yes