Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9452 )

Change subject: KUDU-2293 fix cleanup after failed copies
......................................................................


Patch Set 9:

(7 comments)

This is a valiant effort however due to the complexity maybe you were right all 
along. We should try to clean up as well as we can, no doubt about it. However 
maybe we should also remove the FATAL in ts_tablet_manager.cc since it's such a 
tricky invariant to maintain.

http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h
File src/kudu/tserver/tablet_copy_client.h:

http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h@148
PS9, Line 148: kStarting
Since we are still conditioning on meta_, I don't see the purpose of this new 
state.


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.h@157
PS9, Line 157: kFinishing
What is the purpose of this state? Do we need it?


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@405
PS9, Line 405:   superblock_->set_tablet_data_state(tablet::TABLET_DATA_READY);
             :
             :   boost::optional<OpId> last_logged_opid = 
superblock_->tombstone_last_logged_opid();
             :   superblock_->clear_tombstone_last_logged_opid();
             :   auto revert_activate_superblock = MakeScopedCleanup([&] {
             :     // If we fail below, revert the updated state so further 
calls to Abort()
             :     // can clean up appropriately.
             :     if (last_logged_opid) {
             :       *superblock_->mutable_tombstone_last_logged_opid() = 
*last_logged_opid;
             :     }
             :     
superblock_->set_tablet_data_state(TabletDataState::TABLET_DATA_COPYING);
             :   });
nit: this is a minor point but I think this would be a little easier to follow 
written like this, by setting up the revert before changing the state:

  LOG_WITH_PREFIX(INFO) << "Tablet Copy complete. Replacing tablet superblock.";
  SetStatusMessage("Replacing tablet superblock");

  boost::optional<OpId> last_logged_opid = 
superblock_->tombstone_last_logged_opid();
  auto revert_activate_superblock = MakeScopedCleanup([&] {
    // If we fail below, revert the updated state so further calls to Abort()
    // can clean up appropriately.
    if (last_logged_opid) {
      *superblock_->mutable_tombstone_last_logged_opid() = *last_logged_opid;
    }
    superblock_->set_tablet_data_state(TabletDataState::TABLET_DATA_COPYING);
  });

  superblock_->clear_tombstone_last_logged_opid();
  superblock_->set_tablet_data_state(tablet::TABLET_DATA_READY);


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@440
PS9, Line 440: CHECK_EQ(tablet::TABLET_DATA_TOMBSTONED
what guarantees this?


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@448
PS9, Line 448:   if (!meta_ || state_ == kInitialized || state_ == kFinished) {
Looks like we can move this to the top of the function here since it will never 
overlap with the scoped cleanup above.


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@454
PS9, Line 454:   DCHECK_EQ(tablet::TABLET_DATA_COPYING, 
superblock_->tablet_data_state());
Let's also put this before the SCOPED_CLEANUP.


http://gerrit.cloudera.org:8080/#/c/9452/9/src/kudu/tserver/tablet_copy_client.cc@455
PS9, Line 455: RETURN_NOT_OK(meta_->LoadFromSuperBlock(*superblock_));
isn't it possible to violate the CHECK_EQ in the SCOPED_CLEANUP if this fails, 
say because we can't load or create a data dir group?

If so, it seems we should remove that CHECK_EQ since those are generally 
intended for programming errors.



--
To view, visit http://gerrit.cloudera.org:8080/9452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04
Gerrit-Change-Number: 9452
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 06 Apr 2018 01:09:49 +0000
Gerrit-HasComments: Yes

Reply via email to