Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9452 )

Change subject: KUDU-2293 fix cleanup after failed copies
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@213
PS10, Line 213:
> Hmm good question. I suppose the argument for not including it here is that
OK. So I our guarantees are the following, which we should document in the 
header file I think:
 - if we have made any changes to disk, we are no longer in kInitialized state 
(we'll be in kStarting or better)
 - if we are in kStarting state or higher then meta_ is guaranteed to be set, 
but meta_ may actually be set while still in kInitialized state


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@325
PS10, Line 325:     state_ = kStarting;
> Responded up there, but to add another point, that would mean that any time
Ah, good catch about the last_logged_opid. OK, let's leave it as-is.


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@455
PS10, Line 455: WARN_NOT_OK
> Hmm. I wouldn't be opposed to making this a CHECK. I made it WARN_NOT_OK()
OK, let's add a comment here to the following effect:

  // We warn instead of return early here upon load failure because even if the 
superblock protobuf was somehow corrupted, we still want to attempt to delete 
the tablet's data dir group and WAL segments.


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@458
PS10, Line 458:   fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_);
> Ah, good callout. This was kind of hacky and more in-line with the invarian
OK, so we call TSTabletManager::DeleteTabletData() which calls 
TabletMetadata::DeleteTabletData() which calls DeleteDataDirGroup(tablet_id). 
What I think you're saying is that TabletMetadata::DeleteTabletData() reverts 
deletion of the data dir group if TabletMetadata::Flush() fails, so in order to 
counteract that revert we are preemptively deleting it so that it becomes 
impossible to revert in such a case. A couple questions for you:
1. What happens if we fail to remove that entry?
2. What happens if we remove that entry but not all of the blocks got deleted 
by Flush()?



--
To view, visit http://gerrit.cloudera.org:8080/9452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04
Gerrit-Change-Number: 9452
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 09 Apr 2018 19:33:11 +0000
Gerrit-HasComments: Yes

Reply via email to