Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9452 )

Change subject: KUDU-2293 fix cleanup after failed copies
......................................................................


Patch Set 10:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.h
File src/kudu/tserver/tablet_copy_client.h:

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.h@155
PS10, Line 155:     kFinishing,
> If we're going to add states we should add tests to verify their functional
Ack


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@213
PS10, Line 213:
> Based on the latest invariant change should we add the following here?
Hmm good question. I suppose the argument for not including it here is that no 
changes are being made to the metadata, and therefore there's nothing to role 
back. That argument assumes that all the changes being done to meta_ are done 
through this class, which seems true.


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@325
PS10, Line 325:     state_ = kStarting;
> Maybe we should move this to line 213 per my comment above?
Responded up there, but to add another point, that would mean that any time 
between calling SetTabletToReplace() and here (which basically means any 
potential point of failure in Start()) will trigger the cleanup in Abort(), and 
right now, Abort() entails loading superblock_ and flushing it.

Also note that only at L320 do we set superblock_'s last logged opid, and 
flushing the superblock without keeping the last logged opid seems like an 
error.


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@446
PS10, Line 446: we call DeleteTabletData()
> ok but see below on my question about WARN_NOT_OK
Ack


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@455
PS10, Line 455: WARN_NOT_OK
> Why WARN_NOT_OK and not RETURN_NOT_OK? Or should failure here be fatal, sin
Hmm. I wouldn't be opposed to making this a CHECK. I made it WARN_NOT_OK() 
because:
1. it guarantees we get to DeleteTabletData, as you pointed out
2. it doesn't seem like it should be a blocker to getting to DeleteTabletData, 
even if it does throw an error. I suppose it could maybe indicate a programming 
error in how we load from PB?


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/tablet_copy_client.cc@458
PS10, Line 458:   fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_);
> hmm, should we make this part of TSTabletManager::DeleteTabletData() ?
Ah, good callout. This was kind of hacky and more in-line with the invariant 
that we should try to keep on-disk and in-memory consistent. If you take a look 
at TabletMetadata::DeleteTabletData(), we'll revert the deletion of a data dir 
group on error to uphold that invariant (although we clearly don't do a 
complete job in that function because the in-memory state is still left 
deleted). Maybe we should _not_ revert that group deletion, in which case this 
wouldn't be necessary.


http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9452/10/src/kudu/tserver/ts_tablet_manager.cc@607
PS10, Line 607:           Status s = DeleteTabletData(meta, cmeta_manager_, 
TABLET_DATA_TOMBSTONED,
> Interesting. I guess this is the alternative to what you're done in tablet_
Yeah, I wouldn't be opposed to removing this; wasn't sure how to best remove 
the LOG(FATAL). E.g. is it sufficient to just treat it as tombstoned and pas 
through here?

Although I think I might be missing something. Where are we reverting 
TABLET_DATA_TOMBSTONED back to READY?



--
To view, visit http://gerrit.cloudera.org:8080/9452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04
Gerrit-Change-Number: 9452
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 07 Apr 2018 01:29:08 +0000
Gerrit-HasComments: Yes

Reply via email to