Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15145 )

Change subject: KUDU-1625: background op to GC ancient, fully deleted rowsets
......................................................................


Patch Set 4:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/delta_tracker.cc@468
PS4, Line 468:   return newest_redo && newest_redo->Initted() &&
Curious why you're doing this outside the lock. Seems relatively cheap?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/diskrowset.cc@884
PS4, Line 884:   *deleted_and_ancient = 
delta_tracker_->EstimateAllRedosAreAncient(ancient_history_mark);
Probably worth commenting here precisely why the semantics of 
EstimateAllRedosAreAncient (i.e. may return false positives but not false 
negatives) are an appropriate heuristic for this method.


http://gerrit.cloudera.org:8080/#/c/15145/1/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/15145/1/src/kudu/tablet/memrowset.h@400
PS1, Line 400:     return Status::OK();
> Yeah, I suppose I could have called this 'empty_and_ancient_and_gcable' giv
I think it's OK, just maybe add a comment here explaining the subtle 
differences in semantics.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/mt-tablet-test.cc
File src/kudu/tablet/mt-tablet-test.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/mt-tablet-test.cc@374
PS4, Line 374:     while 
(!running_insert_count_.WaitFor(kBackgroundOpInterval)) {
Might want to structure this loop as the other threads do, because otherwise 
the thread waits for 100 ms from the get go instead of trying to run the op.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/mt-tablet-test.cc@519
PS4, Line 519:   this->StartThreads(FLAGS_num_deleted_rowset_gc_threads, 
&TestFixture::FlushDeltasThread);
Was this a mistake?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/rowset.h@222
PS4, Line 222: no rows
no live rows?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet-test-base.h
File src/kudu/tablet/tablet-test-base.h:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet-test-base.h@342
PS4, Line 342:   Status DeleteTestRows(int64_t first_row, int64_t count) {
Short doc?

For consistency may want to make this void (and callers would use NO_FATALS).


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.h@342
PS4, Line 342:   // These checks may not touch on-disk block data if we can 
determine from the
Nit: seems like a rather abrupt intro to the function. Maybe start with a brief 
description of how it behaves?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.h@350
PS4, Line 350: If this method returns OK, the number of
             :   // bytes deleted are returned.
Not clear what this means.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2319
PS4, Line 2319:     return Status::OK();
Should we at least set the number of bytes to 0 on this return path?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2328
PS4, Line 2328:   for (size_t i = 0; i < rowsets.size(); i++) {
              :     const auto& rowset = rowsets[i];
Why not a simple for loop on rowsets?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2345
PS4, Line 2345:   if 
(!Tablet::GetTabletAncientHistoryMark(&ancient_history_mark)) return 
Status::OK();
No VLOG in this case?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2350
PS4, Line 2350: them
Nit: ambiguity here, is 'them' referring to the rowsets, or the locks? I know 
the answer (obviously) but you can reword to avoid the confusion.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2364
PS4, Line 2364:       
RETURN_NOT_OK(rowset->IsDeletedAndFullyAncient(ancient_history_mark, 
&deleted_and_empty));
This involves CountLiveRows, which can fail on older tablets. But IIRC, the 
failure is per-rowset rather than for the entirety of the tablet. So do we want 
to do something special if we're able to get the live rows for some rowsets but 
not all?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2368
PS4, Line 2368:         std::unique_lock<std::mutex> 
lock(*rowset->compact_flush_lock(), std::try_to_lock);
Should we CHECK that we got the lock? That's guaranteed, right?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet.cc@2381
PS4, Line 2381:   
metrics_->deleted_rowset_gc_duration->Increment((MonoTime::Now() - 
start_time).ToMilliseconds());
Have you had a chance to try this on a real cluster? I'm curious what you've 
observed from this histogram.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_history_gc-test.cc
File src/kudu/tablet/tablet_history_gc-test.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_history_gc-test.cc@706
PS4, Line 706:     ASSERT_OK(DeleteTestRows(rowset_id * rows_per_rowset_, 
rows_per_rowset_));
Would be clearer if you used DeleteOriginalRows for this too, and parameterized 
the flushing behavior.


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_metrics.cc@259
PS4, Line 259:   "Empty Rowset GC Running",
Here and elsewhere you're still referring to these things as "empty" instead of 
"deleted".


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_mm_ops.cc@86
PS4, Line 86: TAG_FLAG(enable_deleted_rowset_gc, unsafe);
The other tags I understand, but why this one?


http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_replica-test.cc
File src/kudu/tablet/tablet_replica-test.cc:

http://gerrit.cloudera.org:8080/#/c/15145/4/src/kudu/tablet/tablet_replica-test.cc@815
PS4, Line 815:   ASSERT_EQ(1, tablet->num_rowsets());
Test here that live_row_count is 0?



--
To view, visit http://gerrit.cloudera.org:8080/15145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I696e2a29ea52ad4e54801b495c322bc371787124
Gerrit-Change-Number: 15145
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Volodymyr Verovkin <[email protected]>
Gerrit-Comment-Date: Wed, 18 Mar 2020 07:21:57 +0000
Gerrit-HasComments: Yes

Reply via email to