Todd Lipcon has posted comments on this change.

Change subject: KUDU-236. Implement tablet history GC

Patch Set 6:

File src/kudu/tablet/

Line 704:         // mutations that have only been partially applied?
yea, this is a bit tricky - I think it's OK because of the implementation of 
MajorDeltaCompaction::FlushRowSetAndDeltas() which actually ignores the output 
'redo_head' and re-creates its own new redos. A test for this seems like quite 
a good idea though!

Line 835:       num_rows_history_truncated += is_history_truncated;
we lost the nice WARNING we used to have for this case which actually included 
the row key, etc
File src/kudu/tablet/compaction.h:

Line 42:   HistoryGcOpts(GcHistoryEnabled is_enabled, Timestamp ahm)
I don't think this ctor buys much over just using brace initialization like { 
HistoryGCOpts::GC_ENABLED, ahm }
File src/kudu/tablet/

Line 150:       VLOG(2) << "Input Row: " << dst_row.schema()->DebugRow(dst_row) 
this should probably be a much higher VLOG level (and maybe DVLOG)

Line 191:     //    TODO: Why keep the delete mutations?
because major delta compaction doesn't change row IDs, and thus you can't 
actually get rid of something from the base data
File src/kudu/tablet/tablet-harness.h:

Line 112:       clock_ = make_scoped_refptr(new server::HybridClock());
do you need make_scoped_refptr here? given clock_ is a scoped_refptr I think it 
already supports assignment from a raw ptr
File src/kudu/tablet/

Line 111: DEFINE_int32(tablet_history_max_age_sec, 604800,
can you express this as 60 * 60 * 24 * 7 so it's obvious it's 7 days?

Eventually I think a much shorter default would be useful since most people 
don't need the "query back in time" feature

Line 114:              "To disable history removal, set to -1.");
I think we should probably disable this for the first commit, until we can get 
a bit of cluster test coverage with heavy update workload, etc
File src/kudu/tablet/

PS6, Line 73: ws
nit: capitalize RowSet

PS6, Line 118: Major delta compaction will not remove UNDOs, so we cannot read
             :   // from before the initial insert of the data and expect to 
see base data.
not following this part of the comment -- you haven't run the 
MajorDeltaCompaction yet at this point

Line 134:   // equal to 2.
I guess the comment above really applies here: you're saying that if you read 
from prior to the insertion, you'll still see the rows disappear because the 
UNDOs weren't removed, right?

One suggestion to make these tests slightly easier to follow is to use 
RowSet::DebugDump() which gives you a nice printout per row that you can assert 
on. (though you'll have to set the hybrid timestamp to a constant instead of 

PS6, Line 169: No trace should remain after this
add an assertion that it actually didn't even generate a DRS

Line 201: TEST_F(TabletHistoryGCTest, TestRowRemovalGCOnMergeCompaction) {
can you add a test similar to TabletTest/TestGhostRowsOnDiskRowSets which ends 
up with a row present in multiple DRS but deleted in several of them?

Line 262: }
as mentioned elsewhere, would be good to have a test which does major delta 
compaction of a subset of columns, against an update which touched multiple 
columns, and make sure things dont go wrong

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to