Todd Lipcon has posted comments on this change.

Change subject: KUDU-236 (part 1). Implement tablet history GC

Patch Set 19:

File src/kudu/tablet/

Line 66:   return { HistoryGcOpts::GC_DISABLED, Timestamp::kInvalidTimestamp };
is there special handling for kInvalidTimestamp? I think kInvalidTimestamp 
compares _higher_ than all other timestamps

Line 217: 
nit: no need for blank lines

Line 387:       vector<rowid_t> gced_input_rows;

Line 585:   HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, 
Timestamp(0) };
use NoHistoryGC()?

Line 639:   HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, 
Timestamp(0) };
File src/kudu/tablet/

Line 638:     DVLOG(2) << "Ancient history mark: " << 
given this is a per-row log message, i think DVLOG(3) or DVLOG(4) is probably 
better. otherwise -v=2 on a debug build would be super noisy

Line 647:           prev_undo->set_next(nullptr);
hrm, the comment on line 629-631 is no longer correct, then. (we were 
const_casting away the constness so we could create new undos which point _to_ 
the passed-in undos, but here you're actually modifying the passed-in undos).

I'm not 100% sure this is safe (at least it's unexpected).

It seems like this particular block doesn't really share much code/variables 
with the rest of the function. Maybe it can be broken out into a new method 
which takes a non-const CompactionInputRow*?

Line 800:   rowid_t input_row_offset = -1;
is this variable necessary? seems like it's only used in the DVLOG below, but 
that DVLOG itself seems somewhat redundant with the longer DVLOG on line 838-844

Line 838:       DVLOG(2) << "Output Row: " << 
dst_row.schema()->DebugRow(dst_row) <<
bump to a higher DVLOG level since this is a big message and every row

PS19, Line 863:       DVLOG(2) << "Output Row: " << 
              :                << "; RowId: " << index_in_current_drs;
this vlog is redundant with the new one on 838-844

PS19, Line 954: history_gc_opts.enabled == HistoryGcOpts::GC_ENABLED &&
this condition shows up a lot. what about adding a method like bool 
HistoryGCOpts::ShouldGC(Timestamp t) {...} to encapsulate it?

Line 956:             DVLOG(2) << "Skipping GCed input row: " << 
this log message isn't accurate since it could still be reinserted later. 
perhaps needs to move down to line 1039 or so?
File src/kudu/tablet/compaction.h:

PS19, Line 38:  GC_ENABLED,
             :     GC_DISABLED
reorder these and make GC_DISABLED have the value 0 so that if this is 
accidentally implicitly casted to a bool, you get a reasonable result

Alternatively, you could make this more of a class, with some static factory 
method like HistoryGcOpts::Disabled() and HistoryGcOpts::Enabled(Timestamp 
ahm), along with the 'ShouldGC()' that is suggested elsewhere

Line 46:   // A timestamp prior to which no history will be preserved.
add "Ignored if enabled != GC_ENABLED"

Line 155: // 'is_history_truncated': Set to true if this row had a "reinsert 
update (it's an int, not a bool)
File src/kudu/tablet/

PS19, Line 564:   // TODO: Should we be reopening the tablet in a different way 
to persist the
              :   // clock / timestamps?
does this need to be addressed here?
File src/kudu/tablet/

Line 704:   *ancient_history_mark = 
HybridClock::AddPhysicalTimeToTimestamp(clock_->Now(), neg_hist_age);
slightly worried about underflow here if the user says "I want to keep all 
history, I guess I'll set the age to max_int32". Is that plausible? Perhaps 
AddPhysicalTimeToTimestamp should have 'saturating' behavior? or do we have 
enough bits here that it's not possible?
File src/kudu/tablet/

PS19, Line 193: // Test that no UNDO DELETE is generated after deleting a row 
from the MRS and
              : // waiting for AHM to pass, then flushing the MRS.
              : T
this test actually tests two separate cases. can you expand this comment to 
cover both?

PS19, Line 243: represents the insert. In the UNDO
              : // representation, this insert looks like  DELETE
I think this description is confusing. the UNDO *undoes* the insert, doesn't 
"represent it".

PS19, Line 425:  7,
some commentary at the top of what these special row offsets are would be nice 
(maybe use constants?)
File src/kudu/tserver/

PS19, Line 1512: prohibits with 
missing word

PS19, Line 1534: C
should be a lower case c, since it comes after a ':' in the formatted message

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to