Todd Lipcon has posted comments on this change.

Change subject: KUDU-236 (part 1). Implement tablet history GC
......................................................................


Patch Set 19:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

Line 66:   return { HistoryGcOpts::GC_DISABLED, Timestamp::kInvalidTimestamp };
is there special handling for kInvalidTimestamp? I think kInvalidTimestamp 
compares _higher_ than all other timestamps


Line 217: 
nit: no need for blank lines


Line 387:       vector<rowid_t> gced_input_rows;
unused


Line 585:   HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, 
Timestamp(0) };
use NoHistoryGC()?


Line 639:   HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, 
Timestamp(0) };
same


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 638:     DVLOG(2) << "Ancient history mark: " << 
history_gc_opts.ancient_history_mark.ToString()
given this is a per-row log message, i think DVLOG(3) or DVLOG(4) is probably 
better. otherwise -v=2 on a debug build would be super noisy


Line 647:           prev_undo->set_next(nullptr);
hrm, the comment on line 629-631 is no longer correct, then. (we were 
const_casting away the constness so we could create new undos which point _to_ 
the passed-in undos, but here you're actually modifying the passed-in undos).

I'm not 100% sure this is safe (at least it's unexpected).

It seems like this particular block doesn't really share much code/variables 
with the rest of the function. Maybe it can be broken out into a new method 
which takes a non-const CompactionInputRow*?


Line 800:   rowid_t input_row_offset = -1;
is this variable necessary? seems like it's only used in the DVLOG below, but 
that DVLOG itself seems somewhat redundant with the longer DVLOG on line 838-844


Line 838:       DVLOG(2) << "Output Row: " << 
dst_row.schema()->DebugRow(dst_row) <<
bump to a higher DVLOG level since this is a big message and every row


PS19, Line 863:       DVLOG(2) << "Output Row: " << 
dst_row.schema()->DebugRow(dst_row)
              :                << "; RowId: " << index_in_current_drs;
this vlog is redundant with the new one on 838-844


PS19, Line 954: history_gc_opts.enabled == HistoryGcOpts::GC_ENABLED &&
              :               
mut->timestamp().ComesBefore(history_gc_opts.ancient_history_mark
this condition shows up a lot. what about adding a method like bool 
HistoryGCOpts::ShouldGC(Timestamp t) {...} to encapsulate it?


Line 956:             DVLOG(2) << "Skipping GCed input row: " << 
schema->DebugRow(row.row)
this log message isn't accurate since it could still be reinserted later. 
perhaps needs to move down to line 1039 or so?


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.h
File src/kudu/tablet/compaction.h:

PS19, Line 38:  GC_ENABLED,
             :     GC_DISABLED
reorder these and make GC_DISABLED have the value 0 so that if this is 
accidentally implicitly casted to a bool, you get a reasonable result

Alternatively, you could make this more of a class, with some static factory 
method like HistoryGcOpts::Disabled() and HistoryGcOpts::Enabled(Timestamp 
ahm), along with the 'ShouldGC()' that is suggested elsewhere


Line 46:   // A timestamp prior to which no history will be preserved.
add "Ignored if enabled != GC_ENABLED"


Line 155: // 'is_history_truncated': Set to true if this row had a "reinsert 
after
update (it's an int, not a bool)


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet-test.cc
File src/kudu/tablet/tablet-test.cc:

PS19, Line 564:   // TODO: Should we be reopening the tablet in a different way 
to persist the
              :   // clock / timestamps?
does this need to be addressed here?


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

Line 704:   *ancient_history_mark = 
HybridClock::AddPhysicalTimeToTimestamp(clock_->Now(), neg_hist_age);
slightly worried about underflow here if the user says "I want to keep all 
history, I guess I'll set the age to max_int32". Is that plausible? Perhaps 
AddPhysicalTimeToTimestamp should have 'saturating' behavior? or do we have 
enough bits here that it's not possible?


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet_history_gc-test.cc
File src/kudu/tablet/tablet_history_gc-test.cc:

PS19, Line 193: // Test that no UNDO DELETE is generated after deleting a row 
from the MRS and
              : // waiting for AHM to pass, then flushing the MRS.
              : T
this test actually tests two separate cases. can you expand this comment to 
cover both?


PS19, Line 243: represents the insert. In the UNDO
              : // representation, this insert looks like  DELETE
I think this description is confusing. the UNDO *undoes* the insert, doesn't 
"represent it".


PS19, Line 425:  7,
some commentary at the top of what these special row offsets are would be nice 
(maybe use constants?)


http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS19, Line 1512: prohibits with 
missing word


PS19, Line 1534: C
should be a lower case c, since it comes after a ':' in the formatted message


-- 
To view, visit http://gerrit.cloudera.org:8080/3076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to