Todd Lipcon has posted comments on this change. Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files ......................................................................
Patch Set 12: (17 comments) Any chance you've looked at coverage of the new code paths? I suggested one test addition that I think will be helpful for verifying the 'non-disjoint histories' case which I'm not seeing a lot of coverage for. Also would be great to loop some of the randomized/stress tests with this change (eg the mt-tablet-test delete/reinsert test) http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/common/row_changelist.h File src/kudu/common/row_changelist.h: Line 349 hmm, why remove these assertions? they've been quite useful in catching bugs in the past in the various stress tests where we'd accidentally end up mis-ordering deltas and get two DELETEs in a row PS12, Line 309: updated I guess this function is no longer meant to be update-specific, given you removed the DCHECK? update the comment? Perhaps there should still be a DCHECK that the type is update/reinsert? http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction-test.cc File src/kudu/tablet/compaction-test.cc: Line 528: TEST_F(TestCompaction, TestDuplicatedGhostRowsMerging) { I think it's worth extending this test or adding a new one which does something like: - create 10 rowsets, each with a disjoint history of insert/update/delete - while there is more than one rowset, merge a random set of 3 rowsets - once you've fully compacted down, verify the whole history is still there This will trigger the case of interleaved histories that isn't necessarily triggered in the case you have here. http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 90: row_block_.reset(new RowBlock(iter_->schema(), num_in_block, &arena_)); hm, was this a bug? Line 255: // Compares two duplicate rows. At first before reading the code, I expected there was a precondition here that we've already processed/collapsed the REDO list such that redo_head on both rows is either a single DELETE or nullptr. but, looking at the code I see that's not the case. Would be good to be explicit about that. PS12, Line 256: 1 if left is more recent than right this phrasing is a bit odd. Is it equivalent to say: Returns -1 if 'left' is less recent than 'right', 1 otherwise. Since that's the normal way in which comparison functions are phrased? Line 261: DCHECK(right.redo_head != nullptr); for the purposes of this DCHECK, seems like it's worth actually doing AdvanceToLastInList() here to make sure the _last_ REDO is a DELETE. PS12, Line 284: : // In case the histories aren't disjoint it must be because one row was deleted in the : // the same operation that inserted the other one, which was then mutated. can you give the specific scenario here? iirc it's: - row exists in DRS 1 - a single operation does 'DELETE row ; INSERT row ; UPDATE row' so this results in DRS 1 with a REDO DELETE at the same time as the MRS (and its resulting flush) has a REDO UPDATE PS12, Line 295: it' its PS12, Line 453: The newest one will point to the oldest. this sounds like you're talking about a pre-existing state, but really you are saying that we will set the newer one to point to the older one Line 589: // Copy previous versions recursively. is this recursion inherently depth-limited by the number of input rowsets? want to make sure it doesn't risk a stack overflow Line 667: void MergeUndoHistories(Mutation* left, Mutation* right, Mutation** head) { doc Line 679: MergeUndoHistories(left->next(), right, head); doing this recursively seems dangerous, since it's bounded only by the number of mutations, and not by the number of input rowsets. The mutation history is user-controlled so seems like a plausible crash. I'd think this can be converted to an iterative algorithm. PS12, Line 698: includive typo Line 988: // TODO(dralves) Make Reinserts set defaults on the dest row. yea, this seems like a potentially serious issue. Mind filing a critical JIRA against 1.2 just so we don't lose track? http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.h File src/kudu/tablet/compaction.h: PS12, Line 167: multiple versions I think the term 'version' here is a bit confusing, because even without the delete/reinsert case, one might consider an updated row to have multiple "versions" when updated. Mind switching to use another term, like 'histories' perhaps? or multiple lifetimes/lives? multiple existences? http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: Line 687: // Visitor which applies deletes to the selection vector. we should rename this visitor to LivenessVisitor or DeleteReinsertVisitor or somesuch and update the docs accordingly -- To view, visit http://gerrit.cloudera.org:8080/4995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes