[kudu-CR] KUDU-236. Implement tablet history GC
Mike Percy has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 12: (6 comments) http://gerrit.cloudera.org:8080/#/c/3076/12/src/kudu/integration-tests/tablet_history_gc-itest.cc File src/kudu/integration-tests/tablet_history_gc-itest.cc: Line 30: TEST_F(TabletHistoryGcItest, TestDeleteEmptyTable) { > I don't follow the name of this test. A comment explaining its intent would Ah, it was copypasta. I fixed it in a follow up commit (the randomized test CR) but now I've backported it to this commit. Line 33: // Insert some rows. > why do we need any rows for this test? I suppose we don't. Fixed. http://gerrit.cloudera.org:8080/#/c/3076/7/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 1017: // TODO: So are we safe? > yea, I think a test where you make sure that only alternating rows fall beh Done http://gerrit.cloudera.org:8080/#/c/3076/12/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 1017: // TODO: So are we safe? > Actually I'm worried about this... I think if you combine a "missed update" You were right and this was a problem after all. Added a test in the next rev of this CR called TestGcWithConcurrentCompaction that handles this case and I addressed the problem by plumbing a list of GCed input row offsets into ReupdateMissedDeltas(). I also added reupdating missed deltas into the randomized test that follows this patch. http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: Line 201: NO_FATALS(InsertOriginalRows(kNumRowSets, kRowsPerRowSet)); > can you add a test similar to TabletTest/TestGhostRowsOnDiskRowSets which e Done: TestGhostRowsNotRevived Line 262: // Mutate all of the rows. > as mentioned elsewhere, would be good to have a test which does major delta Done: TestMajorDeltaCompactionOnSubsetOfColumns -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Todd Lipcon has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/3076/12/src/kudu/integration-tests/tablet_history_gc-itest.cc File src/kudu/integration-tests/tablet_history_gc-itest.cc: Line 30: TEST_F(TabletHistoryGcItest, TestDeleteEmptyTable) { I don't follow the name of this test. A comment explaining its intent would be nice too. Line 33: // Insert some rows. why do we need any rows for this test? http://gerrit.cloudera.org:8080/#/c/3076/12/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 1017: // TODO: So are we safe? Actually I'm worried about this... I think if you combine a "missed update" with a GCed row, the row_idxes will get offset and you'll end up re-updating the wrong row. Check out the Test*WithConcurrentMutation tests in tablet-test.cc -- I think we need some kind of coverage of that in coordination with GCed rows. -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#12). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction * Test for opening a scanner at TS < AHM and rejecting that scanner at the RPC level. Missing tests: * Test for ghost rows in multiple RS reinsert case * Test for partial history GC (alternating rows) * Test for major delta compaction on a subset of columns * Randomized test Missing features (slated for future commit): * GC history on delta flush * Background task that schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 24 files changed, 729 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/12 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 12: Build Started http://104.196.14.100/job/kudu-gerrit/2885/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Mike Percy has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/3076/11/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: > Why are these tests gated on the log block manager? What is it about GCing actually it just requires HybridTime. We should just rename the macro. -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Adar Dembo has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 11: Could you also add a GCing thread to mt-tablet-test? It's a good way to test interactions between concurrent tablet operations. -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Adar Dembo has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 11: (1 comment) Just passing through with a question. http://gerrit.cloudera.org:8080/#/c/3076/11/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: Why are these tests gated on the log block manager? What is it about GCing history that is incompatible with the file block manager? -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 11: Build Started http://104.196.14.100/job/kudu-gerrit/2877/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#11). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction * Test for opening a scanner at TS < AHM and rejecting that scanner at the RPC level. Missing tests: * Test for ghost rows in multiple RS reinsert case * Test for partial history GC (alternating rows) * Test for major delta compaction on a subset of columns * Randomized test Missing features (slated for future commit): * GC history on delta flush * Background task that schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 24 files changed, 732 insertions(+), 135 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/11 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Mike Percy has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 704: // mutations that have only been partially applied? > yea, this is a bit tricky - I think it's OK because of the implementation o I spent a good while reading through the delta flush code path and my conclusion is that this is always safe because: 1. Scanners will be denied on timestamps < the AHM 2. Existing scanners hold the delta blocks open which allows them to maintain a consistent snapshot. I'll add a randomized test that provides coverage to verify that #2 holds true in the presence of history GC. -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 10: Build Started http://104.196.14.100/job/kudu-gerrit/2868/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#10). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction * Test for opening a scanner at TS < AHM and rejecting that scanner at the RPC level. Missing tests: * Test for ghost rows in multiple RS reinsert case * Test for partial history GC (alternating rows) * Test for major delta compaction on a subset of columns * Randomized test Missing features (slated for future commit): * GC history on delta flush * Background task that schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 24 files changed, 732 insertions(+), 135 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/10 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 9: Build Started http://104.196.14.100/job/kudu-gerrit/2853/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#9). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction * Test for opening a scanner at TS < AHM and rejecting that scanner at the RPC level. Missing features (slated for future commit): * No dedicated background task that only schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 24 files changed, 726 insertions(+), 135 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/9 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 8: Build Started http://104.196.14.100/job/kudu-gerrit/2850/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#8). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction * Test for opening a scanner at TS < AHM and rejecting that scanner at the RPC level. Missing features (slated for future commit): * No dedicated background task that only schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 24 files changed, 726 insertions(+), 135 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/8 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Mike Percy has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 6: (11 comments) Posting partially addressed feedback except for 2 comments in compaction.cc and 2 comments in tablet_history_gc-test.cc http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 835: num_rows_history_truncated += is_history_truncated; > we lost the nice WARNING we used to have for this case which actually inclu Fixed http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/compaction.h File src/kudu/tablet/compaction.h: Line 42: HistoryGcOpts(GcHistoryEnabled is_enabled, Timestamp ahm) > I don't think this ctor buys much over just using brace initialization like Done http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: Line 150: VLOG(2) << "Input Row: " << dst_row.schema()->DebugRow(dst_row) << > this should probably be a much higher VLOG level (and maybe DVLOG) Done Line 191: //TODO: Why keep the delete mutations? > because major delta compaction doesn't change row IDs, and thus you can't a Removed comment http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet-harness.h File src/kudu/tablet/tablet-harness.h: Line 112: clock_ = make_scoped_refptr(new server::HybridClock()); > do you need make_scoped_refptr here? given clock_ is a scoped_refptr I thin Changed to reset() which is syntax that makes it more clear that we're giving ownership to a smart ptr. http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 111: DEFINE_int32(tablet_history_max_age_sec, 604800, > can you express this as 60 * 60 * 24 * 7 so it's obvious it's 7 days? I'll just set it to -1. Line 114: "To disable history removal, set to -1."); > I think we should probably disable this for the first commit, until we can Done http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: PS6, Line 73: ws > nit: capitalize RowSet Done PS6, Line 118: Major delta compaction will not remove UNDOs, so we cannot read : // from before the initial insert of the data and expect to see base data. > not following this part of the comment -- you haven't run the MajorDeltaCom Done Line 134: // equal to 2. > I guess the comment above really applies here: you're saying that if you re > I guess the comment above really applies here: you're saying that if you read > from prior to the insertion, you'll still see the rows disappear because the > UNDOs weren't removed, right? Done > One suggestion to make these tests slightly easier to follow is to use > RowSet::DebugDump() which gives you a nice printout per row that you can > assert on. (though you'll have to set the hybrid timestamp to a constant > instead of Now()) Spent some time implementing this idea, got some decent results with regex matches. PS6, Line 169: No trace should remain after this > add an assertion that it actually didn't even generate a DRS Done -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#6). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction TODO: Tests still needed: * Try to generate empty files (UNDO, REDO) * Test for opening scanner at TS < AHM and rejecting that scanner at the RPC level. Missing features (slated for future commit): * No dedicated background task that only schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 22 files changed, 649 insertions(+), 136 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/6 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-236. Implement tablet history GC
Todd Lipcon has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/3076/7/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 1017: // TODO: So are we safe? yea, I think a test where you make sure that only alternating rows fall behind the AHM would be good. Most of the tests seem to fully remove or keep an entire DRS worth, so if the row_idx calculation is off, it won't catch it -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Todd Lipcon has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 704: // mutations that have only been partially applied? yea, this is a bit tricky - I think it's OK because of the implementation of MajorDeltaCompaction::FlushRowSetAndDeltas() which actually ignores the output 'redo_head' and re-creates its own new redos. A test for this seems like quite a good idea though! Line 835: num_rows_history_truncated += is_history_truncated; we lost the nice WARNING we used to have for this case which actually included the row key, etc http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/compaction.h File src/kudu/tablet/compaction.h: Line 42: HistoryGcOpts(GcHistoryEnabled is_enabled, Timestamp ahm) I don't think this ctor buys much over just using brace initialization like { HistoryGCOpts::GC_ENABLED, ahm } http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: Line 150: VLOG(2) << "Input Row: " << dst_row.schema()->DebugRow(dst_row) << this should probably be a much higher VLOG level (and maybe DVLOG) Line 191: //TODO: Why keep the delete mutations? because major delta compaction doesn't change row IDs, and thus you can't actually get rid of something from the base data http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet-harness.h File src/kudu/tablet/tablet-harness.h: Line 112: clock_ = make_scoped_refptr(new server::HybridClock()); do you need make_scoped_refptr here? given clock_ is a scoped_refptr I think it already supports assignment from a raw ptr http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 111: DEFINE_int32(tablet_history_max_age_sec, 604800, can you express this as 60 * 60 * 24 * 7 so it's obvious it's 7 days? Eventually I think a much shorter default would be useful since most people don't need the "query back in time" feature Line 114: "To disable history removal, set to -1."); I think we should probably disable this for the first commit, until we can get a bit of cluster test coverage with heavy update workload, etc http://gerrit.cloudera.org:8080/#/c/3076/6/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: PS6, Line 73: ws nit: capitalize RowSet PS6, Line 118: Major delta compaction will not remove UNDOs, so we cannot read : // from before the initial insert of the data and expect to see base data. not following this part of the comment -- you haven't run the MajorDeltaCompaction yet at this point Line 134: // equal to 2. I guess the comment above really applies here: you're saying that if you read from prior to the insertion, you'll still see the rows disappear because the UNDOs weren't removed, right? One suggestion to make these tests slightly easier to follow is to use RowSet::DebugDump() which gives you a nice printout per row that you can assert on. (though you'll have to set the hybrid timestamp to a constant instead of Now()) PS6, Line 169: No trace should remain after this add an assertion that it actually didn't even generate a DRS Line 201: TEST_F(TabletHistoryGCTest, TestRowRemovalGCOnMergeCompaction) { can you add a test similar to TabletTest/TestGhostRowsOnDiskRowSets which ends up with a row present in multiple DRS but deleted in several of them? Line 262: } as mentioned elsewhere, would be good to have a test which does major delta compaction of a subset of columns, against an update which touched multiple columns, and make sure things dont go wrong -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-236. Implement tablet history GC
Kudu Jenkins has posted comments on this change. Change subject: KUDU-236. Implement tablet history GC .. Patch Set 7: Build Started http://104.196.14.100/job/kudu-gerrit/2833/ -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-236. Implement tablet history GC
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3076 to look at the new patch set (#7). Change subject: KUDU-236. Implement tablet history GC .. KUDU-236. Implement tablet history GC This patch defines the concept of an "ancient history mark" (AHM) and a background job that removes history from disk that represents changes occurring prior to the ancient history mark. There is also a mechanism to reject requests for scans if they attempt to open a snapshot scan at a timestamp prior to the AHM. Included tests: * Test for Redo GC via major delta compaction * Test for history GC via tablet flush * Test for Undo GC via merge compaction * Test for entire-row GC via merge compaction TODO: Tests still needed: * Try to generate empty files (UNDO, REDO) * Test for opening scanner at TS < AHM and rejecting that scanner at the RPC level. Missing features (slated for future commit): * No dedicated background task that only schedules and performs undo GC Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 --- M src/kudu/common/timestamp.h M src/kudu/server/clock.h M src/kudu/server/hybrid_clock.cc M src/kudu/server/hybrid_clock.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/mutation.h M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h A src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tserver/tablet_service.cc 22 files changed, 659 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/3076/7 -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon