Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15145 )

Change subject: KUDU-1625: background op to GC ancient, fully deleted rowsets
......................................................................

KUDU-1625: background op to GC ancient, fully deleted rowsets

This adds a background op that deletes disk rowsets that have had all of
their rows deleted. If the most recent update to a rowset is older than
the ancient history mark, and the rowset contains no live rows, that
rowset will be deleted.

It'd be nice if we could have the policy work for rowsets that are
mostly deleted, but such a solution would come with difficult questions
around write amplification and compatibility with the existing
compaction strategies. For instance, a more complete solution would need
to consider whether to rewrite a rowset if it had 25%, 50%, or 75%
deleted rows: some operators wouldn't mind the write amplification to
save space. However, picking a good heuristic (or exposing some knobs to
turn) makes this tricky.

The benefit of the approach in this patch is that no such tradeoff needs
to be made: the "write amplification" is minimal here because no new
data blocks are written in performing the operation -- the tablet
metadata is rewritten to exclude the blocks, and the underlying blocks
are deleted, which isn't IO intensive either.

There's still room for improvement in this implementation in that,
currently, a DMS flush will write stats to disk and we'll only read the
stats if we Init() the DeltaFileReader (e.g. on scan). I'll address this
in a follow-up patch.

Since the op GCs all viable rowsets in the tablet, a tablet should only
schedule one deleted rowset GC op at a time. This isn't necessary for
correctness, but avoids wasting some MM thread cycles.

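The eligibility rule and the one-op-per-tablet scheduling described above
can be sketched roughly as follows. This is an illustration only, not the
code in this patch: RowSetInfo, IsEligibleForDeletedRowSetGC,
SelectRowSetsToGC, and DeletedRowSetGCGuard are hypothetical names, the
timestamps are plain integers, and "older than" is assumed to mean
strictly less than the ancient history mark.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a disk rowset (not Kudu's actual types).
struct RowSetInfo {
  int64_t live_row_count;      // rows that have not been deleted
  int64_t newest_update_time;  // timestamp of the most recent update
};

// A rowset qualifies only if it has no live rows and its most recent update
// is older than the ancient history mark (assumed here to be a strict <).
bool IsEligibleForDeletedRowSetGC(const RowSetInfo& rs,
                                  int64_t ancient_history_mark) {
  return rs.live_row_count == 0 &&
         rs.newest_update_time < ancient_history_mark;
}

// One pass over the tablet's rowsets returns the indexes of every eligible
// rowset, mirroring the idea that a single op GCs all viable rowsets.
std::vector<size_t> SelectRowSetsToGC(const std::vector<RowSetInfo>& rowsets,
                                      int64_t ancient_history_mark) {
  std::vector<size_t> selected;
  for (size_t i = 0; i < rowsets.size(); ++i) {
    if (IsEligibleForDeletedRowSetGC(rowsets[i], ancient_history_mark)) {
      selected.push_back(i);
    }
  }
  return selected;
}

// Hypothetical per-tablet guard that lets at most one GC op run at a time.
class DeletedRowSetGCGuard {
 public:
  // Returns true if the caller may run the op, false if one is in flight.
  bool TryStart() {
    bool expected = false;
    return running_.compare_exchange_strong(expected, true);
  }

  // Marks the in-flight op as finished so that a new one can be scheduled.
  void Finish() { running_.store(false); }

 private:
  std::atomic<bool> running_{false};
};
```

In this sketch nothing is rewritten: the selected rowsets would simply be
dropped from the tablet metadata and their blocks deleted, which is why the
write amplification stays minimal.
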
I ran this on a real cluster, deleting large chunks of keyspace with 4
MM threads to confirm that space is actually freed, concurrent ops for
the same tablet aren't scheduled, and the op runs relatively quickly (in
the tens of ms, compared to hundreds to thousands of ms for other ops).

Change-Id: I696e2a29ea52ad4e54801b495c322bc371787124
Reviewed-on: http://gerrit.cloudera.org:8080/15145
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet-test-base.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tablet/tablet_replica-test.cc
21 files changed, 664 insertions(+), 110 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I696e2a29ea52ad4e54801b495c322bc371787124
Gerrit-Change-Number: 15145
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Volodymyr Verovkin
