Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/23348 )
Change subject: KUDU-3734 include undo size while picking rowsets
......................................................................

KUDU-3734 include undo size while picking rowsets

As part of the compaction improvement effort, this change takes the
size of undo deltas into consideration while picking rowsets during
rowset compaction. I could not find any historical reason why this was
not done before. Maybe some earlier analysis concluded that considering
undo deltas is not the right approach when estimating the upper-bound
fractional solution of the knapsack problem.

The size of undo deltas is taken into consideration when deciding
whether a new rowset (a potential compaction 'candidate') fits into the
budget and is, at a minimum, denser than the least dense candidate in
the knapsack.

The test data used here has significant undo deltas, which makes it a
good fit for reproducing the OOM scenario. The total size of uncompacted
data is 29 GB. With this patch, rowset compaction never hit OOM and the
resident memory stayed well within limits (~1 GB). Without this patch,
compaction hit OOM on a node with limited memory, where the peak memory
of rowset compaction reached ~30 GB.

Testing with different scenarios:

1. Compaction that takes into consideration:
   - Base data
   - Redo deltas
   - Undo deltas
   Budget for compaction (tablet_compaction_budget_mb) is 1024 MB
   Result: Rowset 211 was skipped due to the 1024 MB size constraint.
   Budgeted compaction selection:
   [ ] RowSet(211)( 1945M) [0.0000, 1.0000]
   [ ] RowSet(217)( 1M) [0.0027, 0.1953]
   [x] RowSet(216)( 1M) [0.5341, 0.5341]
   [x] RowSet(218)( 1M) [0.5341, 0.5341]
   [x] RowSet(219)( 1M) [0.5341, 0.5341]

2. Compaction that takes into consideration:
   - Base data
   - Redo deltas
   - Undo deltas
   Budget for compaction (tablet_compaction_budget_mb) is 2048 MB
   Result: Rowset 211 was NOT skipped with the 2048 MB size limit.
   Budgeted compaction selection:
   [x] RowSet(211)( 1945M) [0.0000, 1.0000]
   [x] RowSet(217)( 1M) [0.0027, 0.1953]
   [x] RowSet(216)( 1M) [0.5341, 0.5341]
   [x] RowSet(218)( 1M) [0.5341, 0.5341]
   [x] RowSet(219)( 1M) [0.5341, 0.5341]

3. Compaction that takes into consideration:
   - Base data
   - Redo deltas
   Compaction budget (tablet_compaction_budget_mb) is the default 128 MB
   Result: Rowset 211 with size 1M (ignoring UNDO deltas) was included.
   Budgeted compaction selection:
   [x] RowSet(211)( 1M) [0.0000, 1.0000]
   [x] RowSet(217)( 1M) [0.0026, 0.2015]
   [x] RowSet(216)( 1M) [0.5358, 0.5385]
   [x] RowSet(218)( 1M) [0.5385, 0.5404]
   [x] RowSet(219)( 1M) [0.5404, 0.5404]

Note: This is different from the rowset compaction batching effort.

Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
Reviewed-on: http://gerrit.cloudera.org:8080/23348
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
11 files changed, 51 insertions(+), 39 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/23348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
Gerrit-Change-Number: 23348
Gerrit-PatchSet: 11
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Marton Greber <[email protected]>
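
Illustration of the budget check described in the commit message. This is a minimal sketch, not the code from this change and not Kudu's actual policy: the struct `RowSetSizes` and the functions `CompactionSizeMb` and `PickWithinBudget` are hypothetical names, and the sketch only checks whether each candidate fits into the remaining budget. It ignores key-range overlap and density, so it is not expected to reproduce the full selections listed in the test scenarios.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a rowset's on-disk footprint, in MB.
struct RowSetSizes {
  int id;
  int64_t base_mb;  // base data
  int64_t redo_mb;  // redo deltas
  int64_t undo_mb;  // undo deltas
};

// Size used for budgeting. With include_undo == false, undo deltas are
// ignored, which mirrors the behavior described for scenario 3.
int64_t CompactionSizeMb(const RowSetSizes& rs, bool include_undo) {
  return rs.base_mb + rs.redo_mb + (include_undo ? rs.undo_mb : 0);
}

// Walks the candidates in input order and keeps each one that still fits
// into the remaining budget. Returns the ids of the kept rowsets.
std::vector<int> PickWithinBudget(const std::vector<RowSetSizes>& candidates,
                                  int64_t budget_mb,
                                  bool include_undo) {
  assert(budget_mb >= 0);
  std::vector<int> picked;
  int64_t used_mb = 0;
  for (const auto& rs : candidates) {
    const int64_t size_mb = CompactionSizeMb(rs, include_undo);
    if (used_mb + size_mb > budget_mb) {
      continue;  // Does not fit: skip this candidate.
    }
    used_mb += size_mb;
    picked.push_back(rs.id);
  }
  return picked;
}
```

Under these assumptions, a rowset whose size is 1945 MB with undo deltas and 1 MB without them (the split between base, redo, and undo is not given in the message) would be skipped with a 1024 MB budget, kept with a 2048 MB budget, and kept with a 128 MB budget once undo deltas are ignored, which matches what the three scenarios report for rowset 211.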
