Hello Attila Bukor, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23348
to look at the new patch set (#2).
Change subject: [compaction] include undo size while picking rowsets
......................................................................
[compaction] include undo size while picking rowsets
As part of the compaction improvement effort, this change takes the
size of undo deltas into account when picking rowsets for rowset
compaction. I could not find any documented reason why this was not
done before. Perhaps an earlier analysis concluded that considering
undo deltas is not the right approach when estimating the upper bound
fractional solution for the knapsack. With this change, the size of
undo deltas is taken into consideration when deciding whether a new
rowset (a potential compaction 'candidate') fits into the budget and
is denser than the least dense candidate in the knapsack.
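
The budget-fit part of the above can be illustrated with a minimal
sketch. This is not the Kudu implementation: the names RowSetSizes,
compaction_size, and fits_budget are hypothetical, the density
comparison against the knapsack is left out, and the byte counts are
illustrative values chosen to resemble rowset 211 from the test
scenarios in this message (about 1M without undo deltas, about 1945M
with them).

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class RowSetSizes:
    """Hypothetical per-rowset on-disk sizes, in bytes."""
    base_data: int
    redo_deltas: int
    undo_deltas: int


def compaction_size(rs: RowSetSizes, include_undo: bool = True) -> int:
    """Size charged against the compaction budget for one rowset."""
    size = rs.base_data + rs.redo_deltas
    if include_undo:
        # The accounting this change adds: undo deltas count as well.
        size += rs.undo_deltas
    return size


def fits_budget(rs: RowSetSizes, budget_mb: int,
                include_undo: bool = True) -> bool:
    """True if the rowset alone fits within a budget given in MB."""
    return compaction_size(rs, include_undo) <= budget_mb * MB


# Illustrative numbers only (assumed split between base and redo data).
rs211 = RowSetSizes(base_data=MB // 2, redo_deltas=MB // 2,
                    undo_deltas=1944 * MB)

print(fits_budget(rs211, 1024))                      # undo counted
print(fits_budget(rs211, 2048))                      # undo counted
print(fits_budget(rs211, 128, include_undo=False))   # undo ignored
```

When undo deltas are ignored, a rowset like this looks tiny and passes
even the default budget, although compacting it has to read far more
data than the budget suggests.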
The test data used here has significant undo deltas, which makes it a
good fit for reproducing the OOM scenario. The total size of
uncompacted data is 29GB.
With this patch, rowset compaction never hit OOM and resident memory
stayed well within limits (~1GB).
Without this patch, compaction hit OOM on a node with limited memory,
where rowset compaction peak memory reached ~30GB.
Testing with different scenarios:
1. Compaction that takes into consideration:
- Base data
- Redo deltas
- Undo deltas
Budget for compaction (tablet_compaction_budget_mb) is 1024 MB
Result: Rowset 211 was skipped due to 1024 MB size constraint.
Budgeted compaction selection:
[ ] RowSet(211)( 1945M) [0.0000, 1.0000] [<redacted>,<redacted>]
[ ] RowSet(217)( 1M) [0.0027, 0.1953] [<redacted>,<redacted>]
[x] RowSet(216)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
[x] RowSet(218)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
[x] RowSet(219)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
2. Compaction that takes into consideration:
- Base data
- Redo deltas
- Undo deltas
Budget for compaction (tablet_compaction_budget_mb) is 2048 MB
Result: Rowset 211 was NOT skipped with 2048 MB size limit.
Budgeted compaction selection:
[x] RowSet(211)( 1945M) [0.0000, 1.0000] [<redacted>,<redacted>]
[x] RowSet(217)( 1M) [0.0027, 0.1953] [<redacted>,<redacted>]
[x] RowSet(216)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
[x] RowSet(218)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
[x] RowSet(219)( 1M) [0.5341, 0.5341] [<redacted>,<redacted>]
3. Compaction that takes into consideration:
- Base data
- Redo deltas
Compaction budget (tablet_compaction_budget_mb) is the default, 128 MB
Result: Rowset 211 with size 1M (ignoring UNDO deltas) was included.
Budgeted compaction selection:
[x] RowSet(211)( 1M) [0.0000, 1.0000] [<redacted>,<redacted>]
[x] RowSet(217)( 1M) [0.0026, 0.2015] [<redacted>,<redacted>]
[x] RowSet(216)( 1M) [0.5358, 0.5385] [<redacted>,<redacted>]
[x] RowSet(218)( 1M) [0.5385, 0.5404] [<redacted>,<redacted>]
[x] RowSet(219)( 1M) [0.5404, 0.5404] [<redacted>,<redacted>]
Note: This is separate from the rowset compaction batching effort.
Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
11 files changed, 32 insertions(+), 31 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/23348/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
Gerrit-Change-Number: 23348
Gerrit-PatchSet: 2
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)