Todd Lipcon has uploaded a new change for review.
Change subject: KUDU-749 (part 2): avoid O(n^2) behavior when compacting deltas
KUDU-749 (part 2): avoid O(n^2) behavior when compacting deltas
When handling a zipfian update workload, we have a lot of delta records applied
to a single row. In the compaction code path, we were previously using
Mutation::AppendToList() to append each such delta to the end of a linked list.
Thus, constructing the linked list was O(n^2) in the number of deltas
corresponding to a row. Each such iteration also most likely was a CPU cache
miss, making this quite slow.
In fact, in a YCSB workload I'm currently testing, a single compaction has been
running for over an hour and a half, spending all of its time in the
This patch changes the flow so that the deltas are collected by prepending them
to the linked list (an O(1) operation). The resulting list is then reversed (an
O(n) operation) before feeding into the rest of the unchanged compaction code
This patch is also notable for being one of the few times in real life where a
"reverse the linked list" algorithm was actually necessary. Due to the
notoriously tricky nature of this foundational computer science problem (I once
botched it on a job interview), and the fact that this was a so-called "real
life situation", I googled the algorithm. Thus, I am confident of its
correctness. Aside from such assurances, this code path is also very well
covered by existing test cases which perform multiple updates against the same
row (in particular fuzz-itest).
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <a...@cloudera.com>
(cherry picked from commit c7178e97e842f42e9ed9d5e9e2a4f521fbe70b6b)
10 files changed, 40 insertions(+), 44 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/3224/1
To view, visit http://gerrit.cloudera.org:8080/3224
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Todd Lipcon <t...@apache.org>