Hello David Ribeiro Alves, Mike Percy, Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/3221

to review the following change.

Change subject: KUDU-749 (part 2): avoid O(n^2) behavior when compacting deltas
......................................................................

KUDU-749 (part 2): avoid O(n^2) behavior when compacting deltas

When handling a zipfian update workload, we have a lot of delta records applied
to a single row. In the compaction code path, we were previously using
Mutation::AppendToList() to append each such delta to the end of a linked list.
Thus, constructing the linked list was O(n^2) in the number of deltas
corresponding to a row. Each such iteration also most likely was a CPU cache
miss, making this quite slow.

In fact, in a YCSB workload I'm currently testing, a single compaction has been
running for over an hour and a half, spending all of its time in the
AppendToList function.

This patch changes the flow so that the deltas are collected by prepending them
to the linked list (an O(1) operation). The resulting list is then reversed (an
O(n) operation) before feeding into the rest of the unchanged compaction code
path.

This patch is also notable for being one of the few times in real life where a
"reverse the linked list" algorithm was actually necessary. Due to the
notoriously tricky nature of this foundational computer science problem (I once
botched it on a job interview), and the fact that this was a so-called "real
life situation", I googled the algorithm. Thus, I am confident of its
correctness. Aside from such assurances, this code path is also very well
covered by existing test cases which perform multiple updates against the same
row (in particular fuzz-itest).

Change-Id: If6bfa3fc6f41998b0f1ff58c5c8ea39881022de5
---
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mutation.cc
M src/kudu/tablet/mutation.h
9 files changed, 39 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/3221/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If6bfa3fc6f41998b0f1ff58c5c8ea39881022de5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <david.al...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>

Reply via email to