Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5736

to look at the new patch set (#7).

Change subject: KUDU-1835 (part 2): enable WAL compression
......................................................................

KUDU-1835 (part 2): enable WAL compression

This enables compression of the entries in the WAL.

To test the efficiency, I loaded some data into a local tablet server
using kudu-ts and the influxdb benchmark as a data generator. I think
restarted the server with different configurations, and disabled log
segment prealloocation in order to see the _actual_ length of the WALs
instead of the preallocated length.

Compression     Size
NONE            311M
LZ4              98M - 3.17x
SNAPPY           91M - 3.41x
ZLIB             59M - 5.27x

Based on some results in the JIRA I expect more significant improvements
in some other workloads, since kudu-ts already does its own sort of
"dictionary encoding" by normalizing out the tagset IDs.

In order to enable this feature, an incompatible change was made in the
WAL format. Each entry header now contains an extra field indicating the
uncompressed length.

Given that, it's desirable that old servers be prohibited from opening
new-format WALs. Ideally, we could have bumped the major version field
in the WAL header. However, the old reader code never actually checked
the version fields on read. So, this takes the approach of not setting
the versions anymore, and making them deprecated. Since they were
'required' proto fields in old versions, this will make the reader fail
to parse the header with an error that the version field is missing.
This is preferable to failing later with a Corruption error about an
entry header CRC mismatch.

To solve this problem for future versions, I also added an
'incompatible_features' flag set. See commit f82cf6918c00dff6aec for
more information about this approach.

Note that it migth have been possible to do this in a more
backward-compatible way by making the default codec be "NONE" and having
the new version write old-format headers when compression is disabled.
Initially took this approach, but eventually abandoned it as the code
was more complex. Additionally, we already have a different data format
incompatible change coming in Kudu 1.3 (the above-referenced
f82cf6918c00dff6aec). So adding a second one doesn't add additional
burden on users.

I tested the compatibility manually:
- started a Kudu 1.2 server, wrote some data
- upgraded to a version with this patch
- downgraded back to a version without this patch, and saw that the
  tablets failed to start, with a message that the major/minor version
  fields were missing
- upgraded back to this patch and verified the tablets started

Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log.proto
M src/kudu/consensus/log_anchor_registry.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/integration-tests/disk_reservation-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/util/compression/compression_codec.cc
M src/kudu/util/compression/compression_codec.h
15 files changed, 270 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/5736/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

Reply via email to