Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1835 (part 2): enable WAL compression ......................................................................
KUDU-1835 (part 2): enable WAL compression This enables compression of the entries in the WAL. To test the efficiency, I loaded some data into a local tablet server using kudu-ts and the influxdb benchmark as a data generator. I think restarted the server with different configurations, and disabled log segment prealloocation in order to see the _actual_ length of the WALs instead of the preallocated length. Compression Size NONE 311M LZ4 98M - 3.17x SNAPPY 91M - 3.41x ZLIB 59M - 5.27x Based on some results in the JIRA I expect more significant improvements in some other workloads, since kudu-ts already does its own sort of "dictionary encoding" by normalizing out the tagset IDs. In order to enable this feature, an incompatible change was made in the WAL format. Each entry header now contains an extra field indicating the uncompressed length. Given that, it's desirable that old servers be prohibited from opening new-format WALs. Ideally, we could have bumped the major version field in the WAL header. However, the old reader code never actually checked the version fields on read. So, this takes the approach of not setting the versions anymore, and making them deprecated. Since they were 'required' proto fields in old versions, this will make the reader fail to parse the header with an error that the version field is missing. This is preferable to failing later with a Corruption error about an entry header CRC mismatch. To solve this problem for future versions, I also added an 'incompatible_features' flag set. See commit f82cf6918c00dff6aec for more information about this approach. Note that it might have been possible to do this in a more backward-compatible way by making the default codec be "NONE" and having the new version write old-format headers when compression is disabled. I initially took this approach, but eventually abandoned it as the code was more complex. Additionally, we already have a different data format incompatible change coming in Kudu 1.3 (the above-referenced f82cf6918c00dff6aec). So adding a second one doesn't add additional burden on users. I tested the compatibility manually: - started a Kudu 1.2 server, wrote some data - upgraded to a version with this patch - downgraded back to a version without this patch, and saw that the tablets failed to start, with a message that the major/minor version fields were missing - upgraded back to this patch and verified the tablets started Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5 Reviewed-on: http://gerrit.cloudera.org:8080/5736 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves <[email protected]> --- M src/kudu/consensus/CMakeLists.txt M src/kudu/consensus/log-test-base.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/log.proto M src/kudu/consensus/log_anchor_registry.h M src/kudu/consensus/log_reader.cc M src/kudu/consensus/log_reader.h M src/kudu/consensus/log_util.cc M src/kudu/consensus/log_util.h M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/util/compression/compression_codec.cc M src/kudu/util/compression/compression_codec.h 15 files changed, 274 insertions(+), 106 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0fc790ddc72a519a82bbb6b71a902418051b35a5 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]>
