Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9372

to look at the new patch set (#8).

Change subject: rowset_metadata: cache min/max encoded keys
......................................................................

rowset_metadata: cache min/max encoded keys

This patch adds a new flag, rowset_metadata_store_keys, that, when true,
tells Kudu to duplicate each diskrowset's min/max keys into the rowset
metadata. Doing so allows Kudu to read the keys from tablet metadata and
bootstrap tablets without having to fully initialize the CFileReaders
for the key columns of each rowset.
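
The read path described above can be sketched roughly as follows. This
is an illustration only, not the code in this patch: RowSetMeta,
load_key_bounds, read_keys_from_cfile and use_cached_keys are
hypothetical names, and the sketch assumes the reader falls back to the
key column files when no keys are stored in the metadata.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RowSetMeta:
    # Hypothetical stand-in for the rowset metadata; the keys are optional
    # because rowsets written without the flag would not carry them.
    min_encoded_key: Optional[bytes] = None
    max_encoded_key: Optional[bytes] = None


def load_key_bounds(
    meta: RowSetMeta,
    read_keys_from_cfile: Callable[[], Tuple[bytes, bytes]],
    use_cached_keys: bool = True,
) -> Tuple[bytes, bytes]:
    """Return (min_key, max_key), preferring the copy cached in metadata."""
    has_cached = (meta.min_encoded_key is not None
                  and meta.max_encoded_key is not None)
    if use_cached_keys and has_cached:
        # Cheap path: no key column reader has to be initialized.
        return meta.min_encoded_key, meta.max_encoded_key
    # Assumed fallback: read the bounds from the key column data on disk.
    return read_keys_from_cfile()


# Usage: count how often the expensive path is taken.
cfile_reads = []

def expensive_read() -> Tuple[bytes, bytes]:
    cfile_reads.append(1)
    return b"a", b"z"

with_keys = load_key_bounds(RowSetMeta(b"b", b"y"), expensive_read)
without_keys = load_key_bounds(RowSetMeta(), expensive_read)
```

Only the rowset with no stored keys triggers the expensive read here.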

A small test is added to tablet_server-test that ensures we don't read
any extraneous bytes when starting up the tablet server if we're reading
keys from the rowset metadata.

I benchmarked this with ~50GB of flushed YCSB data (92 tablets of
varying sizes) on a single node with 4 data directories and a separate
WAL/metadata directory. To set up, I let the server flush/compact for a
while so bootstrap times wouldn't be dominated by reading WAL segments,
and set rowset_metadata_store_keys to true so the tserver had the option
of reading the cached keys from the rowset metadata at startup.

With the above setup, I started the tserver with the maintenance manager
disabled (to avoid further IO) and waited for the tablets to reach the
RUNNING state, recording the sum of the logged bootstrap times of each
tablet. I did this three times in each of two configurations, one with
Kudu reading the keys from the rowset metadata and one with Kudu reading
the keys from the data blocks, dropping OS caches between runs. The
results are below.

Run number:                   1           2           3           Avg
Reading cached keys (s):      26.430      24.143      20.826      23.800
Not reading cached keys (s):  40.578      38.428      37.093      38.700

Based on this, on average ~15 seconds of bootstrap time was spent
initializing the key index readers, which can be avoided by reading the
keys from the rowset metadata instead.
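
As a quick check of the arithmetic, the averages and their difference
can be recomputed from the table above. The numbers are copied from the
table; the variable names are only illustrative.

```python
# Summed tablet bootstrap times in seconds, one entry per run.
cached = [26.430, 24.143, 20.826]      # reading cached keys
uncached = [40.578, 38.428, 37.093]    # not reading cached keys


def mean(values):
    return sum(values) / len(values)


avg_cached = mean(cached)
avg_uncached = mean(uncached)
saved = avg_uncached - avg_cached

print(f"avg cached:   {avg_cached:.3f} s")
print(f"avg uncached: {avg_uncached:.3f} s")
print(f"saved:        {saved:.1f} s")
```

The difference of the averages is 14.9 seconds, which is the ~15
seconds quoted above.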

Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tserver/tablet_server-test.cc
7 files changed, 143 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/9372/8
--
To view, visit http://gerrit.cloudera.org:8080/9372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d
Gerrit-Change-Number: 9372
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
