Hello Tidy Bot, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14760
to look at the new patch set (#8).
Change subject: KUDU-2993: don't require update_dirs to fix directory
inconsistencies
......................................................................
KUDU-2993: don't require update_dirs to fix directory inconsistencies
This patch removes the ENFORCE_CONSISTENCY behavior when opening the
DataDirManager. By default, the directory will be opened with the
UPDATE_ON_DISK behavior. To support this:
- We can now tolerate failures when updating the PIMFs (path instance
  metadata files).
- When we start up, we'll look at all the PIMFs, create any that are
  missing, and update any whose 'all_uuids' field differs from the set
  of directories that actually exist.
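As a rough illustration of the startup pass described above, here is a
minimal in-memory sketch. It is not the DataDirManager code: the Pimf
struct, DirState alias, ReconcilePimfs function, and the counter-based
"new-N" UUIDs are all hypothetical names made up for this example, and
it treats 'all_uuids' as an unordered set, ignoring the ordering that
the file block manager depends on.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a PIMF; not Kudu's real class.
struct Pimf {
  std::string uuid;
  std::set<std::string> all_uuids;  // Ordering is deliberately ignored here.
};

// Data directory path -> PIMF, or std::nullopt if the file is missing.
using DirState = std::map<std::string, std::optional<Pimf>>;

// Creates missing PIMFs, then rewrites any whose 'all_uuids' differs from
// the UUIDs that actually exist. Returns the paths that were rewritten.
std::vector<std::string> ReconcilePimfs(DirState& dirs, int* next_id) {
  // Pass 1: create a PIMF for every directory that lacks one.
  for (auto& entry : dirs) {
    if (!entry.second) {
      entry.second = Pimf{"new-" + std::to_string((*next_id)++), {}};
    }
  }
  // Collect the UUIDs that actually exist after pass 1.
  std::set<std::string> actual;
  for (const auto& entry : dirs) {
    actual.insert(entry.second->uuid);
  }
  // Pass 2: rewrite every PIMF whose 'all_uuids' is stale.
  std::vector<std::string> written;
  for (auto& entry : dirs) {
    if (entry.second->all_uuids != actual) {
      entry.second->all_uuids = actual;
      written.push_back(entry.first);
    }
  }
  return written;
}
```

Running the function a second time on the same state should rewrite
nothing, which is the property that makes a separate integrity check
unnecessary in this simplified model.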
Since we now always rewrite the PIMFs to be consistent, the "integrity
check" is now gone. This check was previously useful to ensure that the
'all_uuids' fields matched for every PIMF, which ensured that every data
directory that was expected to exist actually existed. This was
important for a couple of reasons:
- When a single missing data directory spelled failure for the entire
node, starting up with even a single "inconsistent" directory would
break all tablets on the tserver.
- The file block manager requires that the UUID indexes used by the
DataDirManager are static. These indexes are defined by the ordering
of the UUIDs in the PIMFs, so we used the integrity check to ensure
the ordering was consistent across PIMFs.
Now that Kudu tablets can start up with missing directories, the first
reason is no longer compelling.
The second is trickier. To work around it, I've kept the
essence of the UUID indexing for the file block manager, though I've
made the "integrity checking" virtually non-existent. For the log block
manager, I've made the UUID indexing much simpler: rather than relying
on the integrity check, we'll now always assign a PIMF a UUID, even if
we couldn't read one from disk.
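To make the log block manager case concrete, here is a small sketch of
"always assign a UUID" indexing. It assumes, purely for illustration,
that indexes follow the position of each directory in the configured
list and that an unreadable PIMF gets a "placeholder-N" UUID; the
UuidIndex struct and AssignUuidIndexes function are hypothetical and do
not appear in the patch.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical two-way mapping between directory UUIDs and indexes.
struct UuidIndex {
  std::vector<std::string> uuid_by_idx;
  std::map<std::string, int> idx_by_uuid;
};

// Each element is the UUID read from one directory's PIMF, or
// std::nullopt if it could not be read from disk.
UuidIndex AssignUuidIndexes(
    const std::vector<std::optional<std::string>>& uuids_from_disk) {
  UuidIndex index;
  for (std::size_t i = 0; i < uuids_from_disk.size(); ++i) {
    // Fall back to a made-up UUID so every directory still gets an index.
    std::string uuid =
        uuids_from_disk[i].value_or("placeholder-" + std::to_string(i));
    index.idx_by_uuid[uuid] = static_cast<int>(i);
    index.uuid_by_idx.push_back(uuid);
  }
  return index;
}
```

Because every directory ends up with some UUID, nothing in this model
needs the PIMFs to agree with each other before indexes are handed out.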
Tests:
- Updated a few tests that previously enforced consistency among PIMFs
to instead check for the correct instance-updating behavior.
- Added a test to check that failures while updating the PIMFs don't
stop us from opening the FS layout.
- Added a test that checks that adding and removing data directories on
a tserver affects and fails tablets as expected.
- Added a test to make sure that this doesn't completely break the file
block manager (FBM). Given we don't expect heavy usage of the FBM, I
didn't extensively test cases where the PIMFs are tampered with.
Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
---
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/server/server_base.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 686 insertions(+), 712 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/14760/8
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Gerrit-Change-Number: 14760
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)