Hello Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/10340
to look at the new patch set (#2).
Change subject: WIP KUDU-2359: allow startup with missing data dirs
......................................................................
WIP KUDU-2359: allow startup with missing data dirs
Context
-------
As a part of previous disk failure work, Kudu currently supports opening
the FS layout in the face of EIOs when reading instance and block
manager instance files. The directory manager, in such cases, labels the
data directory in which the bad file resides as failed in memory. The
check that enforces consistency between instance files accounts for such
failed DataDirs.
Separately, the introduction of the `kudu fs update_dirs` tool expanded
the logic used to open the FS layout to serve two new purposes when
running the tool:
- To open the FS layout as the user requests it, ignoring any
inconsistencies across data directories. This mode allows Kudu to
stage the requested FS layout and test whether existing tablets would
break due to the update. In this mode, the consistency check is
skipped entirely.
- To actually update the FS layout on disk to match the user input. In
this mode, the consistency check is performed after updating the
appropriate instance files.
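
For illustration only, the ordering of the consistency check in each mode
can be sketched as below. This is a simplified model, not Kudu's code: the
`CheckMode` enum, its value names, and `PlanOpen()` are hypothetical, and
the step strings are just labels.

```cpp
#include <string>
#include <vector>

// Hypothetical names; they only mirror the modes described above.
enum class CheckMode {
  kEnforce,       // normal startup: the check is enforced
  kIgnore,        // stage the requested layout: the check is skipped
  kUpdateOnDisk,  // rewrite instance files, then run the check
};

// Returns the ordered steps taken when opening the FS layout in 'mode'.
std::vector<std::string> PlanOpen(CheckMode mode) {
  std::vector<std::string> steps = {"load instance files"};
  switch (mode) {
    case CheckMode::kEnforce:
      steps.push_back("check consistency");
      break;
    case CheckMode::kIgnore:
      // Inconsistencies across data directories are tolerated.
      break;
    case CheckMode::kUpdateOnDisk:
      steps.push_back("update instance files");
      steps.push_back("check consistency");
      break;
  }
  return steps;
}
```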
New stuff
---------
As mentioned in the JIRA, disk failures can manifest as a failure to
read directory entries, leading to NotFound errors. As such, this patch
reconciles the above features while making the adjustments necessary to
support Kudu starting up with missing data directories. A missing data
directory in this case is one that has no instance or block manager
instance file; upon startup, Kudu will treat such data dirs as failed.
Note that this behavior is different from starting up Kudu with extra or
missing entries in `fs_data_dirs`, which is still not supported unless
running the update tool.
Examples:
- If an existing server were configured with --fs_data_dirs=/a,/b,/c,
and it were restarted such that only /a,/b existed on disk, Kudu would
start up, list /a,/b,/c, and note that /c is failed.
- If the above server were restarted with --fs_data_dirs=/a,/b, even if
only /a,/b existed on disk, Kudu would fail to start up until running
`kudu fs update_dirs [other flags] --fs_data_dirs=/a,/b`
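
The two examples can be modeled with a small sketch. It is a deliberate
simplification and not the patch's implementation: it compares directory
paths directly, whereas the real check works on instance files, and
`StartupResult` and `TryStartup()` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of the startup decision.
struct StartupResult {
  bool started;
  std::vector<std::string> failed_dirs;
};

// recorded:   the layout the healthy instance files agree on
// configured: the value of --fs_data_dirs
// on_disk:    the directories whose instance files can be found
StartupResult TryStartup(const std::set<std::string>& recorded,
                         const std::set<std::string>& configured,
                         const std::set<std::string>& on_disk) {
  StartupResult result{false, {}};
  // Extra or missing entries in fs_data_dirs are still rejected.
  if (configured != recorded) return result;
  result.started = true;
  // A configured directory that is missing on disk is treated as failed.
  for (const auto& dir : configured) {
    if (on_disk.count(dir) == 0) result.failed_dirs.push_back(dir);
  }
  return result;
}
```

With recorded = configured = {/a,/b,/c} and on_disk = {/a,/b}, the sketch
starts and reports /c as failed; with configured = {/a,/b} against the same
recorded layout, it refuses to start.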
Some notes and changes in this patch include:
- methods involved in loading block manager instance files
(PathInstanceMetadataFiles) now treat missing instance files as
"unhealthy", the same way they treat files that fail due to disk errors
- DataDirManager::LoadInstances() has been updated to handle the above
change, optionally returning the missing instances as unhealthy
instances. This allows the update codepaths to explicitly track missing
directories and the startup codepaths to treat each missing directory as
an unhealthy instance used to spawn a failed DataDir in memory
- various codepaths that previously ended FsManager::Open() with an
IOError/Corruption because all drives were failed will now return
NotFound, indicating Kudu should attempt to create a new FS layout
- as a byproduct of the above changes, when opening the FS layout in
ENFORCE_CONSISTENCY mode with an extra data dir included in
`fs_data_dirs`, Kudu will fail later than before, at the integrity
check, and yield an IOError instead of a NotFound error
- the UUID and UUID index assignment for missing/failed directories has
been updated when opening the speculative directory manager; see
DataDirManager::Open() for more details.
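
As a rough sketch of the first and third notes above (a missing instance
file counts as unhealthy, and an open with no healthy directory yields
NotFound), assuming hypothetical types that are not Kudu's actual API:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical outcome of loading one instance file.
enum class LoadOutcome { kOk, kDiskError, kMissing };
enum class Health { kHealthy, kUnhealthy };
// Hypothetical summary status for opening the FS layout.
enum class OpenStatus { kOk, kNotFound };

// A missing instance file is treated the same way as a disk error.
Health Classify(LoadOutcome outcome) {
  return outcome == LoadOutcome::kOk ? Health::kHealthy : Health::kUnhealthy;
}

// If no directory is healthy, report NotFound so that the caller may
// attempt to create a new FS layout.
OpenStatus SummarizeOpen(const std::vector<LoadOutcome>& outcomes) {
  const bool any_healthy = std::any_of(
      outcomes.begin(), outcomes.end(),
      [](LoadOutcome o) { return Classify(o) == Health::kHealthy; });
  return any_healthy ? OpenStatus::kOk : OpenStatus::kNotFound;
}
```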
WIP: this change touches codepaths with a fair number of edge cases, and
as such, I haven't fully convinced myself this is the best approach.
That said, it is reviewable and has tests passing.
Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
---
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/tools/kudu-tool-test.cc
7 files changed, 259 insertions(+), 126 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/10340/2
--
To view, visit http://gerrit.cloudera.org:8080/10340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
Gerrit-Change-Number: 10340
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins