Hello Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10340

to look at the new patch set (#2).

Change subject: WIP KUDU-2359: allow startup with missing data dirs
......................................................................

WIP KUDU-2359: allow startup with missing data dirs

Context
-------
As a part of previous disk failure work, Kudu currently supports opening
the FS layout in the face of EIOs when reading instance and block
manager instance files. The directory manager, in such cases, labels the
data directory in which the bad file resides as failed in memory. The
check that enforces consistency between instance files accounts for such
failed DataDirs.

Separately, the introduction of the `kudu fs update_dirs` tool expanded
the logic used to open the FS layout to serve two new purposes when
running the tool:
- To open the FS layout as the user requests it, ignoring any
  inconsistencies across data directories. This mode allows Kudu to
  stage the requested FS layout and test whether existing tablets would
  break due to the update. In this mode, the consistency check is
  skipped entirely.
- To actually update the FS layout on disk to match the user input. In
  this mode, the consistency check is performed after updating the
  appropriate instance files.

New stuff
---------
As mentioned in the JIRA, disk failures can manifest as a failure to
read directory entries, leading to NotFound errors. As such, this patch
reconciles the above features while making the adjustments necessary to
support Kudu starting up with missing data directories. A missing data
directory in this case is one that has no instance or block manager
instance file; upon startup, Kudu will treat such data dirs as failed.

Note that this behavior is different from starting up Kudu with extra or
missing entries in `fs_data_dirs`, which is still not supported unless
running the update tool.

Examples:
- If an existing server were configured with --fs_data_dirs=/a,/b,/c,
  and it were restarted such that only /a,/b existed on disk, Kudu would
  start up, list /a,/b,/c, and note that /c is failed.
- If the above server were restarted with --fs_data_dirs=/a,/b, even if
  only /a,/b existed on disk, Kudu would fail to start up until the user
  ran `kudu fs update_dirs [other flags] --fs_data_dirs=/a,/b`.
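
The two examples above can be restated as a minimal sketch. This is not
code from the patch: ClassifyStartup, StartupResult, and StartupOutcome
are hypothetical names, and the sketch assumes the existing FS layout can
be summarized as a set of directory paths, which is a simplification.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type; not part of Kudu.
enum class StartupResult { kStarts, kNeedsUpdateDirs };

struct StartupOutcome {
  StartupResult result;
  std::vector<std::string> failed_dirs;  // dirs listed but marked failed
};

// configured: the value of --fs_data_dirs
// recorded:   the dirs that make up the existing FS layout (assumed to
//             be known as paths, for illustration only)
// on_disk:    the dirs whose instance files can actually be found
StartupOutcome ClassifyStartup(const std::set<std::string>& configured,
                               const std::set<std::string>& recorded,
                               const std::set<std::string>& on_disk) {
  // Extra or missing entries in fs_data_dirs still require the update tool.
  if (configured != recorded) {
    return {StartupResult::kNeedsUpdateDirs, {}};
  }
  // Otherwise start up, treating every missing dir as failed.
  StartupOutcome out{StartupResult::kStarts, {}};
  for (const auto& dir : configured) {
    if (on_disk.count(dir) == 0) {
      out.failed_dirs.push_back(dir);
    }
  }
  return out;
}
```

With recorded = {/a,/b,/c} and on_disk = {/a,/b}, passing configured =
{/a,/b,/c} corresponds to the first example, and configured = {/a,/b}
corresponds to the second.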

Some notes and changes in this patch include:
- methods involved in loading block manager instance files
  (PathInstanceMetadataFiles) now treat missing instance files as
  "unhealthy", the same way they treat files that fail due to disk errors
- DataDirManager::LoadInstances() has been updated to handle the above
  change, optionally returning the missing instances as unhealthy
  instances. This allows the update codepaths to explicitly track missing
  directories and the startup codepaths to treat each missing directory
  as an unhealthy instance used to spawn a failed DataDir in memory
- various codepaths that previously ended FsManager::Open() with an
  IOError/Corruption because all drives were failed will now return
  NotFound, indicating Kudu should attempt to create a new FS layout
- as a byproduct of the above changes, when opening the FS layout in
  ENFORCE_CONSISTENCY mode with an extra data dir included in
  `fs_data_dirs`, Kudu will fail later than before, at the integrity
  check, and yield an IOError instead of a NotFound error
- the UUID and UUID index assignment for missing/failed directories has
  been updated when opening the speculative directory manager; see
  DataDirManager::Open() for more details.
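
As a rough illustration of the LoadInstances() note above, the following
sketch shows one way a loader could treat a missing instance file like a
disk error while optionally reporting missing directories separately. It
is not the patch's implementation: ReadOutcome, InstanceSketch, and
LoadInstancesSketch are hypothetical, and using a nullable out-parameter
to select between the two behaviors is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical outcome of reading one directory's instance file.
enum class ReadOutcome { kOk, kDiskError, kNotFound };

struct InstanceSketch {
  std::string dir;
  bool healthy;
};

// Returns the instances that should back a DataDir in memory.
// If `missing` is non-null (update-style caller), dirs with no instance
// file are reported there as unhealthy instances instead. If it is null
// (startup-style caller), they stay in the returned list as unhealthy
// instances, just like dirs that hit a disk error.
std::vector<InstanceSketch> LoadInstancesSketch(
    const std::vector<std::pair<std::string, ReadOutcome>>& reads,
    std::vector<InstanceSketch>* missing) {
  std::vector<InstanceSketch> loaded;
  for (const auto& read : reads) {
    InstanceSketch inst{read.first, read.second == ReadOutcome::kOk};
    if (read.second == ReadOutcome::kNotFound && missing != nullptr) {
      missing->push_back(inst);
    } else {
      loaded.push_back(inst);
    }
  }
  return loaded;
}
```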

WIP: this change touches codepaths with a fair number of edge cases, and
as such, I haven't fully convinced myself this is the best approach.
That said, it is reviewable and has tests passing.

Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
---
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/tools/kudu-tool-test.cc
7 files changed, 259 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/10340/2
--
To view, visit http://gerrit.cloudera.org:8080/10340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61a71265c3cc34a7b72320149770a814ec7f8351
Gerrit-Change-Number: 10340
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
