Todd Lipcon has submitted this change and it was merged.

Change subject: disk failure: make DataDirManager failure-aware
......................................................................


disk failure: make DataDirManager failure-aware

The DataDirManager must record which directories are unhealthy in order
to avoid placing new data on failed disks. This patch achieves this by
maintaining a set of UUID indices in the DataDirManager that
correspond to failed directories. Additionally, the number of known
failed directories is maintained as a metric.
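
For illustration only, the bookkeeping described above could look roughly
like the following sketch. It is not the code from this patch: the class
name FailedDirTracker, its method names, and the use of a plain size()
accessor in place of a real metric are all assumptions made for the
example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for the failure tracking described in
// the commit message. All names here are illustrative assumptions.
class FailedDirTracker {
 public:
  explicit FailedDirTracker(uint16_t num_dirs) : num_dirs_(num_dirs) {}

  // Records the directory with the given UUID index as failed.
  // Returns true if the directory was not already marked as failed.
  bool MarkDataDirFailed(uint16_t uuid_idx) {
    assert(uuid_idx < num_dirs_);
    return failed_data_dirs_.insert(uuid_idx).second;
  }

  // Returns true if the directory has been marked as failed.
  bool IsDataDirFailed(uint16_t uuid_idx) const {
    return failed_data_dirs_.count(uuid_idx) > 0;
  }

  // Stand-in for the metric: the number of known failed directories.
  size_t data_dirs_failed() const { return failed_data_dirs_.size(); }

 private:
  const uint16_t num_dirs_;
  // UUID indices of the directories known to have failed.
  std::set<uint16_t> failed_data_dirs_;
};
```

Deriving the count from the set keeps the two from drifting apart when
the same directory is reported as failed more than once.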

Tests are added to data_dirs-test to ensure that failed directories
are not used and are not returned as part of newly created
DataDirGroups. If no healthy directories exist, callers will return an
IOError with POSIX code ENODEV.
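
As a rough sketch of the behavior the tests check, group selection might
skip failed directories and report ENODEV when none are healthy. The
PickResult struct and PickHealthyDirs function below are hypothetical
names invented for this example, and the in-order selection is an
assumption; the real DataDirManager may choose directories differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical result type: posix_code is 0 on success, or ENODEV when no
// healthy directory is available.
struct PickResult {
  int posix_code;
  std::vector<uint16_t> uuid_indices;
};

// Picks up to group_size healthy directories out of num_dirs, skipping any
// UUID index present in the failed set. Selection order is an assumption.
PickResult PickHealthyDirs(uint16_t num_dirs,
                           const std::set<uint16_t>& failed,
                           size_t group_size) {
  assert(group_size > 0);
  PickResult result{0, {}};
  for (uint16_t idx = 0; idx < num_dirs; ++idx) {
    if (failed.count(idx) > 0) continue;  // Never place data on a failed dir.
    result.uuid_indices.push_back(idx);
    if (result.uuid_indices.size() == group_size) break;
  }
  if (result.uuid_indices.empty()) {
    result.posix_code = ENODEV;  // No healthy directories exist.
  }
  return result;
}
```

With four directories and indices 0 and 2 marked as failed, a request for
a group of two yields indices 1 and 3. With every directory failed, the
result carries ENODEV and an empty group.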

Change-Id: Iee212793152de5de5198751d649ab34fb97f6aa2
Reviewed-on: http://gerrit.cloudera.org:8080/7028
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
---
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
4 files changed, 221 insertions(+), 41 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7028
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iee212793152de5de5198751d649ab34fb97f6aa2
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
