Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7766

Change subject: avoid opening tablets on failed disks
......................................................................

avoid opening tablets on failed disks

Currently, Kudu servers can start up with failed disks. However, as soon
as a tablet attempts to bootstrap (i.e. open blocks, etc.), it will
attempt to read from the failed disk and fail. This can be avoided by
checking whether a tablet's directory group contains a failed disk
before attempting to open the tablet. If so, the tablet should be marked
as having an error so it can be reassigned.
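
The pre-open check described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this patch: the names
TabletDirGroup, GroupHasFailedDir, TabletStartState and DecideStartState
are hypothetical, and directories are modeled as plain integer indices.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet's directory group; not a Kudu type.
struct TabletDirGroup {
  std::string tablet_id;
  std::vector<int> dir_indices;  // data directories the tablet's blocks may use
};

// Returns true if any directory in the group is in the failed set.
bool GroupHasFailedDir(const TabletDirGroup& group,
                       const std::set<int>& failed_dirs) {
  for (int idx : group.dir_indices) {
    if (failed_dirs.count(idx) > 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical outcome of the startup decision for one tablet.
enum class TabletStartState { kBootstrap, kFailed };

// Decide, before any block is opened, whether to bootstrap the tablet
// or to mark it as failed so that it can be reassigned.
TabletStartState DecideStartState(const TabletDirGroup& group,
                                  const std::set<int>& failed_dirs) {
  return GroupHasFailedDir(group, failed_dirs) ? TabletStartState::kFailed
                                               : TabletStartState::kBootstrap;
}
```

The point of the sketch is only the ordering: the group is inspected
first, so a tablet touching a failed directory never reaches the
block-opening step.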

The default behavior of the 'fs_target_data_dirs_per_tablet' flag is
updated to take disk state into account when assigning new directory
groups. This allows a tablet to be reassigned to a server without
being spread across a failed directory.
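
As a rough illustration of failure-aware group assignment, the sketch
below skips failed directories when building a new group. It assumes
the flag value acts as a target group size and simply takes healthy
directories in index order; PickDirGroup is a hypothetical helper, and
the selection policy actually used in data_dirs.cc may differ.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: choose up to `target` directories out of
// `num_dirs`, never selecting one that is listed in `failed_dirs`.
std::vector<int> PickDirGroup(int num_dirs,
                              const std::set<int>& failed_dirs,
                              std::size_t target) {
  std::vector<int> group;
  for (int idx = 0; idx < num_dirs && group.size() < target; ++idx) {
    if (failed_dirs.count(idx) == 0) {
      group.push_back(idx);  // healthy directory, safe to include
    }
  }
  // May hold fewer than `target` entries (or none) if too few
  // directories are healthy; the caller must handle that case.
  return group;
}
```

With four directories and directory 1 failed, a target of three yields
directories 0, 2 and 3 in this sketch.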

Testing is done by loading data into a cluster with multi-disk
servers, failing a single directory of one of the servers, and ensuring
that the tablets spread across the failed disk get replicated upon the
next startup.

Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
9 files changed, 255 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
