Mike Percy has submitted this change and it was merged.

Change subject: disk failure: don't open tablets on failed disks
disk failure: don't open tablets on failed disks

Currently, Kudu servers open the FS layout even when some of its disks have failed. However, as soon as tablets attempt to bootstrap (i.e. open blocks, etc.), they try to read from the failed disk and fail.

This can be avoided by checking whether a tablet's directory group contains a failed disk before attempting to read the tablet's data. If it does, the tablet is marked as having an error so that it can be reassigned.

The default behavior of the 'fs_target_data_dirs_per_tablet' flag is updated to take disk state into account when assigning new directory groups. This allows a tablet to be reassigned to a server without its data being spread across a failed directory.

Testing is done by loading data into a cluster configured to use multiple directories for data blocks, failing a single directory on one of the tablet servers, and ensuring that the tablets with blocks on the failed directory are re-replicated at startup. The test uses a cluster verifier to confirm the healthy end-state of the cluster. The changes needed to run this test on a cluster with multiple data directories are included.
Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Reviewed-on: http://gerrit.cloudera.org:8080/7766
Reviewed-by: Mike Percy
Tested-by: Kudu Jenkins
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/tserver/ts_tablet_manager.cc
8 files changed, 185 insertions(+), 13 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
Gerrit-MessageType: merged
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon
