Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14760 )
Change subject: KUDU-2993: don't require update_dirs to fix directory inconsistencies
......................................................................

KUDU-2993: don't require update_dirs to fix directory inconsistencies

This patch removes the ENFORCE_CONSISTENCY behavior when opening the
DataDirManager. By default, the FS layout will be opened with the new
UPDATE_AND_IGNORE_FAILURE mode, wherein:
- We update the PIMFs if we notice any are missing or their metadata is
  not consistent with the actual set of directory UUIDs.
- We tolerate failures when creating and updating the PIMFs.

This also maintains the previous UPDATE_ON_DISK behavior as
UPDATE_AND_ERROR_ON_FAILURE, wherein a disk failure during the update
would halt any further updates and revert any metadata changes thus far.
This is only used by the 'update_dirs' tool to maintain existing
behavior.

Since we now rewrite the PIMFs to be consistent by default, the
"integrity check" is now gone. This check was previously useful to
ensure that the 'all_uuids' fields matched for every PIMF, which ensured
that every data directory that was expected to exist actually existed.
This was important for a couple reasons:
- When a single missing data directory spelled failure for the entire
  node, starting up with even a single "inconsistent" directory would
  break all tablets on the tserver.
- The file block manager requires that the UUID indexes used by the
  DataDirManager are static. These indexes are defined by the ordering
  of the UUIDs in the PIMFs, so we used the integrity check to ensure
  the ordering was consistent across PIMFs.

Now that Kudu tablets can start up with missing directories, the first
reason isn't particularly enticing. The second is trickier to work
around. To work around it, I've kept the essence of the UUID indexing
for the file block manager, though I've made the "integrity checking"
virtually non-existent.
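
As a rough illustration of how the two update modes named above differ, here is a minimal sketch. It is not the code from this patch: `Instance`, `WriteFn`, and `UpdateInstances` are hypothetical names, and an in-memory copy stands in for the on-disk rollback that the 'update_dirs' tool performs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT Kudu's actual types or signatures.
enum class UpdateMode {
  UPDATE_AND_IGNORE_FAILURE,    // default when opening the FS layout
  UPDATE_AND_ERROR_ON_FAILURE,  // used only by the 'update_dirs' tool
};

// Simplified view of one PIMF: its directory and the UUID set it records.
struct Instance {
  std::string dir;
  std::vector<std::string> all_uuids;
};

// Assumed writer callback: returns false to simulate a disk failure.
using WriteFn = std::function<bool(const Instance&)>;

// Rewrites every instance whose recorded UUID set differs from 'expected'.
// Returns true if opening may proceed, false if the update was aborted.
bool UpdateInstances(std::vector<Instance>* instances,
                     const std::vector<std::string>& expected,
                     UpdateMode mode,
                     const WriteFn& write) {
  assert(instances != nullptr);
  // Snapshot used to revert changes in the strict mode.
  const std::vector<Instance> backup = *instances;
  for (Instance& inst : *instances) {
    if (inst.all_uuids == expected) continue;  // already consistent
    Instance updated = inst;
    updated.all_uuids = expected;
    if (write(updated)) {
      inst = updated;
      continue;
    }
    if (mode == UpdateMode::UPDATE_AND_ERROR_ON_FAILURE) {
      // Halt further updates and revert the changes made so far.
      *instances = backup;
      return false;
    }
    // UPDATE_AND_IGNORE_FAILURE: leave this instance stale and carry on.
  }
  return true;
}
```

In this sketch, a write failure on one directory leaves only that instance stale in the tolerant mode, while the strict mode restores every instance to its prior state and reports failure.
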
For the log block manager, I've made the UUID indexing much simpler:
rather than relying on the integrity check, we'll now always assign a
PIMF a UUID, even if we couldn't read one from disk.

Tests:
- Updated a few tests that previously enforced consistency among PIMFs
  to instead check for the correct instance-updating behavior.
- Added a test to check that failures while updating the PIMFs don't
  stop us from opening the FS layout.
- Added a test that checks that the adding/removing behavior on a
  tserver affects and fails tablets as expected.
- Added a test to make sure that this doesn't completely break the file
  block manager. Given we don't expect heavy usage of the FBM, I didn't
  do extensive testing when the PIMFs are tampered with.
- Added a test to ensure we don't regress the rollback behavior of the
  'update_dirs' tool in the face of a disk failure.

Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Reviewed-on: http://gerrit.cloudera.org:8080/14760
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
---
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/integration-tests/open-readonly-fs-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tserver/tablet_server-test.cc
15 files changed, 870 insertions(+), 705 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, but someone else must approve
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Gerrit-Change-Number: 14760
Gerrit-PatchSet: 17
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
