Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12239 )
Change subject: [fs]: delete the (orphaned) metadata file when repairing ...................................................................... Patch Set 4: (13 comments) http://gerrit.cloudera.org:8080/#/c/12239/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12239/4//COMMIT_MSG@9 PS4, Line 9: This patch intends to delete the (orphaned) metadata file at : startup while the data file has gone missing and the metadata : file has no live blocks at the same time. In the previous patch : KUDU-2636, it has supported deleting the dead container which : includes data file and metadata file. So, there will be a time : window while deleting both of these two files in sequential, : and it will make a half-container in extreme cases, especially : in a crash or sent the tserver a kill -9. Indeed, Adar has : captured this type of case under 1000 runs of dense_node-itest. Thanks for the added detail. Could you reword this a bit to add more clarity? With KUDU-2636 fixed, we now support deleting dead containers at runtime. Because this involves deleting a pair of files, there's a time window in which it's possible for there to be just one file. A well-timed crash or kill -9 can make this a permanent state that fails the log block manager at startup. Indeed, a loop of dense_node-itest failed 2/1000 times in this way. This patch fixes the problem by detecting "orphaned" metadata files at startup and deleting them. A metadata file is orphaned if it has no corresponding data file and if it contains nothing but dead blocks. http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@1896 PS4, Line 1896: const auto CheckRepaired = [&] () { If you really want to test that the repair happened, you should look at the FsReport that's created when the block manager is opened, and test the LBMIncompleteContainerCheck within it. http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@1987 PS4, Line 1987: // Create a metadata file with 1 byte which is <MIN. : unique_ptr<WritableFile> metadata_file_writer; : ASSERT_OK(env_->NewWritableFile(metadata_file_name, &metadata_file_writer)); : ASSERT_OK(metadata_file_writer->Append(Slice("a"))); : metadata_file_writer->Close(); Since this pattern is repeated quite a few times, could you encapsulate it in a lambda? http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@2029 PS4, Line 2029: // Delete the only block. : vector<BlockId> deleted; : shared_ptr<BlockDeletionTransaction> transaction = bm_->NewDeletionTransaction(); : transaction->AddDeletedBlock(block_id); : ASSERT_OK(transaction->CommitDeletedBlocks(&deleted)); : transaction.reset(); : // Wait for the actual hole punching to take place. : for (const auto& data_dir : dd_manager_->data_dirs()) { : data_dir->WaitOnClosures(); : } Likewise, encapsulate this in a lambda. http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc@774 PS1, Line 774: *-------------*------------*----------------*-------------------------*---------------------*/ > In addition, the metadata file is created before data file in 'LogBlockCont I don't think the order of which file is created/deleted matters because without an fsync() on the parent directory, the filesystem may reorder operations under the hood. A great reference on this is https://danluu.com/file-consistency (see the "Filesystem semantics" section); _some_ filesystems may reorder directory operations, though among ext4/xfs (the filesystems we generally support for Kudu) this doesn't appear to be a problem. Since this doesn't appear to be a problem on ext4/xfs, we can cross this bridge when it becomes an actual issue. http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc@801 PS1, Line 801: return Status::OK(); > Our initial goal was to cover scenarios when the data file has gone missing If you look at all the filesystem calls made in this file, they're wrapped in macros like RETURN_NOT_OK_CONTAINER_DISK_FAILURE that capture failed I/O and convert it into a disk failure. Because disks can fail at any time and for any reason, I think it's important to ensure that this wrapping is 100% inclusive. In your patch, if the disk fails during ReadNextPB, we won't notice that. http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@762 PS4, Line 762: /* Note: the status here only represents the result of check. Could you reformat this to use multi-line C++ comments? // first line // second line // ... http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@767 PS4, Line 767: EXIST && NO LIVE BLCOKS | EXIST && LIVE BLCOKS| BLOCKS http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@769 PS4, Line 769: (new case) Don't need this; if you really want to distinguish between the old and new cases, do so descriptively (i.e. "Aborted (orphaned metadata)" vs. "Aborted (no data and too short metadata)", etc.) http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@830 PS4, Line 830: report->incomplete_container_check->entries.emplace_back(common_path); : return Status::Aborted(Substitute("orphaned empty metadata and data files $0", common_path)); Nit: looks like these lines are indented too much. http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@839 PS4, Line 839: if (PREDICT_FALSE(s_data.IsNotFound())) { : if (metadata_size >= pb_util::kPBContainerMinimumValidLength) { Can combine these and save one level of indentation. Probably clearer to switch the order too (more symmetry with the condition above): if (PREDICT_FALSE(metadata_size >= pb_util::kPBContainerMinimumValidLength && s_data.IsNotFound())) http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@863 PS4, Line 863: if ((read_status.IsEndOfFile() && live_blocks.empty())) { Nit: there's an extra set of parens here. http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@865 PS4, Line 865: return Status::Aborted(Substitute("orphaned metadata file $0",common_path)); Nit: need an extra space between the comma and 'common_path' -- To view, visit http://gerrit.cloudera.org:8080/12239 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a4223d0e6385a7f7776f8fe2679a3bb16d832ab Gerrit-Change-Number: 12239 Gerrit-PatchSet: 4 Gerrit-Owner: helifu <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: helifu <[email protected]> Gerrit-Comment-Date: Wed, 23 Jan 2019 08:28:52 +0000 Gerrit-HasComments: Yes
