Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12239 )

Change subject: [fs]: delete the (orphaned) metadata file when repairing
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/12239/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12239/4//COMMIT_MSG@9
PS4, Line 9: This patch intends to delete the (orphaned) metadata file at
           : startup while the data file has gone missing and the metadata
           : file has no live blocks at the same time. In the previous patch
           : KUDU-2636, it has supported deleting the dead container which
           : includes data file and metadata file. So, there will be a time
           : window while deleting both of these two files in sequential,
           : and it will make a half-container in extreme cases, especially
           : in a crash or sent the tserver a kill -9. Indeed, Adar has
           : captured this type of case under 1000 runs of dense_node-itest.
Thanks for the added detail. Could you reword this a bit to add more clarity?

  With KUDU-2636 fixed, we now support deleting dead containers at runtime.
  Because this involves deleting a pair of files, there's a time window in which
  it's possible for there to be just one file. A well-timed crash or kill -9 can
  make this a permanent state that fails the log block manager at startup. 
Indeed,
  a loop of dense_node-itest failed 2/1000 times in this way.

  This patch fixes the problem by detecting "orphaned" metadata files at startup
  and deleting them. A metadata file is orphaned if it has no corresponding data
  file and if it contains nothing but dead blocks.


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@1896
PS4, Line 1896:   const auto CheckRepaired = [&] () {
If you really want to test that the repair happened, you should look at the 
FsReport that's created when the block manager is opened, and test the 
LBMIncompleteContainerCheck within it.


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@1987
PS4, Line 1987:     // Create a metadata file with 1 byte which is <MIN.
              :     unique_ptr<WritableFile> metadata_file_writer;
              :     ASSERT_OK(env_->NewWritableFile(metadata_file_name, 
&metadata_file_writer));
              :     ASSERT_OK(metadata_file_writer->Append(Slice("a")));
              :     metadata_file_writer->Close();
Since this pattern is repeated quite a few times, could you encapsulate it in a 
lambda?


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager-test.cc@2029
PS4, Line 2029:     // Delete the only block.
              :     vector<BlockId> deleted;
              :     shared_ptr<BlockDeletionTransaction> transaction = 
bm_->NewDeletionTransaction();
              :     transaction->AddDeletedBlock(block_id);
              :     ASSERT_OK(transaction->CommitDeletedBlocks(&deleted));
              :     transaction.reset();
              :     // Wait for the actual hole punching to take place.
              :     for (const auto& data_dir : dd_manager_->data_dirs()) {
              :       data_dir->WaitOnClosures();
              :     }
Likewise, encapsulate this in a lambda.


http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc@774
PS1, Line 774:    
*-------------*------------*----------------*-------------------------*---------------------*/
> In addition, the metadata file is created before data file in 'LogBlockCont
I don't think the order of which file is created/deleted matters because 
without an fsync() on the parent directory, the filesystem may reorder 
operations under the hood. A great reference on this is 
https://danluu.com/file-consistency (see the "Filesystem semantics" section); 
_some_ filesystems may reorder directory operations, though among ext4/xfs (the 
filesystems we generally support for Kudu) this doesn't appear to be a problem.

Since this doesn't appear to be a problem on ext4/xfs, we can cross this bridge 
when it becomes an actual issue.


http://gerrit.cloudera.org:8080/#/c/12239/1/src/kudu/fs/log_block_manager.cc@801
PS1, Line 801:   return Status::OK();
> Our initial goal was to cover scenarios when the data file has gone missing
If you look at all the filesystem calls made in this file, they're wrapped in 
macros like RETURN_NOT_OK_CONTAINER_DISK_FAILURE that capture failed I/O and 
convert it into a disk failure. Because disks can fail at any time and for any 
reason, I think it's important to ensure that this wrapping is 100% inclusive. 
In your patch, if the disk fails during ReadNextPB, we won't notice that.


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@762
PS4, Line 762:   /* Note: the status here only represents the result of check.
Could you reformat this to use multi-line C++ comments?

  // first line
  // second line
  // ...


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@767
PS4, Line 767:  EXIST && NO LIVE BLCOKS | EXIST && LIVE BLCOKS|
BLOCKS


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@769
PS4, Line 769: (new case)
Don't need this; if you really want to distinguish between the old and new 
cases, do so descriptively (i.e. "Aborted (orphaned metadata)" vs. "Aborted (no 
data and too short metadata)", etc.)


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@830
PS4, Line 830:       
report->incomplete_container_check->entries.emplace_back(common_path);
             :       return Status::Aborted(Substitute("orphaned empty metadata 
and data files $0", common_path));
Nit: looks like these lines are indented  too much.


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@839
PS4, Line 839:   if (PREDICT_FALSE(s_data.IsNotFound())) {
             :     if (metadata_size >= 
pb_util::kPBContainerMinimumValidLength) {
Can combine these and save one level of indentation.

Probably clearer to switch the order too (more symmetry with the condition 
above):

  if (PREDICT_FALSE(metadata_size >= pb_util::kPBContainerMinimumValidLength &&
                    s_data.IsNotFound()))


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@863
PS4, Line 863:       if ((read_status.IsEndOfFile() && live_blocks.empty())) {
Nit: there's an extra set of parens here.


http://gerrit.cloudera.org:8080/#/c/12239/4/src/kudu/fs/log_block_manager.cc@865
PS4, Line 865:         return Status::Aborted(Substitute("orphaned metadata 
file $0",common_path));
Nit: need an extra space between the comma and 'common_path'



--
To view, visit http://gerrit.cloudera.org:8080/12239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a4223d0e6385a7f7776f8fe2679a3bb16d832ab
Gerrit-Change-Number: 12239
Gerrit-PatchSet: 4
Gerrit-Owner: helifu <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <[email protected]>
Gerrit-Comment-Date: Wed, 23 Jan 2019 08:28:52 +0000
Gerrit-HasComments: Yes

Reply via email to