Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager ......................................................................
Patch Set 20: (26 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: s-per > Nit: can just omit this. Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 38: DECLARE_bool(cfile_write_checksums); > warning: redundant 'FLAGS_block_time_to_flush' declaration [readability-red Done Line 892: unique_ptr<WritableBlock> sink; > What happened to parameterizing this test for all block_time_to_flush value Refactored the code and the flag specific code in Cfile are removed. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS17, Line 266: RETURN_NOT_OK(block_->Finalize()); : transaction->AddCreatedBlock(std::move(block_)); : return Status::OK(); : } : : void CFileWriter::AddMetadataPair(const Slice &key, const Slice &value) { : C > This is incorrect. We should always Finalize() the block; whether or not we Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 43: DEFINE_string(block_manager_flush_control, "finalize", > Let's rename to "block_manager_flush_control": Done Line 43: DEFINE_string(block_manager_flush_control, "finalize", > warning: redundant 'FLAGS_block_time_to_flush' declaration [readability-red Done Line 44: "Control when to flush a block, either at 'finalize', 'close'," > This should apply to any WritableBlock, not just those in a BlockTransactio Done PS17, Line 48: neve > Let's use "never" instead, since we're talking about a point in time. Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 32: namespace kudu { > This doesn't belong here (block_coalesce_close didn't belong either). Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 340: Status FileWritableBlock::Finalize() { > This is where we should check the value of block_time_to_flush. We should o Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1176: deleted_ = true; > I mentioned this to Dan in our impromptu meeting yesterday; perhaps you wer Done PS17, Line 1397: ReadableBlock::Close() { > You can use 'lbm' here. Done Line 1410: } > And here we also need to gate on block_time_to_flush == 'finalize'. Updated. To clarify, I did this because we were saying 'Finalize indicates Flushing'. But after thought more, I think your suggestion makes more sense. So I separated the flushing (I kept FlushDataAsync()) from Finalize and use the flag to control when to call FlushDataAsync(). http://gerrit.cloudera.org:8080/#/c/7207/19/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 245: > Don't need this; just change "enum CloseFlag" to "enum CloseFlags". Done Line 499: // Note: 'record' may be swapped into 'report'; do not use it after calling > Don't need this; it's only ever used internally (by LogBlockContainer). Done Line 582: total_bytes_(0), > Can we get by with a boolean? Seems like we could set it in RoundUpContaine Removed as is not necessary. PS19, Line 594: } : > I don't understand what protection the lock offers. Furthermore, PosixRWFil Discussed this with Adar offline. I will remove this lock as PosixRWFile::Sync() already tries to keeps track of whether a sync is necessary. Meanwhile, fix KUDU-2102 in another commit. Line 906: } > You should add a note here about mutating result_status on failure, like in Refactored the code and removed FinishBlock. Line 939: RETURN_NOT_OK_HANDLE_ERROR(data_dir_->RefreshIsFull(DataDir::RefreshMode::ALWAYS)); > I don't understand this, and it's making me nervous. Done PS19, Line 955: Status LogBlockContainer::AppendMetadata(const BlockRecordPB& pb) { : DCHECK(!read_only()); : // Note: We don't check for sufficient disk space for metadata writes in : // order to allow for block deletion on full disks. : RETURN_NOT_OK_HANDLE_ERROR(metadata_file_->Append(pb)); : return Status::OK(); : } : > Here's another case where, even though FinishBlock() can be called concurre Done Line 1076: "Container $0 with size $1 is now full, max size is $2", > These two accesses to next_sync_offset_ aren't atomic. Use StoreMax() inste Removed. PS19, Line 1173: : void LogBlock::Delete() { : DCHECK(!deleted_); : deleted_ = true; : } : : //////////////////////////////////////////////////////////// : // LogWritableBlock (definition) : //////////////////////////////////////////////////////////// : : Log > I keep circling back to this as a potential synchronization issue. The comp With the current change since RoundUpContainer is always called before MakeContainerAvailable, I think it should also be guaranteed to be called by just one writer thread at a time. I refactored the DoClose(), SyncBlocks(), FinishBlock() as you suggested, let me know if you think it is still not clear. Updated. Line 1188: block_length_(0), > Surely we needn't duplicate this from LogBlock; if you need it at the conta Sorry about this, I originally removed the duplicate code by calling the container level fs_alighed_length(). You can see that in patch set 4 L1069. But apparently I missed this after rebasing the patch later. Updated it now. PS19, Line 1464: //////////////////////////////////////////////////////////// : // LogBlockManager : //////////////////////////////////////////////////////////// : : const char* LogBlockManager::kContainerMetadataFileSuffix = ".metadata"; : const char* LogBlockManager::kContainerDataFileSuffix = ".data"; : : static const char* kBlockManagerType = "log"; : : // These values were arrived at via experimentation. See commit 4923a74 for : // more details. : const map<int64_t, int64_t> LogBlockManager::kPerFsBlockSizeBlockLimits({ : { 1024, 673 }, : { 2048, 1353 }, : { 4096, 2721 }}); : : LogBlockManager::LogBlockManager(Env* env, : FsErrorManager* error_manager, : const BlockManagerOptions& opts) : : mem_tracker_(MemTracker::CreateTracker(-1, : "log_block_manager", : > Rewrite: Done Line 1905: open_block_ids_.erase(lb->block_id()); > I think this control flow (and the corresponding surgery to DoClose) is mor Refactored the code as suggested. Added DoCloseBlocks() in LogBlockContainer. PS19, Line 1910: > Nit: lwb Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a88884bd Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes