[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has submitted this change and it was merged. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Reviewed-on: http://gerrit.cloudera.org:8080/7207 Reviewed-by: Adar DemboTested-by: Adar Dembo --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 615 insertions(+), 453 deletions(-) Approvals: Adar Dembo: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 35 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 34: Verified+1 We can't let IWYU get in the way of progress. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 34: Not sure why IWYU is complaining about including ext/alloc_traits.h for log_block_manager.cc, etc? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 34: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#34). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 615 insertions(+), 453 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/34 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 33: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 33 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#33). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 602 insertions(+), 437 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/33 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 33 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 132 > There's no avoiding a downcast in LogBlockManager::CloseBlocks; we need it OK sounds good to me, it can stay as is. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (2 comments) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 132 > Adar earlier made the point "it shouldn't be part of the WritableBlock inte There's no avoiding a downcast in LogBlockManager::CloseBlocks; we need it in order to call other non-interface methods on LogWritableBlock. But, if you prefer to reduce the number of downcasts from two to one, some light refactoring could do that. http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS32, Line 1217: if (container_->read_only()) { : state_ = CLOSED; : if (container_->metrics()) { : container_->metrics()->generic_metrics.blocks_open_writing->Decrement(); : container_->metrics()->generic_metrics.total_bytes_written->IncrementBy( : block_length_); : } : return Status::Aborted("container $0 is read-only", container_->ToString()); : } > Makes sense. I don't see a case we would want to Abort() the block if it is I would prefer we didn't change API semantics at this point. Multiple Abort() calls in succession are legal; let's keep them that way (and ensure they no-op). -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 132 Looks like this is being retained as a method on both implementations, but to call it requires a downcast. Should we keep it in the interface so that the downcast isn't necessary? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS32, Line 1217: if (container_->read_only()) { : state_ = CLOSED; : if (container_->metrics()) { : container_->metrics()->generic_metrics.blocks_open_writing->Decrement(); : container_->metrics()->generic_metrics.total_bytes_written->IncrementBy( : block_length_); : } : return Status::Aborted("container $0 is read-only", container_->ToString()); : } > It's hard to tell whether it's possible to wind up calling Abort() twice on Yeah, it does not hurt to add the checking. But looking at the flow, Abort() is only called when the state_ != CLOSED now. So instead maybe I can add a DCHECK of the state_ at the beginning of Abort()? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS32, Line 1217: if (container_->read_only()) { : state_ = CLOSED; : if (container_->metrics()) { : container_->metrics()->generic_metrics.blocks_open_writing->Decrement(); : container_->metrics()->generic_metrics.total_bytes_written->IncrementBy( : block_length_); : } : return Status::Aborted("container $0 is read-only", container_->ToString()); : } > Do we need to check if state_ is already CLOSED and if so make this a no-op It's hard to tell whether it's possible to wind up calling Abort() twice on a read-only container given the constraints required to get a container to become read-only, but Mike's suggestion certainly won't hurt. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS32, Line 50: close closed -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 32: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/32/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS32, Line 1217: if (container_->read_only()) { : state_ = CLOSED; : if (container_->metrics()) { : container_->metrics()->generic_metrics.blocks_open_writing->Decrement(); : container_->metrics()->generic_metrics.total_bytes_written->IncrementBy( : block_length_); : } : return Status::Aborted("container $0 is read-only", container_->ToString()); : } Do we need to check if state_ is already CLOSED and if so make this a no-op and avoid changing the metrics, like we do in DoClose()? i.e. Close(); Abort(); -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 32 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 30: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 30 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 30: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/29/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS29, Line 1229: it would > Seems like 'it' accidentally got dropped. This sentence should read as: Oops, my bad...Thanks for catching it. Updated. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 30 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#30). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 599 insertions(+), 434 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/30 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 30 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/29/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS29, Line 1229: would be Seems like 'it' accidentally got dropped. This sentence should read as: Here is the reasoning as to why it would be safe to do so. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 29 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 29: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/28/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS28, Line 352: possibly sync > possibly synchronizing Done PS28, Line 1227: cou > could Done PS28, Line 1229: saf > would be Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 29 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#29). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 599 insertions(+), 434 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/29 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 29 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 28: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/28/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS28, Line 352: synchronizing possibly synchronizing PS28, Line 1227: can could PS28, Line 1229: is would be -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#28). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 599 insertions(+), 434 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/28 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 27: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/25/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS25, Line 1206: // Do nothing for Abort() other than updating metrics and block state. : // Here is the reasoning why it is safe to do so. Currently, failures in block : // creation can happen at various stages: : / > Shouldn't this still update metrics? And maybe change state to CLOSED? Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#27). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 590 insertions(+), 439 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/27 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#26). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 588 insertions(+), 438 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/26 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 25: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/25/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS25, Line 1206: // Abort the operation for read-only container. : if (container_->read_only()) { : return Status::Aborted("container $0 is read-only", container_->ToString()); : } Shouldn't this still update metrics? And maybe change state to CLOSED? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#25). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 587 insertions(+), 437 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/25 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 24: (11 comments) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS22, Line 283: return Status::OK(); : } : > nit: It seems superfluous to specify = default here, particularly since the Done http://gerrit.cloudera.org:8080/#/c/7207/23/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS23, Line 325: static const char* kMagic; : : // Creates a new block container in 'dir'. : st > No longer needed? Done PS23, Line 1219: Do > Doing Done PS23, Line 1219: lt > in Done Line 1220: //data-consuming gaps in containers, but these gaps can be cleaned up > "unmanaged" isn't really an existing concept. How about "Doing nothing can Done Line 1223: //orphaned blocks if the metadata is durable. But orphaned blocks can be > orphaned Done PS23, Line 1226: abort ha > abort Done PS23, Line 1227: abort ha > abort Done Line 1228: // avoid large chunks of data-consuming gaps and orphaned blocks. A > "chunks" and "data-consuming gaps". Done PS23, Line 1230: abort, i > abort Done PS23, Line 1232: abort, D > abort Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#24). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 590 insertions(+), 436 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/24 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 23: (10 comments) http://gerrit.cloudera.org:8080/#/c/7207/23/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS23, Line 325: enum SyncMode { : SYNC, : NO_SYNC : }; No longer needed? PS23, Line 1219: Do Doing PS23, Line 1219: it in Line 1220: //unmanaged blocks which leaves gaps in data files. But the unmanaged "unmanaged" isn't really an existing concept. How about "Doing nothing can result in data-consuming gaps in containers, but these gaps can be cleaned up by hole repunching at start up"? Line 1223: //orphan blocks if the metadata is durable. But orphan blocks can be orphaned PS23, Line 1226: abortion abort PS23, Line 1227: abortion abort Line 1228: // avoid large chunk of useless consumed space and unmanaged blocks. A "chunks" and "data-consuming gaps". PS23, Line 1230: abortion abort PS23, Line 1232: abortion abort -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (10 comments) > (1 comment) > > Looks good! I'm very pleased how this came together, net-net I > think the code is easier to understand with this patch. Thank you! Really appreciate yours and Adar's thorough code reviews! http://gerrit.cloudera.org:8080/#/c/7207/22//COMMIT_MSG Commit Message: PS22, Line 15: for > an Done http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS22, Line 34: blocks > block Done Line 45: DEFINE_string(block_manager_flush_control, "finalize", > It sounds scarier than it really is. We should at least clarify that this f Done PS22, Line 46: when to flush > More like when to pre-flush? Done http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS22, Line 70: is > is set to Done PS22, Line 127: is > is set to Done PS22, Line 128: asynchronous flush of dirty block data to disk > Sounds good for performance. What is the crash consistency contract? Can th Yes, in the worst case we could have 'gaps' in the data file, but this should be handled by hole repunching. Also, we only flush data in this case, and always append metadata after data is durable. This ensures metadata never points to garbage data. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 276: // This is called after synchronization of dirty data and metadata > doesn't finalization have to happen before the data is synced to disk? DoClose() is called by DoCloseBlocks(). If closing a group of blocks, the blocks should be finalized. But for a single block it can be called without being finalized. There is a DCHECK at L904. PS22, Line 458: rounds up > "rounds up"? Done PS22, Line 459: it > clarify: is "it" the block or the container? Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#23). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 574 insertions(+), 440 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/23 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (2 comments) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 45: DEFINE_string(block_manager_flush_control, "finalize", > Would prefer pre-flush to pre-sync. It sounds scarier than it really is. We should at least clarify that this flag doesn't have any affect on durability, which is controlled by other flags. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 355: // If successful, adds all blocks to the block manager's in-memory maps. > Don't think so; the blocks are just manipulated. ah, you're right. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1228: RETURN_NOT_OK(container_->DoCloseBlocks({ this }, LogBlockContainer::SyncMode::NO_SYNC)); > As I brought up on slack I think there may be an opportunity here to simpli Discussed with Dan and Adar offline. We came to the conclusion that it is safe to do nothing in Abort() for now. Because by doing this could either result in 'gaps' in data file or orphan blocks, and both of them can be handled today by hole repunching and blocks garbage collection. I will add a TODO in Abort() so that we can add a fine-grained handing of abortion in future. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 45: DEFINE_string(block_manager_flush_control, "finalize", > I don't like the term flush because it doesn't distinguish between a memory Would prefer pre-flush to pre-sync. Also note that this is an experimental flag that largely exists for our own benchmarking, so I think the bar for documenting it is low. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 355: // If successful, adds all blocks to the block manager's in-memory maps. > and takes ownership of the LogWritableBlock instances, right? Don't think so; the blocks are just manipulated. PS22, Line 1119: due to KUDU-1793 > That issue is marked as RESOLVED. Is this still an issue? Yes, because KUDU-1793 existed in the wild in at least one release, so we may load on-disk deployments with misaligned blocks. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (13 comments) Looks pretty good to me, I mainly have nits. http://gerrit.cloudera.org:8080/#/c/7207/22//COMMIT_MSG Commit Message: PS22, Line 15: for an http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS22, Line 34: blocks block Line 45: DEFINE_string(block_manager_flush_control, "finalize", I don't like the term flush because it doesn't distinguish between a memory buffer flush vs. an fsync. Also, from the description here it sounds like 'never' means the blocks will never be fsynced. We should clarify that this is an implicit pre-flush or pre-sync, that the global settings for syncing will still be used at close time. Perhaps we should call this flag block_manager_preflush_control instead, or block_manager_early_sync_control. PS22, Line 46: when to flush More like when to pre-flush? http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS22, Line 70: is is set to PS22, Line 127: is is set to PS22, Line 128: asynchronous flush of dirty block data to disk Sounds good for performance. What is the crash consistency contract? Can this leak disk space? PS22, Line 283: BlockTransaction() = default; : : ~BlockTransaction() = default; nit: It seems superfluous to specify = default here, particularly since these are not something special like copy or move constructors. Am I missing something? http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 276: // This is called after synchronization of dirty data and metadata doesn't finalization have to happen before the data is synced to disk? Line 355: // If successful, adds all blocks to the block manager's in-memory maps. and takes ownership of the LogWritableBlock instances, right? PS22, Line 458: rounds up "rounds up"? PS22, Line 459: it clarify: is "it" the block or the container? PS22, Line 1119: due to KUDU-1793 That issue is marked as RESOLVED. Is this still an issue? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) Looks good! I'm very pleased how this came together, net-net I think the code is easier to understand with this patch. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1228: RETURN_NOT_OK(container_->DoCloseBlocks({ this }, LogBlockContainer::SyncMode::NO_SYNC)); As I brought up on slack I think there may be an opportunity here to simplify DoClose and make Abort more efficient by doing nothing during abort calls (instead of writing a create + delete metadata entry). -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 883: > Should be with a pointer (*), actually. Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (7 comments) http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS21, Line 522: Random rand(SeedRandom()); : vectordirty_bl > This is no longer true and can be removed, right? Right. Removed. http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 63: DECLARE_bool(block_manager_lock_dirs); > Not needed anymore? Done http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 79: DECLARE_bool(enable_data_block_fsync); > Can be removed. Done PS21, Line 528: ck is ful > , marking Done PS21, Line 905: > Use DCHECK_EQ Done PS21, Line 1054: e conta > depending Done PS21, Line 1054: n isn' > place Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#22). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 575 insertions(+), 449 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/22 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 21: (8 comments) Almost there, just a few nits left. Dan, since the implementation has changed quite a bit, do you want to make another pass? http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS21, Line 522: // Disable preallocation for this test as it creates many containers. : FLAGS_log_container_preallocate_bytes = 0; This is no longer true and can be removed, right? http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 63: DECLARE_bool(block_coalesce_close); Not needed anymore? http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 883: > Done Should be with a pointer (*), actually. http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 79: DECLARE_bool(block_coalesce_close); Can be removed. PS21, Line 528: . Marking , marking PS21, Line 905: DCHECK Use DCHECK_EQ PS21, Line 1054: places place PS21, Line 1054: depends depending -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#21). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 575 insertions(+), 444 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/21 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 21: (30 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: --nu > You still need 'localhost' though; that's not an optional parameter I think Done http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: AtomicInt total_bytes_written_; > Should be Done PS20, Line 297: vectorall_dirty_blocks; : for (int i = 0; i < FLAGS_block_group_number; i++) { > You changed the pointer/ref style here. Done Line 310: > Update. Done Line 311: dirty_blocks.emplace_back(std::move(block)); > const auto& Done Line 318: // random, and write a smaller chunk of data to it. > auto& Done Line 326: // Write a small chunk of data. > const auto& Done http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS20, Line 44: // throughput but may improve write latency. : DEFINE_string(block_manager_flush_control, "finalize", : "Controls when to flush a block. Valid values are 'finalize', " : "'close', or 'never'. If 'finalize', blocks will be flushed " : "when writing is finished. If 'close', blocks will be > Rewrite: Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1410: > Okay, but why bother maintaining FlushDataAsync() as a separate method? It Makes sense. Updated. But still kept the per-impl methods in LogWritableBlock/FileWritableBlock since calling Flush() from LogBlockManager/FileBlockManager will result in more code if not. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 260: onst OVERRIDE; > finalizing it if it has not yet been finalized. Done PS20, Line 260: > "Also updates" Done PS20, Line 336: c > , Done PS20, Line 337: > the Done PS20, Line 434: // Malformed records and other container inconsistencies are written to : // 'report'. Healthy blocks are written either to 'live_blocks' or : // 'dead_blocks'. Live records are written to 'live_block_recor > Nit: would prefer to keep BlockCreated and BlockDeleted next to each other Done PS20, Line 543: // Offset up to which we have preallocated bytes. : int64_t preallocated_offset_ = 0; > Since all counters are now atomic, these comments are no longer interesting Done PS20, Line 878: at as a d > "data" (to be consistent with "Syncing metadata file") Done PS20, Line 882: > Replace with "is already made durable". Done PS20, Line 883: > const auto* Done PS20, Line 890: if (mode == SYNC) > I don't think this one needs to be conditioned on SYNC. Done Line 891: > Would be nice if we could DCHECK that if blocks.size() > 1, all of the bloc Done PS20, Line 1040: preallocated_offset_ = off + len; : } : > This needs to be updated. Done Line 1068: DCHECK_GE(block_offset, 0); > Don't need the if statement anymore. Done PS20, Line 1071: // means accounting for the new block should be as simple as adding its : // aligned length to 'next > Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to Done PS20, Line 1206: etrics > Nit: { this } Done Line 1216: } > Nit: { this } Done PS20, Line 1289: } : : Status LogWritableBl > Rewrite: "We do not mark the container as read-only if FlushDataAsync() fai Done Line 1317: } > 'lbm' is only used once; don't bother storing it in a local variable. Done PS20, Line 1319: state() const { > Just use block_length_ here; it's more clear. Done PS20, Line 1319: te L > and block_id_ here. Done PS20, Line 1751: ol == "close") { > Nit: Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 20: (30 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: s-per > Done You still need 'localhost' though; that's not an optional parameter I think. I was just saying you could remove the port, which is optional. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: Slice seed_slice(reinterpret_cast(), sizeof(seed)); Should be PS20, Line 297: WritableBlock *block = dirty_blocks[next_block_idx].get(); : Random = dirty_block_rands[next_block_idx]; You changed the pointer/ref style here. Line 310: // Close all dirty blocks. Update. Line 311: for (const auto _block : dirty_blocks) { const auto& Line 318: for (auto _block : dirty_blocks) { auto& Line 326: for (const auto : all_dirty_blocks) { const auto& http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS20, Line 44: "Control when to flush a block, either at 'finalize', 'close'," : " or 'never'. 'finalize' indicates to flush when writing to " : "the block is finished. 'close' indicates to defer the flushing " : "to when the entire BlockTransaction, that the blocks belong " : "to, is committed. And 'never' is not flush at all."); Rewrite: Controls when to flush a block. Valid values are 'finalize', 'close', or 'never'. If 'finalize', blocks will be flushed when writing is finished. If 'close', blocks will be flushed when their transaction is committed. If 'never', blocks will never be flushed. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1410: } > Updated. To clarify, I did this because we were saying 'Finalize indicates Okay, but why bother maintaining FlushDataAsync() as a separate method? It certainly shouldn't be part of the WritableBlock interface because there's no need for anyone outside the block manager to know about the distinction. I suppose you could keep it as per-impl methods in LogWritableBlock/FileWritableBlock, but it's just one line of code, so the decomposition doesn't seem that useful. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 260: finalize it if has not finalizing it if it has not yet been finalized. PS20, Line 260: And update "Also updates" PS20, Line 336: . , PS20, Line 337: th the PS20, Line 434: // Finalize a fully written block. It round up this container, truncate it if full : // and mark the container as available. : void FinalizeBlock(int64_t block_offset, int64_t block_length); Nit: would prefer to keep BlockCreated and BlockDeleted next to each other since they're fairly symmetric. PS20, Line 543: // : // Declared atomic because it is mutated from BlockDeleted(). Since all counters are now atomic, these comments are no longer interesting. PS20, Line 878: container "data" (to be consistent with "Syncing metadata file") PS20, Line 882: does so at any given point Replace with "is already made durable". Or rewrite as "Append metadata only after data is synced so that there's no chance of metadata landing on the disk before the data." PS20, Line 883: const auto & const auto* PS20, Line 890: if (mode == SYNC) I don't think this one needs to be conditioned on SYNC. Line 891: for (LogWritableBlock* block : blocks) { Would be nice if we could DCHECK that if blocks.size() > 1, all of the blocks are FINALIZED. If they aren't, we have a big problem: we'll end up calling FinalizeBlock() on the same container multiple times from within DoClose() and could corrupt on-disk data via truncate calls. PS20, Line 1040: // Note that this take places _after_ the container has been synced to disk. : // That's OK; truncation isn't needed for correctness, and in the event of a : // crash or error, it will be retried at startup. This needs to be updated. Line 1068: if (PREDICT_TRUE(new_next_block_offset >= next_block_offset())) { Don't need the if statement anymore. PS20, Line 1071: total_bytes_ .IncrementBy(fs_aligned_length(block_offset, block_length)); : total_blocks_.Increment(); Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to the more clear UpdateNextBlockOffset. PS20, Line 1206: {this} Nit: { this } Line 1216: RETURN_NOT_OK(container_->DoCloseBlocks({this}, LogBlockContainer::SyncMode::NO_SYNC)); Nit:
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 20: (26 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: s-per > Nit: can just omit this. Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 38: DECLARE_bool(cfile_write_checksums); > warning: redundant 'FLAGS_block_time_to_flush' declaration [readability-red Done Line 892: unique_ptr sink; > What happened to parameterizing this test for all block_time_to_flush value Refactored the code and the flag specific code in Cfile are removed. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS17, Line 266: RETURN_NOT_OK(block_->Finalize()); : transaction->AddCreatedBlock(std::move(block_)); : return Status::OK(); : } : : void CFileWriter::AddMetadataPair(const Slice , const Slice ) { : C > This is incorrect. We should always Finalize() the block; whether or not we Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 43: DEFINE_string(block_manager_flush_control, "finalize", > Let's rename to "block_manager_flush_control": Done Line 43: DEFINE_string(block_manager_flush_control, "finalize", > warning: redundant 'FLAGS_block_time_to_flush' declaration [readability-red Done Line 44: "Control when to flush a block, either at 'finalize', 'close'," > This should apply to any WritableBlock, not just those in a BlockTransactio Done PS17, Line 48: neve > Let's use "never" instead, since we're talking about a point in time. Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 32: namespace kudu { > This doesn't belong here (block_coalesce_close didn't belong either). Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 340: Status FileWritableBlock::Finalize() { > This is where we should check the value of block_time_to_flush. We should o Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1176: deleted_ = true; > I mentioned this to Dan in our impromptu meeting yesterday; perhaps you wer Done PS17, Line 1397: ReadableBlock::Close() { > You can use 'lbm' here. Done Line 1410: } > And here we also need to gate on block_time_to_flush == 'finalize'. Updated. To clarify, I did this because we were saying 'Finalize indicates Flushing'. But after thought more, I think your suggestion makes more sense. So I separated the flushing (I kept FlushDataAsync()) from Finalize and use the flag to control when to call FlushDataAsync(). http://gerrit.cloudera.org:8080/#/c/7207/19/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 245: > Don't need this; just change "enum CloseFlag" to "enum CloseFlags". Done Line 499: // Note: 'record' may be swapped into 'report'; do not use it after calling > Don't need this; it's only ever used internally (by LogBlockContainer). Done Line 582: total_bytes_(0), > Can we get by with a boolean? Seems like we could set it in RoundUpContaine Removed as is not necessary. PS19, Line 594: } : > I don't understand what protection the lock offers. Furthermore, PosixRWFil Discussed this with Adar offline. I will remove this lock as PosixRWFile::Sync() already tries to keeps track of whether a sync is necessary. Meanwhile, fix KUDU-2102 in another commit. Line 906: } > You should add a note here about mutating result_status on failure, like in Refactored the code and removed FinishBlock. Line 939: RETURN_NOT_OK_HANDLE_ERROR(data_dir_->RefreshIsFull(DataDir::RefreshMode::ALWAYS)); > I don't understand this, and it's making me nervous. Done PS19, Line 955: Status LogBlockContainer::AppendMetadata(const BlockRecordPB& pb) { : DCHECK(!read_only()); : // Note: We don't check for sufficient disk space for metadata writes in : // order to allow for block deletion on full disks. : RETURN_NOT_OK_HANDLE_ERROR(metadata_file_->Append(pb)); : return Status::OK(); : } : > Here's another case where, even though FinishBlock() can be called concurre Done Line 1076: "Container $0 with size $1 is now full, max size is $2", > These two accesses to next_sync_offset_ aren't atomic. Use StoreMax() inste Removed. PS19, Line 1173: : void LogBlock::Delete() { : DCHECK(!deleted_); : deleted_ = true; : }
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#20). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 563 insertions(+), 435 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/20 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 19: (14 comments) I think I reviewed everything, but I'm curious to see how this will evolve following my suggestions. http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: : Nit: can just omit this. http://gerrit.cloudera.org:8080/#/c/7207/19/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 245: typedef int CloseFlags; Don't need this; just change "enum CloseFlag" to "enum CloseFlags". Line 499: int64_t next_sync_offset() const { return next_sync_offset_.Load(); } Don't need this; it's only ever used internally (by LogBlockContainer). Line 582: // The offset of the container that next sync should start with. Can we get by with a boolean? Seems like we could set it in RoundUpContainer when the next_block_offset_ increases and clear it when we sync. PS19, Line 594: // Protects sync data operation to avoid unnecessary I/O. : Mutex lock_; I don't understand what protection the lock offers. Furthermore, PosixRWFile::Sync() already keeps track of whether a sync is actually necessary. Dan and I had a conversation about PosixRWFile::Sync. The use of pending_sync_ as an optimization is unsafe as it can cause the "losing" thread in a race to return early under the assumption that the data has been made durable when it hasn't (yet). I filed KUDU-2102 for this; feel free to tackle that next if you like. Line 906: auto cleanup = MakeScopedCleanup([&]() { You should add a note here about mutating result_status on failure, like in LogWritableBlock::DoClose(). Line 939: int64_t offset = mode == ROUND_UP ? next_block_offset() : block_offset; I don't understand this, and it's making me nervous. Let's simplify by exposing and using LogWritableBlock's offset. Hopefully we can do it without modifying WritableBlock itself (may need to down_cast earlier and deal with LogWritableBlocks through and through). PS19, Line 955: if (mode == ROUND_UP) { : WARN_NOT_OK(TruncateDataToNextBlockOffset(), : "could not truncate excess preallocated space"); : : if (full() && block_manager()->metrics()) { : block_manager()->metrics()->full_containers->Increment(); : } : } Here's another case where, even though FinishBlock() can be called concurrently, we would never want to call TruncateDataToNextBlockOffset() concurrently, and the ROUND_UP check prevents us from doing that. But it's tricky to follow 'mode' through the various call chains to prove this guarantee. Line 1076: if (next_sync_offset() < cur_next_block_offset) { These two accesses to next_sync_offset_ aren't atomic. Use StoreMax() instead. But since you can make this a bool, a simple CAS should do the trick. Look at how PosixRWFile uses pending_sync_. PS19, Line 1173: int64_t new_next_block_offset = KUDU_ALIGN_UP( : block_offset + block_length, : instance()->filesystem_block_size_bytes()); : if (PREDICT_FALSE(new_next_block_offset < next_block_offset())) { : LOG(WARNING) << Substitute( : "Container $0 unexpectedly tried to lower its next block offset " : "(from $1 to $2), ignoring", : ToString(), next_block_offset(), new_next_block_offset); : } else { : next_block_offset_.Store(new_next_block_offset); : } I keep circling back to this as a potential synchronization issue. The comparison and subsequent assignment to next_block_offset_ is not atomic. It didn't need to be prior to your change because BlockCreated() was guaranteed to be called by just one thread at a time. It may not need to be now, but I'm finding it really tough (based on tracing through the code paths) to guarantee that, due to the proliferation of RoundMode in various deep methods. I think this would be safer if we used StoreMax() instead. And maybe total_bytes_ and total_blocks_ should be atomic too. Line 1188: int64_t LogBlockContainer::fs_aligned_length(int64_t block_offset, Surely we needn't duplicate this from LogBlock; if you need it at the container level (for operations where LogBlocks may not exist), can't we move it instead? In general please try a little harder to avoid duplicating code, especially code with large comments like this. PS19, Line 1464: if ((flags & SYNC) && : (state_ == CLEAN || state_ == DIRTY || state_ == FINALIZED)) { : VLOG(3) << "Syncing block " << id(); : : // TODO(unknown): Sync just this block's dirty data.
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#19). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost: --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 594 insertions(+), 310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/19 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#18). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost: --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 591 insertions(+), 310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/18 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 17: (10 comments) I think I'm done reviewing everything outside log_block_manager.cc. Still a lot to take in there. I've been making my way through it, but thought I'd publish my comments so far. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 892: TEST_P(TestCFileBothCacheTypes, TestReleaseBlock) { What happened to parameterizing this test for all block_time_to_flush values? In my earlier comment I suggested that doing so would be moot only if there was no block_time_to_flush-specific code in here. But there is such code, so we should do the parameterization so as to avoid bit-rot. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS17, Line 266: if (FLAGS_block_time_to_flush == "finalize") { : RETURN_NOT_OK(block_->Finalize()); : } else if (FLAGS_block_time_to_flush != "close" && : FLAGS_block_time_to_flush != "none") { : LOG(FATAL) << "Unknown value for block_time_to_flush: " :<< FLAGS_block_time_to_flush; : } This is incorrect. We should always Finalize() the block; whether or not we flush inside Finalize() is determined by the block manager (and the value of this gflag). http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 43: DEFINE_string(block_time_to_flush, "finalize", > warning: redundant 'FLAGS_block_time_to_flush' declaration [readability-red Let's rename to "block_manager_flush_control": 1. "block_manager_" is the prefix we've been using for other BM-related flags. 2. "time_to_flush" sounds conspicuously like time to live and other such numerical values. Line 44: "When to flush blocks registered in a BlockTransaction. " This should apply to any WritableBlock, not just those in a BlockTransaction. PS17, Line 48: none Let's use "never" instead, since we're talking about a point in time. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 32: DECLARE_string(block_time_to_flush); This doesn't belong here (block_coalesce_close didn't belong either). http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 340: RETURN_NOT_OK_HANDLE_ERROR(writer_->Flush(WritableFile::FLUSH_ASYNC)); This is where we should check the value of block_time_to_flush. We should only flush if it's 'finalize'. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1176: LOG(WARNING) << Substitute( I mentioned this to Dan in our impromptu meeting yesterday; perhaps you weren't yet there: we should remove this WARNING because with out-of-order block metadata on disk it'll fire more frequently. PS17, Line 1397: container_->block_manager() You can use 'lbm' here. Line 1410: RETURN_NOT_OK(container_->FlushData(block_offset_, block_length_)); And here we also need to gate on block_time_to_flush == 'finalize'. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 16: > (5 comments) > > Hao, Dan, and I had a long discussion about this patch, and I > wanted to reproduce our conclusions here for everyone else: > > 1. The cfile and block performance-related knobs are useful for > testing, but are not as important as having a good API and a robust > implementation. IIUC, the resplitting of FlushDataAsync and > Finalize was due to retaining all the existing flags; let's not do > this. Let's stick to the earlier approach (where Finalize implies a > flush), and adjust the flags if need be. > > 2. What flags are worth preserving? We identified the need for just > one, which dictates when a block's data should be flushed. It has > three values: "finalize" (default value), "close" (defers the flush > to when the entire transaction is committed), and "none" (doesn't > flush at all, "flushing" will happen when the transaction commits > and files are fsynced). > > 3. When should AppendMetadata be called? One option is to do it at > Finalize time. Another is to defer it to Close. The problem with > doing it during Finalize is that it exacerbates the "1. AppendData, > 2. AppendMetadata, 3. SyncData, 4. SyncMetadata" race: if we crash > between #2 and #3, we may end up with just the metadata on disk, > and if we AppendMetadata during Finalize, we have many more blocks' > worth of metadata that can land on disk without corresponding data. > > 4. But AppendMetadata is called during Close, we may end up with > out-of-order metadata entries on disk, since transactions could > commit out of order. Do we care? We don't think so; the LBM > implementation should already be robust to this. An alternative is > to use an in-memory buffer to retain metadata order, but this > didn't seem worth the complexity. > > 5. What to do about zero length blocks? Should their metadata be > written, or skipped? The answer is: it must be written, or attempts > to Close such blocks should fail. If for whatever reason we skipped > the metadata append, we could end up with block IDs whose blocks > can't be found on startup. Thanks a lot Adar for the summary! -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 16: (4 comments) http://gerrit.cloudera.org:8080/#/c/7207/16/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 423: class TestCFileBothCacheTypesCloseTypes : public TestCFile, > You should be able to use this to replace TestCFileBothCacheTypes. Here's w Done Line 449: if (desc.close_type == CLOSE) FLAGS_cfile_close_blocks_on_finish = true; > If the default value of this gflag changes to true, we'll never test the fa Done http://gerrit.cloudera.org:8080/#/c/7207/16/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS16, Line 59: // - users could always change this to "close", which slows down throughput : // but may improve write latency. > This part isn't quite right anymore. Should the entire comment be rewritten Done PS16, Line 63: > Nit: got an extra space here. Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#17). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 591 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/17 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 10: (5 comments) Hao, Dan, and I had a long discussion about this patch, and I wanted to reproduce our conclusions here for everyone else: 1. The cfile and block performance-related knobs are useful for testing, but are not as important as having a good API and a robust implementation. IIUC, the resplitting of FlushDataAsync and Finalize was due to retaining all the existing flags; let's not do this. Let's stick to the earlier approach (where Finalize implies a flush), and adjust the flags if need be. 2. What flags are worth preserving? We identified the need for just one, which dictates when a block's data should be flushed. It has three values: "finalize" (default value), "close" (defers the flush to when the entire transaction is committed), and "none" (doesn't flush at all, "flushing" will happen when the transaction commits and files are fsynced). 3. When should AppendMetadata be called? One option is to do it at Finalize time. Another is to defer it to Close. The problem with doing it during Finalize is that it exacerbates the "1. AppendData, 2. AppendMetadata, 3. SyncData, 4. SyncMetadata" race: if we crash between #2 and #3, we may end up with just the metadata on disk, and if we AppendMetadata during Finalize, we have many more blocks' worth of metadata that can land on disk without corresponding data. 4. But AppendMetadata is called during Close, we may end up with out-of-order metadata entries on disk, since transactions could commit out of order. Do we care? We don't think so; the LBM implementation should already be robust to this. An alternative is to use an in-memory buffer to retain metadata order, but this didn't seem worth the complexity. 5. What to do about zero length blocks? Should their metadata be written, or skipped? The answer is: it must be written, or attempts to Close such blocks should fail. If for whatever reason we skipped the metadata append, we could end up with block IDs whose blocks can't be found on startup. http://gerrit.cloudera.org:8080/#/c/7207/16/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 423: TEST_P(TestCFileBothCacheTypes, TestWrite100MFileInts) { You should be able to use this to replace TestCFileBothCacheTypes. Here's what I think you'll need to do: 1. Rewrite TestCFileBothCacheTypes to be an empty subclass of TestCFileBothCacheCloseTypes. 2. The first call to INSTANTIATE_TEST_CASE_P should be for TestCFileBothCacheTypes and its values should be two CFileDescriptors that vary the CacheType but both use the CLOSE CloseType. 3. The second call to INSTANTIATE_TEST_CASE_P should be for TestCFileBothCacheCloseTypes and should pass all four CFileDescriptors. After our long in-person discussion about other changes to this patch, it's possible all of this becomes moot because there's no longer a cfile flag to test. Line 449: If the default value of this gflag changes to true, we'll never test the false case. You should use a switch statement and explicitly set it to either true or false. After our long in-person discussion about other changes to this patch, it's possible all of this becomes moot because there's no longer a cfile flag to test. http://gerrit.cloudera.org:8080/#/c/7207/16/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS16, Line 59: // - users could always change this to "close", which slows down throughput : // but may improve write latency. This part isn't quite right anymore. Should the entire comment be rewritten in light of the split into two gflags? After our long in-person discussion about other changes to this patch, it's possible this becomes moot because there's no longer a cfile flag. Though you may still need to move this explanation (and rewrite it). PS16, Line 63: Nit: got an extra space here. http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: Line 1288: // Ensure the same container has not been marked as available twice. > I was trying to check in the available_containers_by_data_dir_ map, there i Ah, I forgot that available_containers_by_data_dir_::value_type is a vector. So indeed, if we marked the container as available twice, you'd see two entries in that vector. Makes sense. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#16). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 734 insertions(+), 304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/16 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 396: Status FileWritableBlock::FlushAndFinalize(FlushMode mode) { > Good catch, thanks! Do you think it is better to remove 'FLAGS_block_manage Good question, it does seem a bit confused to have the two flags. I'm not entirely sure what we're using the two for at the moment, I'd want Adar to weigh in. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, > Thanks for pointing out! I personally prefer to keep it as the way it is, OK sounds good to me. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: dirty_blocks.emplace_back(std::move(block)); > use emplace_back with r-value references. Done http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, > Yah I didn't realize it was being called in a closure. Still, you could re Thanks for pointing out! I personally prefer to keep it as the way it is, because inside FinishBlock(), if the input status is not ok, we need to mark the container as read-only. Or maybe you have other thought? http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1524: RETURN_NOT_OK_PREPEND(s, "unable to append block metadata"); > here and abve, start the message with a lowercase ('unable', 'container'). Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#15). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 734 insertions(+), 304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/15 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 14: (4 comments) http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: dirty_blocks.push_back(std::move(block)); use emplace_back with r-value references. http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 396: Status FileWritableBlock::FlushAndFinalize(FlushMode mode) { Hmm I'm not sure this is doing what we want when block_manager_flush_on_finalize = false. Consider this sequence of events: FLAGS_block_manager_flush_on_finalize = false; FLAGS_block_coalesce_close = true; block.Finalize(); // Finalizes the block, but doesn't flush it block_manager.CloseBlocks( { move(block) } ); // Doesn't flush the block, since it's already finalized. I think in that case CloseBlocks actually should async flush the block. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, > Sorry, I do not quite follow your comment. You mean FinishBlock is always c Yah I didn't realize it was being called in a closure. Still, you could remove the Status param and change line 1431 to be: s = s.AndThen([&] { return container_->FinishBlock(this, round_mode, block_offset_); }); http://gerrit.cloudera.org:8080/#/c/7207/14/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1524: RETURN_NOT_OK_PREPEND(s, "Unable to append block metadata"); here and abve, start the message with a lowercase ('unable', 'container'). -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 14: (17 comments) http://gerrit.cloudera.org:8080/#/c/7207/13//COMMIT_MSG Commit Message: PS13, Line 10: new blo > s/creates which happen/new blocks Done PS13, Line 15: > no comma here Done PS13, Line 15: reused > reused Done Line 19: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing > I think you may need to make this viewable from anyone (I tried in an incog Done http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS13, Line 74: resulti > s/results/resulting Done Line 88: // No more data may be written to the block, but it is not yet guaranteed > 'not yet closed' doesn't really distinguish this state from CLOSED. I thin Done Line 281: // Group a set of block writes together in a transaction. This has two > prefer 'BlockTransaction() = default;' Done Line 283: // synchronization for a batch of blocks if possible to achieve better > Can't this just be set to the default dtor? Done PS13, Line 288: > emplace_back Done PS13, Line 293: > Might be more clear as 'CommitCreatedBlocks'. Done Line 305: Status s = bm->CloseBlocks(created_blocks_); > Would it be difficult to change BlockManager::CloseBlocks to take a 'const Updated it as you suggested. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 727: > I think this should be doing an async flush regardless of the block_manager Done http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, > It looks like the only place that calls FinishBlock calls it with an OK sta Sorry, I do not quite follow your comment. You mean FinishBlock is always called with 's' as OK status? It does not seem to be the case, either Sync() or AppendMetadata() can return bad status, right? PS13, Line 522: unnecessary I > unnecessary Done Line 1234: // Actually close the block, possibly synchronizing its dirty data and > Could DoClose and CloseFlags be private? Hmm, they are used in other class such as LogBlockManager. Line 1911: record.set_timestamp_us(GetCurrentTimeMicros()); > Same thing here - I think we want to do this depending only on the block_co Done PS13, Line 1917: that > s/belong/belonging Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#14). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 734 insertions(+), 304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/14 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 13: (19 comments) http://gerrit.cloudera.org:8080/#/c/7207/5//COMMIT_MSG Commit Message: PS5, Line 12: potentionaly typo: potentially http://gerrit.cloudera.org:8080/#/c/7207/13//COMMIT_MSG Commit Message: PS13, Line 10: creates s/creates which happen/new blocks PS13, Line 14: I FinalizeWrite() t add commas (API, FinalizeWrite(), to...) PS13, Line 15: resued reused PS13, Line 15: , no comma here Line 19: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing I think you may need to make this viewable from anyone (I tried in an incognito window and it asked me to log into my cloudera gmail account). http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS13, Line 74: results s/results/resulting Line 88: // No more data may be written to the block, but it is not yet closed. 'not yet closed' doesn't really distinguish this state from CLOSED. I think this would be more clear as 'but it is not yet guaranteed to be durably stored on disk'. Line 281: BlockTransaction() {} prefer 'BlockTransaction() = default;' Line 283: ~BlockTransaction() { Can't this just be set to the default dtor? ~BlockTransaction() = default; PS13, Line 288: push_back( emplace_back PS13, Line 293: CommitCreation Might be more clear as 'CommitCreatedBlocks'. Line 305: Status s = bm->CloseBlocks(blocks); Would it be difficult to change BlockManager::CloseBlocks to take a 'const vector&' so you don't have to make a vector copy here? If that would require changing a bunch of other call sites, feel free to ignore. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 727: // dirty pages if 'block_manager_flush_on_finalize' is true. I think this should be doing an async flush regardless of the block_manager_flush_on_finalize flag value. Unfortunately I think that means you will need to copy most of the body of Finalize and remove the check (maybe abstract it out?), but I think we definitely want to parallel flushing to happen here. block_coalesce_close shouldn't be tied to the flush on finalize flag. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, It looks like the only place that calls FinishBlock calls it with an OK status, so I think the s param can be removed. PS13, Line 522: unnecessarily unnecessary Line 1234: Status DoClose(CloseFlags flags); Could DoClose and CloseFlags be private? Line 1911: // dirty pages if 'block_manager_flush_on_finalize' is true. Same thing here - I think we want to do this depending only on the block_coalesce_close flag. PS13, Line 1917: belong s/belong/belonging -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#13). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate creates which happen in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API FinalizeWrite() to Block Manager, so that for LBM container can be resued, without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 18 files changed, 657 insertions(+), 262 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/13 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 12: (38 comments) http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 900: } > Could you further parameterize the test so that we run with both values of Done http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS10, Line 62: "Whether to finalize or close cfile blocks when writing " : "is finished. "); > "Whether to finalize or close cfile blocks when writing is finished." Done http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 998: TYPED_TEST(BlockManagerTest, ConcurrentCloseFinalizedWritableBlockTest) { > kTestData for this. Above and below too. Done Line 1002: // written, and then Close() it. > Since these operations happen on other threads, you should use CHECK_OK ins Done Line 1007: CHECK_OK(writer->Append(kTestData)); > Maybe inject a tiny (random) bit of sleeping between Finalize() and Close() Done PS10, Line 1010: CHECK_OK(writer->Close()); : } : }; : : vector threads; : for (int i = 0; i < 100; i++) { : threads.emplace_back(write_data); : } : for (auto& t : threads) { : > Use std::thread; they're less LOC and thus better suited for tests. Done Line 1032: ASSERT_OK(block->Read(0, )); > ASSERT_EQ Done PS10, Line 1043: a > "some" instead, so that if 20 changes to 30 or 40, we don't have to update Done Line 1053: ASSERT_OK(writer->Append(kTestData)); > What's this about? My bad, comment it out for testing, but forget to uncomment back. http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 22: #include > What happened here? The older style (and location) was correct. Done PS10, Line 73: 1) flushing of dirty : //blocks a > Not sure what this is saying; please reword. Done Line 88: // No more data may be written to the block, but it is not yet closed. > "No more data may be written to the block, but it is not yet closed." Done PS10, Line 123: // Signals that the block will no longer receive writes. Does not guarantee : // durability; Close() must still be called for t > "Signals that the block will no longer receive writes. Does not guarantee d Done PS10, Line 275: block > block Done PS10, Line 276: 1) the underlying block manager can optimize > Expand on this a bit (by describing in vagaries that the underlying block m Done PS10, Line 277: possi > logical Done Line 279: class BlockTransaction { > Nit: indent by one char. Done Line 312: } > Nit: indent by one char. Done http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: PS10, Line 335: : if (state_ == FINALIZED) { : return Status::OK(); : } : VLOG(3) << "Finalizing block " << id(); : if (state_ == DIRTY) { : if (FLAGS_block_manager_flush_on_finalize) { : > How about inverting this: Done http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: Line 38: #include "kudu/gutil/stringprintf.h" > Why is this include needed? Done http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: Line 1274: } > I'd also close all the blocks in 'blocks' and check all_containers_by_name_ Done Line 1277: for (const auto& block : blocks) { > Isn't this state transition already tested by BlockManagerTest.WritableBloc I combined this test with the test case above. Let me know if you still think it is redundant. Thanks! Line 1288: } // namespace fs > How does L1289 check this? I was trying to check in the available_containers_by_data_dir_ map, there is only one container for the corresponding datadir(should be the first element of the map). Maybe there is a more elegant way to do so? http://gerrit.cloudera.org:8080/#/c/7207/10/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 129: using pb_util::WritablePBContainerFile; > There's an existing internal::LogWritableBlock reference in the file; pleas Done Line 513: // The metrics. Not owned by the log container; it has the same lifespan > Please use AtomicInt rather than std::atomic, since that's what's already b Done Line 984: // is we sync the data/metadata more often than needed. > We shouldn't hold spinlocks while doing IO, because IO is
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#12). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate creates which happen in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API FinalizeWrite() to Block Manager, so that for LBM container can be resued, without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 18 files changed, 662 insertions(+), 267 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/12 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#11). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate creates which happen in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API FinalizeWrite() to Block Manager, so that for LBM container can be resued, without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/util/scoped_cleanup-test.cc 19 files changed, 627 insertions(+), 268 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/11 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#10). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate creates which happen in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API FinalizeWrite() to Block Manager, so that for LBM container can be resued, without flushing in-flight writable blocks to disk. A design doc can be found here: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 19 files changed, 548 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/10 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/9/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 38: DECLARE_bool(cfile_close_block_on_finish); > warning: redundant 'FLAGS_block_cache_type' declaration [readability-redund Done http://gerrit.cloudera.org:8080/#/c/7207/9/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 22: #include > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/7207/9/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1229: // metadata to disk. > warning: function 'kudu::fs::internal::LogWritableBlock::DoClose' has a def Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes