Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17942 )
Change subject: [LBM] Speed up server bootstrap by using multi-thread to compact containers ...................................................................... Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17942/2/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/17942/2/src/kudu/fs/log_block_manager.cc@2151 PS2, Line 2151: container_result->need_repunching_blocks.begin(), > 1. In LogBlockManager::Repair, we will wait all metadata compact workers co Ah, sure. My point is the following: dd->ExecClosure in the original code is already scheduling a task on a pool (i.e. running the task asynchronously) if the pool has enough capacity, otherwise it's executing the task in this thread synchronously. Introducing a new thread pool helps to solve the problem of waiting on some particular task in this loop to be run synchronously while other tasks might be still scheduled on theirs thread pools when those pools still have enough capacity, right? There is nothing else which is run in parallel since the actual repair tasks are run on those dir's pools anyways. So, my point was: why not just to replace ExecClosure() with some other method which only schedules the task if the underlying dir's pool has capacity, and otherwise it just makes a mark that some more tasks need to be scheduled and loops over (i.e. continues looping through the cycle) until all the tasks are scheduled. If no other tasks can be scheduled for async execution, it simply sleeps a bit (ideally, it should await on a conditional variable on the completion of one of the tasks scheduled prior). This way there is no need to create extra threads, but the blocking dd->ExecClosure() is removed and allows for scheduling and running many tasks simultaneously. http://gerrit.cloudera.org:8080/#/c/17942/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/17942/5/src/kudu/fs/log_block_manager.cc@131 PS5, Line 131: Amount of latency in ms to inject before registering " : "an op with MVCC. > Done nit: this is marked as done, but I don't see the update. Is it planned to appear in PS6? http://gerrit.cloudera.org:8080/#/c/17942/5/src/kudu/fs/log_block_manager.cc@133 PS5, Line 133: TAG_FLAG(log_container_metadata_rewrite_inject_latency_ms, advanced); nit: I guess this tag isn't necessary since the flag is already hidden -- To view, visit http://gerrit.cloudera.org:8080/17942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie48211d9e8c1d74e520fcb04df25c1d681261bb5 Gerrit-Change-Number: 17942 Gerrit-PatchSet: 5 Gerrit-Owner: Yingchun Lai <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Greg Solovyev <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Wed, 03 Nov 2021 05:58:33 +0000 Gerrit-HasComments: Yes
