Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17942 )

Change subject: [LBM] Speed up server bootstrap by using multi-thread to 
compact containers
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17942/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17942/2/src/kudu/fs/log_block_manager.cc@2151
PS2, Line 2151:     container_result->need_repunching_blocks.end());
> My point is not ExecClosure will execute synchronously when the queue is fu
Exactly.  However, the RepairTask instances are run on the dirs' thread pools 
anyway, right?  So the newly introduced thread pool doesn't increase the 
parallelism of running the actual repair tasks; it only allows for better 
parallelism in scheduling tasks on the dirs' thread pools, so this for() cycle 
isn't blocked at dd->ExecClosure() of a single directory which cannot accept 
any more tasks.
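For context, the scheduling pattern in the old code looks roughly like this 
(just a sketch of the pre-patch shape, details elided):

for (const auto& dd : dd_manager_->dirs()) {
  ...
  // If dd's pool cannot accept more tasks, ExecClosure() falls back to
  // running the closure synchronously in the calling thread, so this loop
  // stalls on one busy directory while the other dirs' pools may be idle.
  dd->ExecClosure(<repair task for this dir>);
}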

You don't need to wait for the rewrite tasks on that thread pool: you could 
wait in the same thread which runs dd_manager_->WaitOnClosures() in the old 
code or repair_pool->Wait() in the updated code.  The only question is how to 
poll multiple thread pools effectively.

I meant having something like below instead of spawning extra threads which 
either just schedule tasks on the directories' thread pools or run the repair 
tasks synchronously on their own.

std::set<size_t> dir_indexes_to_run_repair;
for (size_t i = 0; i < dd_manager_->dirs().size(); ++i) {
  ...
  if (do_repair) {
    dir_indexes_to_run_repair.emplace(i);
  }
  ...
}

while (!dir_indexes_to_run_repair.empty()) {
  for (auto it = dir_indexes_to_run_repair.begin();
       it != dir_indexes_to_run_repair.end(); ) {
    const size_t idx = *it;
    const auto& dd = dd_manager_->dirs()[idx];
    auto* dd_raw = dd.get();
    auto* dr = dir_results[idx].get();
    ...
    if (dd->pool_->Submit(
            [this, dd_raw, dr]() { this->RepairTask(dd_raw, dr); }).ok()) {
      // OK, the task has been submitted.  Remove the index from
      // dir_indexes_to_run_repair and switch to the next element.
      it = dir_indexes_to_run_repair.erase(it);
    } else {
      // The dir's pool is busy.  Ideally, we would block until a slot
      // frees up; a short wait is a crude approximation of that.
      dd->pool_->WaitFor(MonoDelta::FromMilliseconds(1));
      ++it;
    }
  }
}

The drawback of this approach is busy-waiting on the dirs' thread pools.
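If that busy-waiting were a concern, completing tasks could signal a condition 
variable so the scheduling loop sleeps instead of polling; just a sketch with 
standard C++ primitives, not the actual ThreadPool API:

std::mutex mu;
std::condition_variable cv;

// ... at the end of every submitted repair task:
cv.notify_all();

// ... in the scheduling loop, instead of dd->pool_->WaitFor(...):
{
  std::unique_lock<std::mutex> l(mu);
  // The timeout is just a backstop: normally the loop is woken up
  // by notify_all() as soon as any repair task completes.
  cv.wait_for(l, std::chrono::milliseconds(100));
}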

The approach you implemented spawns extra threads even when it would be 
possible to schedule a task on the directory's thread pool right away, but it 
does solve the issue of polling multiple thread pools for their availability 
to accept a task.

I'm OK with spawning a few threads to solve this issue, but maybe it's 
possible to spawn fewer in the majority of cases?  What do you think about 
first trying to schedule on the dir's thread pool directly, and only after 
that switching to the newly introduced thread pool to schedule the rest of the 
tasks?  Something like below:


std::set<size_t> dir_indexes_for_repair_pool;
for (size_t i = 0; i < dd_manager_->dirs().size(); ++i) {
  ...
  if (do_repair) {
    const auto& dd = dd_manager_->dirs()[i];
    auto* dd_raw = dd.get();
    auto* dr = dir_results[i].get();
    ...
    if (!dd->pool_->Submit(
            [this, dd_raw, dr]() { this->RepairTask(dd_raw, dr); }).ok()) {
      // The dir's thread pool is busy, so the task cannot be submitted.
      // Let's submit it via the dedicated repair pool instead.
      dir_indexes_for_repair_pool.emplace(i);
    }
  }
  ...
}

for (auto idx : dir_indexes_for_repair_pool) {
  const auto& dd = dd_manager_->dirs()[idx];
  auto* dd_raw = dd.get();
  auto* dr = dir_results[idx].get();
  CHECK_OK(repair_pool->Submit(
      [this, dd_raw, dr]() { this->RepairTask(dd_raw, dr); }));
}

With that, at least we could spawn fewer threads when the dirs' thread pools 
are able to accept tasks right away.
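
To make the pattern concrete, here is a self-contained toy in standard C++ of 
the "try the dir's own pool first, overflow into a shared pool" scheduling; 
the Pool class below is a made-up stand-in whose Submit() fails instead of 
blocking when the queue is full, not Kudu's actual ThreadPool API:

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Pool {
 public:
  Pool(int threads, size_t max_queue) : max_queue_(max_queue) {
    for (int i = 0; i < threads; ++i) {
      workers_.emplace_back([this] { Run(); });
    }
  }
  ~Pool() {
    { std::lock_guard<std::mutex> l(mu_); done_ = true; }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }
  // Returns false instead of blocking when the queue is full.
  bool Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(mu_);
      if (queue_.size() >= max_queue_) return false;
      queue_.push(std::move(task));
      ++outstanding_;
    }
    cv_.notify_one();
    return true;
  }
  // Blocks until all previously submitted tasks have completed.
  void Wait() {
    std::unique_lock<std::mutex> l(mu_);
    idle_cv_.wait(l, [this] { return outstanding_ == 0; });
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // done_ is set and nothing left to run
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();
      std::lock_guard<std::mutex> l(mu_);
      if (--outstanding_ == 0) idle_cv_.notify_all();
    }
  }

  const size_t max_queue_;
  std::mutex mu_;
  std::condition_variable cv_, idle_cv_;
  std::queue<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  size_t outstanding_ = 0;
  bool done_ = false;
};

int main() {
  constexpr int kNumDirs = 4;
  // One single-threaded pool with a tiny queue per "dir", plus a shared
  // fallback pool standing in for the dedicated repair pool.
  std::vector<std::unique_ptr<Pool>> dir_pools;
  for (int i = 0; i < kNumDirs; ++i) {
    dir_pools.push_back(std::make_unique<Pool>(1, /*max_queue=*/1));
  }
  Pool repair_pool(2, /*max_queue=*/1024);

  auto repair = [](int dir) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::printf("repaired a container in dir %d\n", dir);
  };

  // First pass: try each dir's own pool; remember the tasks it rejected.
  std::vector<int> overflow;
  for (int i = 0; i < kNumDirs; ++i) {
    for (int t = 0; t < 3; ++t) {
      if (!dir_pools[i]->Submit([repair, i] { repair(i); })) {
        overflow.push_back(i);
      }
    }
  }
  // Second pass: the rejected tasks go to the shared pool, whose queue
  // is large enough here to accept them all.
  for (int i : overflow) {
    repair_pool.Submit([repair, i] { repair(i); });
  }

  for (auto& p : dir_pools) p->Wait();
  repair_pool.Wait();
}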



--
To view, visit http://gerrit.cloudera.org:8080/17942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie48211d9e8c1d74e520fcb04df25c1d681261bb5
Gerrit-Change-Number: 17942
Gerrit-PatchSet: 8
Gerrit-Owner: Yingchun Lai <acelyc1112...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Greg Solovyev <gsolov...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <acelyc1112...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Nov 2021 19:47:30 +0000
Gerrit-HasComments: Yes
