Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14743 )
Change subject: KUDU-3001 Multi-thread to load containers in a data directory ...................................................................... Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc@1028 PS5, Line 1028: const int kNumBlocks = AllowSlowTests() ? FLAGS_startup_benchmark_block_count_for_testing : 1000; > Probably cleaner to use OverrideFlagForSlowTests(). Then FLAGS_... is still Here I'm going to do benchmarks with different large values, OverrideFlagForSlowTests seems to override a FLAGS_xxx by one specified value, it dosen't match. http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@20 PS5, Line 20: #include <algorithm> > Why the reordering? In any case, since you're changing this, could you use Done http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@37 PS5, Line 37: #include <gflags/gflags.h> > I think you can use std::bind instead (from <functional>). Done http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2411 PS5, Line 2411: FsReport* report, > Each DataDir has a singleton thread pool, currently used to run OpenDataDir - Yes, especially for SSDs and NVMEs. But for HDDs, it's better to be 1, so this value should be a gflag? - If we reuse thread pool of DataDir, when DataDir itself call ExecClosure to excute LogBlockManager::OpenDataDir, then we SubmitFunc by the same thread pool to LogBlockManager::LoadRecords, how do we just wait the later tasks done? http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2414 PS5, Line 2414: data_dir_result.report.data_dirs.push_back(dir->dir()); > You should also experiment with different max thread values. Do you expect I think the best max thread value depends on containers count per data dir, total data dirs count and CPU cores, a gflag may be better? Yes, I expect they are equal. We can specify multiple data dirs on a single disk, right? I didn't try that ever. For historical reason, some users may specify only one data dir on a disk, "one data dir + many max threads" may improve startup time. http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2417 PS5, Line 2417: unordered_set<string> containers_seen; > These are all sized to "number of containers", right? Could you define a st Done -- To view, visit http://gerrit.cloudera.org:8080/14743 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0721ee4a5a6824db146ba0658e60eec25dd0c65c Gerrit-Change-Number: 14743 Gerrit-PatchSet: 6 Gerrit-Owner: Yingchun Lai <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Thu, 21 Nov 2019 08:39:40 +0000 Gerrit-HasComments: Yes
