Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14743 )
Change subject: KUDU-3001 Multi-thread to load containers in a data directory ...................................................................... Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc@1028 PS5, Line 1028: const int kNumBlocks = AllowSlowTests() ? FLAGS_startup_benchmark_block_count_for_testing : 1000; Probably cleaner to use OverrideFlagForSlowTests(). Then FLAGS_... is still the right place to go for the value; no need to think about an intermediary kNumBlocks variable. http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@20 PS5, Line 20: #include <errno.h> Why the reordering? In any case, since you're changing this, could you use <cerrno> instead? http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@37 PS5, Line 37: #include <boost/bind.hpp> I think you can use std::bind instead (from <functional>). http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2411 PS5, Line 2411: // Create a per-dir thread pool. Each DataDir has a singleton thread pool, currently used to run OpenDataDir. I think your goal here is to queue more than one operation per disk, since disks (especially SSDs and NVMEs) can run concurrent I/Os, right? If so, why not reuse that thread pool and make it a non-singleton? There could be positive knock-on effects to doing that, like better concurrent hole punching. You'd still need to do the refactoring you did here, but you should be able to reuse those thread pools. http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2414 PS5, Line 2414: .set_max_threads(5) You should also experiment with different max thread values. Do you expect "one data dir + many max threads" per disk to be equivalent to "many data dirs + fewer max threads" per disk? http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2417 PS5, Line 2417: vector<shared_ptr<FsReport>> local_report_vec; These are all sized to "number of containers", right? Could you define a struct to include all of this, and one vector for that struct? -- To view, visit http://gerrit.cloudera.org:8080/14743 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0721ee4a5a6824db146ba0658e60eec25dd0c65c Gerrit-Change-Number: 14743 Gerrit-PatchSet: 5 Gerrit-Owner: Yingchun Lai <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 20 Nov 2019 20:17:11 +0000 Gerrit-HasComments: Yes
