Yingchun Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14743 )

Change subject: KUDU-3001 Multi-thread to load containers in a data directory
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager-test.cc@1028
PS5, Line 1028:   const int kNumBlocks = AllowSlowTests() ? 
FLAGS_startup_benchmark_block_count_for_testing : 1000;
> Probably cleaner to use OverrideFlagForSlowTests(). Then FLAGS_... is still
Here I'm going to do benchmarks with different large values, 
OverrideFlagForSlowTests seems to override a FLAGS_xxx by one specified value, 
it dosen't match.


http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@20
PS5, Line 20: #include <algorithm>
> Why the reordering? In any case, since you're changing this, could you use
Done


http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@37
PS5, Line 37: #include <gflags/gflags.h>
> I think you can use std::bind instead (from <functional>).
Done


http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2411
PS5, Line 2411:                                   FsReport* report,
> Each DataDir has a singleton thread pool, currently used to run OpenDataDir
- Yes, especially for SSDs and NVMEs. But for HDDs, it's better to be 1, so 
this value should be a gflag?
- If we reuse thread pool of DataDir, when DataDir itself call ExecClosure to 
excute LogBlockManager::OpenDataDir, then we SubmitFunc by the same thread pool 
to LogBlockManager::LoadRecords, how do we just wait the later tasks done?


http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2414
PS5, Line 2414:   data_dir_result.report.data_dirs.push_back(dir->dir());
> You should also experiment with different max thread values. Do you expect
I think the best max thread value depends on containers count per data dir, 
total data dirs count and CPU cores, a gflag may be better?
Yes, I expect they are equal. We can specify multiple data dirs on a single 
disk, right? I didn't try that ever. For historical reason, some users may 
specify only one data dir on a disk, "one data dir + many max threads" may 
improve startup time.


http://gerrit.cloudera.org:8080/#/c/14743/5/src/kudu/fs/log_block_manager.cc@2417
PS5, Line 2417:   unordered_set<string> containers_seen;
> These are all sized to "number of containers", right? Could you define a st
Done



--
To view, visit http://gerrit.cloudera.org:8080/14743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0721ee4a5a6824db146ba0658e60eec25dd0c65c
Gerrit-Change-Number: 14743
Gerrit-PatchSet: 6
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Thu, 21 Nov 2019 08:39:40 +0000
Gerrit-HasComments: Yes

Reply via email to