Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13975 )

Change subject: Consider the available space when selecting data dirs for 
blocks.
......................................................................


Patch Set 7:

(9 comments)

I'll look into the test failure some more. If I check out this patch, I see it 
failing a small percentage (2-6%) of the time.

http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.h@207
PS7, Line 207: available_bytes_
nit: wrap this in ''s like the other variable names.


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.h@212
PS7, Line 212: , updated by RefreshIsFull.
nit: remove this, or update it to RefreshAvailableSpace


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.h@338
PS7, Line 338: and give preference to which with more free space
             :   // from the specified option's data dir group.
nit: Could you rephrase a bit?

"and gives preference to those with more free space in the specified option's 
data dir group."


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.h@471
PS7, Line 471: as the number of unique tablets in the directory.
             :   // The resulting behavior fills directories that have fewer 
tablets stored on
             :   // them while not completely neglecting those with more 
tablets.
nit: update this?


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc@90
PS7, Line 90:  "
            :              "During this time, will not really check disk space 
when using EXPIRED_ONLY."
nit: I would remove this sentence. These descriptions usually try to be 
user-understandable, and EXPIRED_ONLY isn't really user-facing.


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc@110
PS7, Line 110:
             :
             :
nit: can you remove this extraneous newline?


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc@264
PS7, Line 264: last_check_is_full_
nit: maybe we should rename this to last_space_check_ or something similar


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc@1002
PS7, Line 1002:   vector<int> random_indices(candidate_indices.size());
              :   iota(random_indices.begin(), random_indices.end(), 0);
Could we just shuffle candidate_indices directly? And then compare dirs 
candidate_indices[0] and candidate_indices[1]?


http://gerrit.cloudera.org:8080/#/c/13975/7/src/kudu/fs/data_dirs.cc@1083
PS7, Line 1083:       int tablets_in_first = 
FindOrDie(tablets_by_uuid_idx_map_, candidate_indices[0]).size();
              :       int tablets_in_second = 
FindOrDie(tablets_by_uuid_idx_map_, candidate_indices[1]).size();
              :       int selected_index = 0;
              :       if (tablets_in_first == tablets_in_second) {
              :         int64_t space_in_first = 
FindOrDie(data_dir_by_uuid_idx_,
              :                                            
candidate_indices[0])->available_bytes();
              :         int64_t space_in_second = 
FindOrDie(data_dir_by_uuid_idx_,
              :                                             
candidate_indices[1])->available_bytes();
              :         selected_index = space_in_first > space_in_second ? 0 : 
1;
              :       } else {
              :         selected_index = tablets_in_first < tablets_in_second ? 
0 : 1;
              :       }
              :       
group_indices->push_back(candidate_indices[selected_index]);
              :       candidate_indices.erase(candidate_indices.begin() + 
selected_index);
              :     }
To be clear, this is selecting by the tablet count first, and then breaking 
ties by space.

Is that preferred over selecting by space, and then breaking ties by tablet 
count?

This second scenario isn't as likely in production, but it definitely happens 
in tests since internal mini clusters share a single disk.



--
To view, visit http://gerrit.cloudera.org:8080/13975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I194c4965ee64aed728e3b84e684c04d445cbe529
Gerrit-Change-Number: 13975
Gerrit-PatchSet: 7
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Thu, 08 Aug 2019 18:20:05 +0000
Gerrit-HasComments: Yes

Reply via email to