David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement ......................................................................
Patch Set 35: (10 comments) http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS35, Line 485: DCHECK(IsGTest()); hum, can you at least change this to a CHECK? if we're missing coverage somewhere a release build could go through this path PS35, Line 541: Status DataDirManager::GetDirForGroupUnlocked(const vector<uint16_t>& group_indices, : boost::optional<uint16_t>* new_uuid_idx) { : DCHECK(dir_group_lock_.is_locked()); : if (group_indices.size() == data_dirs_.size()) { : return Status::OK(); : } : : // Determine all potential candidates to be added to the group. : vector<uint16_t> candidate_indices; : unordered_set<uint16_t> data_dir_set; : data_dir_set.insert(group_indices.begin(), group_indices.end()); : for (auto& e : data_dir_by_uuid_idx_) { : // If the directory is already in the group or it's full, ignore it. : RETURN_NOT_OK(e.second->RefreshIsFull(DataDir::RefreshMode::ALWAYS)); : if (ContainsKey(data_dir_set, e.first) || e.second->is_full()) { : continue; : } : candidate_indices.push_back(e.first); : } : : // Select two randomly, compare their load, and select the one with less load. : shuffle(candidate_indices.begin(), candidate_indices.end(), default_random_engine(rng_.Next())); : if (candidate_indices.empty()) { : *new_uuid_idx = boost::none; : } else if (candidate_indices.size() == 1 || : FindOrDie(tablets_by_uuid_idx_map_, candidate_indices[0]).size() < : FindOrDie(tablets_by_uuid_idx_map_, candidate_indices[1]).size()) { : *new_uuid_idx = candidate_indices[0]; : } else { : *new_uuid_idx = candidate_indices[1]; : } : return Status::OK(); : } wouldn't it be simpler and faster to get all the data dirs in one go? something like: go through the data dirs, not including the ones that are full to get dir_vec while (collected_dirs < needed_dirs) { shuffle vec pick 1 amont the 2 first erase chosen element } http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: PS35, Line 59: reperesented typo PS35, Line 65: uuid_indices nit: I don't think you need to refer to uuids all over this class, likely dd_index is enough PS35, Line 69: uuid_idx_by_uuid ..except here of course where you use "dd_uuid_to_idx_map PS35, Line 69: std::unordered_map<std::string, uint16_t> nit: typedef? PS35, Line 221: CreateDataDirGroup see my comments on the test about these method's names PS35, Line 254: Selects a directory from the available directories that aren't in the : // directory group. Selection is based on "The Power of Two Choices in : // Randomized Load Balancing", selecting two directories randomly and : // choosing the one with less load, quantified as the number of unique : // tablets in the directory. want to mention briefly what are the average case properties that you expect? PS35, Line 291: TabletsByUuidIndexMap oh you do have a typedef. move it/fwd declare somewhere you can use it also for DataDirGroup PS35, Line 301: do not block each other, while threads : // attempting to write, e.g. to create a new tablet, thereby creating a new : // data directory group, block all threads. this is hard to parse -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 35 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes