Andrew Wong has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement ......................................................................
Patch Set 18: (33 comments) Failure from LINT and a memory leak in ASAN data_dirs-test dist-test (looking into it). http://gerrit.cloudera.org:8080/#/c/6636/17//COMMIT_MSG Commit Message: PS17, Line 9: mitigate the : single-disk failure > nit: we aren't mitigating the failure, but rather mitigating the effects of Done PS17, Line 27: When loading : tablet data from a previous version of Kudu, the tablet's superblock : will not have a DataDirGroup. One will be generated containing all : data directories, as the tablet's data may already be spread across : any number of disks. > what happens if we add a table on the new version, then downgrade, run for Won't the superblock be rewritten with the old version of Kudu when new blocks are written? I thought the superblock was always updated as new things get flushed. If that's the case, shouldn't the metadata be replaced with a group-less version, and upon upgrading, we should be create a group with all dirs. http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 133: Status CountFilesCb(int* num_files, Env::FileType type, > would using Env::Walk() make this easier? Done Line 428: size_t size1 = 5; > Before asserting on the status' string, assert on its type (i.e. ASSERT_TRU Done http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS18, Line 154: t CountFiles_OLD(const string& path) { : vector<string> child_paths; : Status s = env_->GetChildren(path, &child_paths); : if (!s.ok()) { : return 0; : } : int count = 0; : for (const string& child_path : child_paths) { : if (child_path == "." || child_path == ".." || child_path == kInstanceMetadataFileName) { : continue; : } : string full_child_path = JoinPathSegments(path, child_path); : bool is_dir; : s = env_->IsDirectory(full_child_path, &is_dir); : if (is_dir) { : // Count the files in the child directories. : count += CountFiles_OLD(full_child_path); : } else { : // Increment if the child is a file. : count++; : } : } : return count; : } Removed. http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: PS17, Line 56: } : : pr > You can omit this since it doesn't do anything beyond what the base class d Done PS17, Line 79: Status s = dd_manager_->GetNextDataDir(non_existent_opts, nullptr); : ASSERT_STR_CONTAIN > Would it be possible to instantiate a DataDirManager and forgo the block ma The main reason I added a blockmanager since setting up the env seemed to be nicely handled. Line 96: } > Test the Status programmatically before testing its string representation. Done PS17, Line 115: TEST_F(DataDirGroupTest, TestDeleteDataDirGroup) { : ASSERT_OK(dd_manager_->CreateDataDirGroup(test_ > Why is it necessary to set these? Won't GetNextDataDir() return Status::OK( Left in from when I was experimenting with the flags. Removed Line 138: // Add 20 tablets, each with size 3. > Maybe do this once in the test constructor, and do it again in each test th Done http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS17, Line 62: riped ac > maybe even experimental, considering this isn't quite usable yet? Done Line 188: > ?? Good question. Removed. PS17, Line 254: if (metric_entity) { : metrics_.reset(new Data > I don't think it makes sense to modify process-level state deep in an objec RNG is used for creating groups and selecting directories within groups. I added this here so my dist-test runs would differ between runs. Without it, the groups would always be the same. I've changed it to use GetRandomSeed32 and random_shuffle, which afaik don't modify state. Line 394: InsertOrDie(&uuid_idx_by_dd, dd.get(), idx); > Don't need std:: prefixes here. Done PS17, Line 410: > i think this and the one below could be a bit more specific and refer to da Good call, done. Also removed "tracking" vernacular, since it's no longer used in function names. PS17, Line 422: : : // Adjust the disk group size to fit within the total number of data dirs. : i > use std::min? Done, but a little ugly due to static casting. Line 460: // natural tablet_id, select a data dir randomly. > maybe DCHECK(IsGTest()) from test_util_prod.h? Done Line 479: DataDir* candidate = FindOrDie(data_dir_by_uuid_idx_, uuid_idx); > FindOrDie() here? Done PS17, Line 518: _uuid_idx) { > Instead of the separate bool, could you wrap new_uuid_idx in a boost::optio Done Line 535: } > why not also just skip adding the full ones here? might make the lower part Hmm, mainly wanted to avoid calling RefreshIsFull() on all dirs if possible, since it's a disk operation, but from what I've read, it doesn't seem to be a very expensive call. Good point that below would be simpler. Done. Line 548: } else if (FindOrDie(tablets_by_uuid_idx_map_, candidate_indices[0]).size() < > Why do we need to refresh? If we don't refresh, there is a possibility that the group can be assigned a directory that's full. If there's a mix of full and non-full disks and we could happen to assign all full disks to the tablet. As an example, DataDirGroupTest.TestFullDisk, which expects groups to not be created given full disks, will fail and create a group even when there is no space if this is not ALWAYS. My worry is that if we leave fullness checking to when we create blocks, we may end up creating a group that has no space when there actually isn't any on the selected disks. http://gerrit.cloudera.org:8080/#/c/6636/18/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 545: } else { > warning: don't use else after return [readability-else-after-return] Done http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 51: > I don't understand why you had to move DataDirGroup back to the header. The No, forward declaration usually doesn't work for STL containers since I'm not using pointers. PS17, Line 53: list of > maybe missed something somewhere in the design, but when we get to removing At least while the server is alive and the disk is still there, we do need to keep all the dead UUIDs so the indices within the PathSetPB are still valid. I think the only time we can remove the UUID is when the dead disk is physically removed and we're guaranteed nothing will be put on it. Will think about this more and update the doc. PS17, Line 66: > std::move Done Line 70: void CopyToPB(DataDirGroupPB* pb) const { > can just DCHECK(pb); Done Line 73: for (uint16_t uuid_idx : uuid_indices_) { > I think you can write this as group->add_uuid_indices(uuid_idx) Done Line 75: } > nit: pb->Swap(group); to avoid an extra allocation Good call. PS17, Line 251: is an optional output denoting which uuid_idx should be : // added to 'group_indices'. If there are no available > why not return a bad status instead of a second out-param? The only error that can be returned atm is an env IOError during dir fullness refresh. Being full isn't necessarily an error; it just means the group is at capacity, and that it should stop adding dirs. Going with Adar's proposal with boost::optional Line 259: // call will reflect the same initial state of directory load. > warning: function 'kudu::fs::DataDirManager::GetDirForGroupUnlocked' has a Done http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/file_block_manager.h File src/kudu/fs/file_block_manager.h: PS17, Line 67: CreateBlockOptions > typo? ? Not sure I see the typo. The caller of CreateBlock() provides a CreateBlockOptions indicating the tablet the block belongs to (and more in the future, eg disk class). http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: Line 116: // This is stored in each TabletSuperBlockPB. > can you note where this ends up stored? Noted that it's stored in the TabletSuperBlockPB. http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: Line 65: string tablet_id) > nit: pass by value and then std::move below Done -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes