Todd Lipcon has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement ......................................................................
Patch Set 35: (11 comments) only about 50% through, but figured I'd post a couple comments I have so far http://gerrit.cloudera.org:8080/#/c/6636/17//COMMIT_MSG Commit Message: PS17, Line 27: When loading : tablet data from a previous version of Kudu, the tablet's metadata : will not have a DataDirGroup. One will be generated containing all : data directories, as the tablet's data may already be spread across : any number of disks. > Won't the superblock be rewritten with the old version of Kudu when new blo yea, I think you're right here. One thing to watch out for in other cases is that, if you deserialize a PB which has some fields that are unknown, and then serialize it back out, it actually preserves the fields (in proto2). This changed with protobuf 3, but they're thinking of adding back that behavior again in 3.4. In the case of TabletMetadata, though, it seems like we deserialize into our own structures, and then create a new PB on flush, rather than actually holding onto the original PB. So, in a downgrade-and-flush scenario, we will drop those fields as you described, and should be fine. http://gerrit.cloudera.org:8080/#/c/6636/35//COMMIT_MSG Commit Message: PS35, Line 35: failure disk-failure nit: typo http://gerrit.cloudera.org:8080/#/c/6636/30/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS30, Line 192: bl 'num_blocks_per_dir'? http://gerrit.cloudera.org:8080/#/c/6636/17/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: PS17, Line 53: > At least while the server is alive and the disk is still there, we do need hrm, but even then, don't we need the "placeholder" in the array such that the later indexes don't get "shifted forward"? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_bootstrap-test.cc File src/kudu/tablet/tablet_bootstrap-test.cc: PS35, Line 207: // Bootstrapping the tablet while it is copying will attempt to create a data : // dir group for the tablet when one already exists. : ASSERT_TRUE(s.IsAlreadyPresent()) << "Expected already present: " << s.ToString(); Not sure about this change. The test is supposed to be simulating the case where a server crashes mid-copy, and then restarts. Maybe this isn't realistic anymore (because we would roll-forward a COPYING into DELETED at actual startup)? But I dont think AlreadyPresent makes much sense to propagate up here. http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: PS35, Line 86: RETURN_NOT_OK can you PREPEND here to give better context? same below PS35, Line 199: // Keep a copy of the old data dir group in case of flush failure. : DataDirGroupPB pb; : bool old_group_exists = fs_manager_->dd_manager()->GetDataDirGroupPBForTablet(tablet_id_, &pb); : : // Remove the tablet's data dir group metadata tracked by the DataDirManager. : fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_); : if (delete_type == TABLET_DATA_COPYING) { : // If the tablet deletion is a part of a tablet copy, a data dir group must : // created so the blocks can be placed for the new tablet. : RETURN_NOT_OK(fs_manager_->dd_manager()->CreateDataDirGroup(tablet_id_)); : } : : auto revert_group_cleanup = MakeScopedCleanup([&]() { : fs_manager_->dd_manager()->DeleteDataDirGroup(tablet_id_); : if (old_group_exists) { : fs_manager_->dd_manager()->LoadDataDirGroupFromPB(tablet_id_, pb); : } : }); Instead of doing this here, would it work to instead add a new method like 'ReallocateDataDirGroup' which is called separately by TabletCopy right before it starts writing out new blocks? It smells a little funny to me to piggy-back this on the deletion instead of on the setting-up of the new tablet metadata. In other words, when we call TabletCopyClient::Finish(), we're already using ReplaceSuperBlock at that point which deletes the data dir group and loads the new one from the superblock. So, if the new super block has been allocated a new group by TabletCopyClient::Start (eg near where it does clear_rowsets(), clear_orphaned_blocks(), etc) then this would be handled somewhat naturally. That said, if you considered that and went this way, happy to discuss Line 388: RETURN_NOT_OK(fs_manager_->dd_manager()->LoadDataDirGroupFromPB( PREPEND PS35, Line 393: // data data typo PS35, Line 398: true) we try to avoid boolean parameters to change behavior. Even though it seems a little silly, would be good to introduce an enum for this so the parameter becomes descriptive (like ACROSS_ALL_DISKS or something) PS35, Line 629: pb.mutable_data_dir_group()); note that this will end up adding an empty data_dir_group field to 'pb' even if it didn't add any elements. I'm not sure if that's intended but I'm guessing not? -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 35 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
