Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8376 )
Change subject: fs: defer failure from metadata load to bootstrap when data dir is missing ...................................................................... Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc@327 PS8, Line 327: FindOrDie(uuid_by_uuid_idx, uuid_idx) > is this not just 'uuid' above? Yeah, not sure what happened here. http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc@330 PS8, Line 330: pb->Swap(&group); > I think now that we are on newish protobuf we could do '*pb = std::move(gro Done http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@557 PS8, Line 557: // Ensure the tablet's data dirs are present and healthy before it is opened. > not 100% sure I follow why you defer it all the way until bootstrap time vs The old location was in TSTabletManager::OpenTablet, which is run out of the bootstrap threadpool too. If you're suggesting I do it in TabletMetadata::LoadFromSuperBlock (called synchronously for each tablet in TSTabletManager::Init), it'd require more plumbing. Why did I move it at all? My preference is for TSTabletManager to hold tserver-specific behavior only, and while the "disk failure handling" logic isn't particularly useful for the master, it could be in the future, so locating it in TabletBootstrap is an easy way to ensure it makes it into every place that uses a tablet. Tablet::Open would be equally reasonable, I think. http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@561 PS8, Line 561: error retrieving tablet data dir group > will this be easy enough for an operator to understand? perhaps we should s Done http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@563 PS8, Line 563: return Status::IOError("tablet data is in a failed directory"); > perhaps more accurate to say that _some_ tablet data is in a failed directo I'll clarify with 'some', but if you follow the return chain up the stack you'll see that TSTabletManager::OpenTablet prefixes the message with the tablet ID before it logs it. -- To view, visit http://gerrit.cloudera.org:8080/8376 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e8d5697c2bb08287cce11fbdab6fb8d6e37d1ad Gerrit-Change-Number: 8376 Gerrit-PatchSet: 8 Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Thu, 18 Jan 2018 22:57:43 +0000 Gerrit-HasComments: Yes
