Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8376 )

Change subject: fs: defer failure from metadata load to bootstrap when data dir 
is missing
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc@327
PS8, Line 327: FindOrDie(uuid_by_uuid_idx, uuid_idx)
> is this not just 'uuid' above?
Yeah, not sure what happened here.


http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/fs/data_dirs.cc@330
PS8, Line 330:   pb->Swap(&group);
> I think now that we are on newish protobuf we could do '*pb = std::move(gro
Done


http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@557
PS8, Line 557:   // Ensure the tablet's data dirs are present and healthy 
before it is opened.
> not 100% sure I follow why you defer it all the way until bootstrap time vs
The old location was in TSTabletManager::OpenTablet, which is run out of the 
bootstrap threadpool too. If you're suggesting I do it  in 
TabletMetadata::LoadFromSuperBlock (called synchronously for each tablet in 
TSTabletManager::Init), it'd require more plumbing.

Why did I move it at all? My preference is for TSTabletManager to hold 
tserver-specific behavior only, and while the "disk failure handling" logic 
isn't particularly useful for the master, it could be in the future, so 
locating it in TabletBootstrap is an easy way to ensure it makes it into every 
place that uses a tablet. Tablet::Open would be equally reasonable, I think.


http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@561
PS8, Line 561: error retrieving tablet data dir group
> will this be easy enough for an operator to understand? perhaps we should s
Done


http://gerrit.cloudera.org:8080/#/c/8376/8/src/kudu/tablet/tablet_bootstrap.cc@563
PS8, Line 563:     return Status::IOError("tablet data is in a failed 
directory");
> perhaps more accurate to say that _some_ tablet data is in a failed directo
I'll clarify with 'some', but if you follow the return chain up the stack 
you'll see that TSTabletManager::OpenTablet prefixes the message with the 
tablet ID before it logs it.



--
To view, visit http://gerrit.cloudera.org:8080/8376
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e8d5697c2bb08287cce11fbdab6fb8d6e37d1ad
Gerrit-Change-Number: 8376
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 18 Jan 2018 22:57:43 +0000
Gerrit-HasComments: Yes

Reply via email to