Andrew Wong has posted comments on this change.

Change subject: disk failure: reassign failed tablets

Patch Set 8:

File src/kudu/client/

PS7, Line 232: case tserver::TabletServerErrorPB::TABLET_FAILED: // fall-through
> would it make more sense to have this be like: TABLET_NOT_FOUND? How do we 
Hrm, maybe, but I'm keeping this as is for now. The reasoning here was that 
previously, when a tablet was in the FAILED state, we would treat it as 
TABLET_NOT_RUNNING (TNR). 
I'm looking in client/ and it seems like we blacklist the 
location for TNR (if there's somewhere else I should be looking, please let me 
know).

I'm not sure it makes sense to retry on TNR. I suppose it could retry if the 
tablet were NOT_STARTED or BOOTSTRAPPING, but tablets in QUIESCING and SHUTDOWN 
are also considered NOT_RUNNING.
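
To make that distinction concrete, here is a rough sketch of how the states 
could be split by whether a retry on the same replica has any chance of 
succeeding. The enum and function names below are hypothetical stand-ins for 
illustration only; they are not the actual Kudu types or the client code.

```cpp
// Hypothetical stand-in for the tablet states named in this thread; this is
// not the actual Kudu enum.
enum class TabletState {
  NOT_STARTED,
  BOOTSTRAPPING,
  RUNNING,
  QUIESCING,
  SHUTDOWN,
  FAILED
};

// Hypothetical classification of what a client could do with a replica.
enum class ClientAction {
  PROCEED,              // replica is serving requests
  RETRY_SAME_REPLICA,   // replica may become RUNNING soon
  TRY_ANOTHER_REPLICA   // replica will not come back on its own
};

ClientAction ActionForState(TabletState state) {
  switch (state) {
    case TabletState::RUNNING:
      return ClientAction::PROCEED;
    case TabletState::NOT_STARTED:  // fall-through
    case TabletState::BOOTSTRAPPING:
      // Transitional states: the tablet is on its way to RUNNING.
      return ClientAction::RETRY_SAME_REPLICA;
    case TabletState::QUIESCING:  // fall-through
    case TabletState::SHUTDOWN:   // fall-through
    case TabletState::FAILED:
      // The tablet is going away or is broken; retrying here is pointless.
      return ClientAction::TRY_ANOTHER_REPLICA;
  }
  // Defensive default for values outside the enum.
  return ClientAction::TRY_ANOTHER_REPLICA;
}
```

Under this split, a single NOT_RUNNING error code cannot tell the client which 
branch it is in, which is why blacklisting the location is the safer default.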

File src/kudu/consensus/

PS7, Line 284: sponse_.error().code() == TabletServerErrorPB::TABLET_FAILED) 
> maybe in this case we should directly call: NotifyObserversOfFailedFollower

File src/kudu/consensus/

PS7, Line 638: // Initiate Tablet Copy on the peer if the tablet is not found.
             :     if (response.has_error()) {
             :       CHECK_EQ(tserver::TabletServerErrorPB::TABLET_NOT_FOUND, 
             :       peer->needs_tablet_copy = true;
             :       VLOG_WITH_PREFIX_UNLOCKED(1) << "Marked peer as needing tablet copy: "
             :                                     << peer->ToString();
             :       *more_pending = true;
             :       return;
             :     }
             :     // Sanity checks.
             :     // Some of these can be eventually removed, but they are handy for now.
             :     DCHECK(response.status().IsInitialized())
             :         << "Error: Uninitialized: " << 
             :         << ". Response: "<< SecureShortDebugString(response);
             :     // TODO: Include uuid in error messages as well.
             :     DCHECK(response.has_responder_uuid() && 
> see my comment on the call site

File src/kudu/master/

PS7, Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true,
             :             "Whether the master should tombstone (delete) tablet replicas that "
             :             "are reporting a failed state. Only for testing!");
             : TAG_FLAG(master_tombstone_failed_tablet_replicas, hidden);
> is this a test only thing?
As of now, yes. Will update to make that clear.

File src/kudu/tablet/metadata.proto:

PS7, Line 161: the tablet will be evicted and
> ??
Should be evicted and replaced.

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <>
Gerrit-Reviewer: Andrew Wong <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes
