David Ribeiro Alves has posted comments on this change.

Change subject: disk failure: reassign failed tablets
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS7, Line 232: case tserver::TabletServerErrorPB::TABLET_FAILED: // fall-through
would it make more sense to have this be like: TABLET_NOT_FOUND? How do we 
respond to TABLET_NOT_RUNNING on the client? do we retry?
my fear is that we might eventually do that (or might even do that already in 
some cases like single replica tablets) but this is a non-retriable error. 
Alternatively actually sending a tablet failed reply might make sense too.


http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

PS7, Line 284: response_.error().code() != TabletServerErrorPB::TABLET_FAILED
maybe in this case we should directly call: NotifyObserversOfFailedFollower() 
or a new analog public method (like queue_->NotifyPeerHasFailed())?


http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS7, Line 638: // We only let special types of errors through to this point 
from the peer.
             :     if (response.has_error()) {
             :       // Initiate Tablet Copy on the peer if the tablet is not 
found.
             :       if (tserver::TabletServerErrorPB::TABLET_NOT_FOUND == 
response.error().code()) {
             :         peer->needs_tablet_copy = true;
             :         VLOG_WITH_PREFIX_UNLOCKED(1) << "Marked peer as needing 
tablet copy: "
             :                                      << peer->ToString();
             :         *more_pending = true;
             :       } else {
             :         // Ignore the response from the peer if it is failed.
             :         CHECK(tserver::TabletServerErrorPB::TABLET_FAILED == 
response.error().code())
             :           << SecureShortDebugString(response);
             :         VLOG_WITH_PREFIX_UNLOCKED(1) << "Ignoring response from 
tablet to promote eviction: "
             :                                     << peer->ToString();
             :         *more_pending = false;
             :       }
             :       return;
             :     }
see my comment on the call site


http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS7, Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true,
             :             "Whether the master should tombstone (delete) tablet 
replicas that "
             :             "are reporting a failed state.");
             : TAG_FLAG(master_tombstone_failed_tablet_replicas, hidden);
is this a test only thing?


http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

PS7, Line 161: the tablet will be replicated.
??


-- 
To view, visit http://gerrit.cloudera.org:8080/7440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to