David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-2020: tserver failure causes multiple tablet copy 
operations per under-replicated tablet
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6925/8/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

PS8, Line 345: controller_.status().ok() &&
             :       (!tc_response_.has_error() ||
             :        tc_response_.error().code() ==
             :         
TabletServerErrorPB::TabletServerErrorPB::ALREADY_INPROGRESS)
so here we were not notifying the queue if everything went well? was this a bug 
or did we have other means to notify the queue (like with heartbeats)

also could pull this to another method to make it more tracktable?


http://gerrit.cloudera.org:8080/#/c/6925/8/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS8, Line 395: VLOG(3) << LogPrefixUnlocked() << "Peer " << uuid << " needs 
tablet copy" << THROTTLE_MSG;
how about we increase the number of seconds to something like 10?


http://gerrit.cloudera.org:8080/#/c/6925/8/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS8, Line 228: // Update the last successful communication timestamp for the 
given peer
             :   // to the current time. This should be called when a 
non-network related
             :   // error is received from the peer, indicating that it is 
alive, even if it
             :   // may not be fully up and running or able to accept updates.
update docs


http://gerrit.cloudera.org:8080/#/c/6925/8/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS8, Line 385: shared_ptr<TabletCopyRunnable> runnable(new 
TabletCopyRunnable(this, req, cb));
             :   Status s = tablet_copy_pool_->Submit(runnable);
             :   if (PREDICT_TRUE(s.ok())) {
             :     return;
             :   }
wanna add some docs regarding what you mentioned in person? i.e that we don't 
need check if the tablet is already in transition because that is already 
handled somewhere?


PS8, Line 405:   if (transition) {
             :     cb(Status::IllegalState(
             :           strings::Substitute("State transition of tablet $0 
already in progress: $1",
             :                               tablet_id, *transition)),
             :           TabletServerErrorPB::ALREADY_INPROGRESS);
             :     return;
             :   }
add docs stating that we do this to call the callback outside the lock?


-- 
To view, visit http://gerrit.cloudera.org:8080/6925
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to