Dan Burkert has posted comments on this change.

Change subject: KUDU-2020: tserver failure causes multiple tablet copy 
operations per under-replicated tablet
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6925/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 1067:       // Skip calling SetupErrorAndRespond since this path doesn't 
need the
> Check out the 'Advanced per-instance throttling' section of util/logging.h 
OK, so after trying this out on a cluster, I think we should allow it to log 
every time.  To balance this out, I think we should downgrade the 'tablet x 
needs tablet copy' message.  The net result is that we log the result of the 
begin tablet copy request instead of the fact that we'll be requesting it.  
For reference, here's a cross section of these logs for a particular tablet:

I0522 17:09:38.703362 15818 consensus_queue.cc:395] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 [LEADER]: 
Peer cc32936bc8594948a04fd4240da36aed needs tablet copy
W0522 17:09:38.703636  4776 consensus_peers.cc:352] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 -> Peer 
cc32936bc8594948a04fd4240da36aed (vd0236.halxg.cloudera.com:7050): Unable to 
begin Tablet Copy on peer: error { code: THROTTLED status { code: 
SERVICE_UNAVAILABLE message: "Thread pool is at capacity (10/10 tasks running, 
0/0 tasks queued)" } }
I0522 17:09:40.211633 15820 consensus_queue.cc:395] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 [LEADER]: 
Peer cc32936bc8594948a04fd4240da36aed needs tablet copy
W0522 17:09:40.211971  4776 consensus_peers.cc:352] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 -> Peer 
cc32936bc8594948a04fd4240da36aed (vd0236.halxg.cloudera.com:7050): Unable to 
begin Tablet Copy on peer: error { code: THROTTLED status { code: 
SERVICE_UNAVAILABLE message: "Thread pool is at capacity (10/10 tasks running, 
0/0 tasks queued)" } }
I0522 17:09:41.703528 11794 consensus_queue.cc:395] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 [LEADER]: 
Peer cc32936bc8594948a04fd4240da36aed needs tablet copy
W0522 17:09:41.703760  4776 consensus_peers.cc:352] T 
c03811b02d7045e9a8cc426246c9595c P 70f7ee61ead54b1885d819f354eb3405 -> Peer 
cc32936bc8594948a04fd4240da36aed (vd0236.halxg.cloudera.com:7050): Unable to 
begin Tablet Copy on peer: error { code: THROTTLED status { code: 
SERVICE_UNAVAILABLE message: "Thread pool is at capacity (10/10 tasks running, 
0/0 tasks queued)" } }


On clusters approaching normalcy, I wouldn't expect to see these logs much at 
all.
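
To make the trade-off concrete, here is a minimal, self-contained sketch of 
interval-based throttling.  It is not the macro from util/logging.h and not 
the code in this patch; IntervalThrottler and CountLogged are hypothetical 
names made up for illustration.  It only counts how many of a series of retry 
attempts would be logged for a given minimum interval.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-instance log throttler. This is an
// illustration only and does not reproduce any Kudu or glog API.
class IntervalThrottler {
 public:
  explicit IntervalThrottler(int64_t min_interval_ms)
      : min_interval_ms_(min_interval_ms) {
    assert(min_interval_ms >= 0);
  }

  // Returns true if a message at time 'now_ms' should be emitted, i.e. if
  // nothing has been logged yet or the minimum interval has elapsed since
  // the last emitted message.
  bool ShouldLog(int64_t now_ms) {
    if (has_logged_ && now_ms - last_log_ms_ < min_interval_ms_) {
      ++suppressed_;
      return false;
    }
    has_logged_ = true;
    last_log_ms_ = now_ms;
    return true;
  }

  // Number of messages dropped so far.
  int64_t suppressed() const { return suppressed_; }

 private:
  const int64_t min_interval_ms_;
  int64_t last_log_ms_ = 0;
  bool has_logged_ = false;
  int64_t suppressed_ = 0;
};

// Counts how many attempts at the given timestamps (in milliseconds) would
// be logged with the given minimum interval between messages.
int CountLogged(const std::vector<int64_t>& attempt_times_ms,
                int64_t min_interval_ms) {
  IntervalThrottler throttler(min_interval_ms);
  int logged = 0;
  for (int64_t t : attempt_times_ms) {
    if (throttler.ShouldLog(t)) ++logged;
  }
  return logged;
}
```

The three warnings in the excerpt above are roughly 0, 1508, and 3000 ms 
apart.  With an interval of 0 (log every time) all three would be kept, 
while a 60 second interval would keep only the first, which is the 
information we'd lose by throttling.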


-- 
To view, visit http://gerrit.cloudera.org:8080/6925
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
