Mike Percy has submitted this change and it was merged.

Change subject: rpc: show outbound call state in /rpcz dump
......................................................................


rpc: show outbound call state in /rpcz dump

Recently I've been looking at a stress cluster that is exhibiting lots
of consensus request timeouts (e.g. KUDU-1788). It appears that many of
the requests time out while the call is still being transferred, or in
some cases before it has even been sent. However, that wasn't obvious,
and it took a lot of spelunking to figure out what was going on.

This adds a new 'SENDING' state to the OutboundCall state machine,
which is entered when we first start sending the request over the
socket. As before, the call transitions to 'SENT' once the request has
been completely transferred.
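To illustrate where 'SENDING' sits in the call lifecycle, here is a
minimal sketch of such a state machine. Only 'SENDING' and 'SENT' come
from this change; the other state names (CallState, READY,
ON_OUTBOUND_QUEUE, TIMED_OUT, FINISHED_*) and the transition rules are
hypothetical and are not taken from Kudu's outbound_call.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified lifecycle of an outbound RPC call.
// Only SENDING and SENT are named in the commit message; the rest
// of this enum is an assumption for illustration.
enum class CallState {
  READY,              // created, not yet queued
  ON_OUTBOUND_QUEUE,  // queued on the connection, waiting for the socket
  SENDING,            // first bytes of the request written to the socket
  SENT,               // request completely transferred
  TIMED_OUT,
  FINISHED_SUCCESS,
  FINISHED_ERROR,
};

const char* StateName(CallState s) {
  switch (s) {
    case CallState::READY: return "READY";
    case CallState::ON_OUTBOUND_QUEUE: return "ON_OUTBOUND_QUEUE";
    case CallState::SENDING: return "SENDING";
    case CallState::SENT: return "SENT";
    case CallState::TIMED_OUT: return "TIMED_OUT";
    case CallState::FINISHED_SUCCESS: return "FINISHED_SUCCESS";
    case CallState::FINISHED_ERROR: return "FINISHED_ERROR";
  }
  return "UNKNOWN";
}

bool IsTerminal(CallState s) {
  return s == CallState::TIMED_OUT || s == CallState::FINISHED_SUCCESS ||
         s == CallState::FINISHED_ERROR;
}

// Returns true if 'from' -> 'to' is allowed in this sketch.
bool IsValidTransition(CallState from, CallState to) {
  if (IsTerminal(from)) return false;  // terminal states never change
  // Any live call may time out or fail, whatever stage it reached.
  if (to == CallState::TIMED_OUT || to == CallState::FINISHED_ERROR) {
    return true;
  }
  switch (from) {
    case CallState::READY: return to == CallState::ON_OUTBOUND_QUEUE;
    case CallState::ON_OUTBOUND_QUEUE: return to == CallState::SENDING;
    case CallState::SENDING: return to == CallState::SENT;
    case CallState::SENT: return to == CallState::FINISHED_SUCCESS;
    default: return false;
  }
}
```

The point of the extra state is that a call which times out in SENDING
can now be distinguished from one that was still queued or one that had
already been fully sent.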

The state at the time of the timeout is now also included in the
Status::TimedOut error string. It's a little bit ugly, but it should
make it much easier to tell when an issue is client-side or
network-related rather than a server-side call timeout.
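As a rough illustration of how the state in the timeout message can be
read, the sketch below builds a message string and classifies the
state. The message format, FormatTimeoutMessage, RequestFullySent, and
the reduced CallState enum are all hypothetical; this does not use
Kudu's Status class and is not the exact text Kudu produces.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of call states that can be observed at timeout.
enum class CallState { ON_OUTBOUND_QUEUE, SENDING, SENT };

const char* StateName(CallState s) {
  switch (s) {
    case CallState::ON_OUTBOUND_QUEUE: return "ON_OUTBOUND_QUEUE";
    case CallState::SENDING: return "SENDING";
    case CallState::SENT: return "SENT";
  }
  return "UNKNOWN";
}

// Appends the state at the time of the timeout to the error text.
std::string FormatTimeoutMessage(const std::string& method,
                                 const std::string& remote,
                                 int timeout_ms, CallState state) {
  return method + " RPC to " + remote + " timed out after " +
         std::to_string(timeout_ms) + "ms (" + StateName(state) + ")";
}

// Only a fully sent request could plausibly be stuck on the server;
// ON_OUTBOUND_QUEUE or SENDING points at the client or the network.
bool RequestFullySent(CallState s) { return s == CallState::SENT; }
```

For example, a timeout reported as "(SENDING)" means the request never
finished leaving the client, so the server cannot be blamed for it.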

Change-Id: Id52bc627a25be87a73b4b75941d7dcc2cf95eaba
Reviewed-on: http://gerrit.cloudera.org:8080/5371
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/rpc_introspection.proto
4 files changed, 71 insertions(+), 9 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5371

Gerrit-MessageType: merged
Gerrit-Change-Id: Id52bc627a25be87a73b4b75941d7dcc2cf95eaba
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
