Mike Percy has submitted this change and it was merged.

Change subject: rpc: show outbound call state in /rpcz dump
......................................................................

rpc: show outbound call state in /rpcz dump

Recently I've been looking at a stress cluster that is exhibiting lots of consensus request timeouts (eg KUDU-1788). It seems that many of the requests are timing out while the call is still in the process of being transferred, or in some cases not even sent yet. However, that wasn't obvious and took a lot of spelunking to figure out what was going on.

This adds a new state to the OutboundCall state machine for 'SENDING', which is entered when we first start sending the request over the socket. As before, it transitions to 'SENT' once the request has been completely transferred.

The state at the time of the timeout is now also put into the Status::TimedOut error string. It's a little bit ugly, but should be very useful to see when an issue is client-side or network-related rather than a server-side call timeout.

Change-Id: Id52bc627a25be87a73b4b75941d7dcc2cf95eaba
Reviewed-on: http://gerrit.cloudera.org:8080/5371
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/rpc_introspection.proto
4 files changed, 71 insertions(+), 9 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/5371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id52bc627a25be87a73b4b75941d7dcc2cf95eaba
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
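
For readers without the diff at hand, the state machine change described in the commit message can be illustrated with a rough, self-contained sketch. This is not the Kudu code: only the state names SENDING and SENT come from the message. The class `OutboundCallSketch`, its methods, the other state names (`READY`, `ON_OUTBOUND_QUEUE`, `TIMED_OUT`), and the format of the timeout string are all assumptions made for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified model of an outbound call's lifecycle.
// Only SENDING and SENT are named in the commit message; every other
// identifier here is an assumption made for this sketch.
enum class CallState { READY, ON_OUTBOUND_QUEUE, SENDING, SENT, TIMED_OUT };

inline const char* StateName(CallState s) {
  switch (s) {
    case CallState::READY: return "READY";
    case CallState::ON_OUTBOUND_QUEUE: return "ON_OUTBOUND_QUEUE";
    case CallState::SENDING: return "SENDING";
    case CallState::SENT: return "SENT";
    case CallState::TIMED_OUT: return "TIMED_OUT";
  }
  return "UNKNOWN";
}

class OutboundCallSketch {
 public:
  explicit OutboundCallSketch(std::string method) : method_(std::move(method)) {}

  CallState state() const { return state_; }

  // The call has been handed to the connection but no bytes are written yet.
  void Queue() { Transition(CallState::READY, CallState::ON_OUTBOUND_QUEUE); }

  // Entered when the first bytes of the request go out over the socket.
  void StartSending() {
    Transition(CallState::ON_OUTBOUND_QUEUE, CallState::SENDING);
  }

  // Entered once the request has been completely transferred.
  void FinishSending() { Transition(CallState::SENDING, CallState::SENT); }

  // Marks the call as timed out and returns an error string that records
  // the state the call was in when the timeout fired.
  std::string TimeOut() {
    std::string msg = method_ + " timed out (" + StateName(state_) + ")";
    state_ = CallState::TIMED_OUT;
    return msg;
  }

 private:
  // Rejects transitions that do not start from the expected state.
  void Transition(CallState expected, CallState next) {
    if (state_ != expected) {
      throw std::logic_error(std::string("illegal transition from ") +
                             StateName(state_) + " to " + StateName(next));
    }
    state_ = next;
  }

  std::string method_;
  CallState state_ = CallState::READY;
};
```

In this sketch, a call that times out after `StartSending()` but before `FinishSending()` reports SENDING in its error string, which points at the client side or the network. A call that times out in SENT suggests the server received the full request and was slow to respond.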
