Henry Robinson has posted comments on this change.

Change subject: IMPALA-5388: Don't retry RPC calls on TSSLException
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7063/1/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS1, Line 258: Status(TErrorCode::RPC_GENERAL_ERROR, e.what());
> Should we consider returning RPC_RECV_TIMEOUT instead if e.what() contains

I don't think so - RPC_RECV_TIMEOUT is only used elsewhere to properly print an error message.

PS1, Line 259: catch (const apache::thrift::TException& e) {
             :   if (IsRecvTimeoutTException(e)) {
             :     return Status(TErrorCode::RPC_RECV_TIMEOUT, strings::Substitute(
             :         "Client $0 timed-out during recv call.", TNetworkAddressToString(address_)));
             :   }
             :   VLOG(1) << "client " << client_ << " unexpected exception: "
             :           << e.what() << ", type=" << typeid(e).name();
             :
             :   // Client may have unexpectedly been closed, so re-open and retry.
             :   // TODO: ThriftClient should return proper error codes.
             :   const Status& status = Reopen();
             :   if (!status.ok()) {
             :     if (retry_is_safe != NULL) *retry_is_safe = true;
             :     return Status(TErrorCode::RPC_CLIENT_CONNECT_FAILURE, status.GetDetail());
             :   }
             :   try {
             :     (client_->*f)(*response, request);
             :   } catch (apache::thrift::TException& e) {
             :     // By this point the RPC really has failed.
             :     // TODO: Revisit this logic later. It's possible that the new connection
             :     // works but we hit timeout here.
             :     return Status(TErrorCode::RPC_GENERAL_ERROR, e.what());

The more I stare at this, the more I think it's broken even without SSL. TSocket can clearly throw TExceptions on its read() path, and any of those would lead to a spurious retry here. It looks like TSocket gives a narrow set of error codes for the 'socket not open / conn reset' error cases, and those would be better used here.

In fact I just tried this: by throwing a TException between writing and reading in one TransmitData() RPC, I can get wrong results pretty easily.

I think we need to restructure this block so that the only RPCs we retry are those that a) failed on the write path and b) have the error code NOT_OPEN.

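A rough sketch of the restructuring suggested above, only to make the shape concrete. Everything in it is a stand-in: `TransportError`, `TransportErrorType`, `RpcSteps`, `RpcResult` and `DoRpcWithNarrowRetry` are hypothetical names, not the real Thrift or Impala classes, and splitting the RPC into separate send and recv callables is an assumption about how the write and read paths could be told apart. It also assumes that a NOT_OPEN failure on the write path means the request did not reach the peer.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a transport exception that carries an error type.
enum class TransportErrorType { UNKNOWN, NOT_OPEN, TIMED_OUT };

class TransportError : public std::runtime_error {
 public:
  TransportError(TransportErrorType type, const std::string& msg)
      : std::runtime_error(msg), type_(type) {}
  TransportErrorType type() const { return type_; }

 private:
  TransportErrorType type_;
};

// Hypothetical result codes, loosely mirroring the error codes quoted above.
enum class RpcResult { OK, CONNECT_FAILURE, GENERAL_ERROR };

// Hypothetical split of one RPC into its write half, its read half, and a
// hook that re-opens the connection (returns false if the re-open fails).
struct RpcSteps {
  std::function<void()> send;
  std::function<void()> recv;
  std::function<bool()> reopen;
};

// Retries at most once, and only when the failure is on the write path AND
// the error type is NOT_OPEN. Failures on the read path are never retried,
// because the peer may already have processed the request.
RpcResult DoRpcWithNarrowRetry(const RpcSteps& rpc, bool* retry_is_safe) {
  if (retry_is_safe != nullptr) *retry_is_safe = false;
  try {
    rpc.send();
  } catch (const TransportError& e) {
    if (e.type() != TransportErrorType::NOT_OPEN) return RpcResult::GENERAL_ERROR;
    if (!rpc.reopen()) {
      // As in the quoted code, a failed re-open is reported as safe to retry.
      if (retry_is_safe != nullptr) *retry_is_safe = true;
      return RpcResult::CONNECT_FAILURE;
    }
    try {
      rpc.send();  // The single retry, still on the write path.
    } catch (const TransportError&) {
      return RpcResult::GENERAL_ERROR;
    }
  }
  try {
    rpc.recv();
  } catch (const TransportError&) {
    return RpcResult::GENERAL_ERROR;  // Read-path failure: do not retry.
  }
  return RpcResult::OK;
}
```

With this shape, an exception thrown between the write and the read (the case that produced wrong results in the experiment above) surfaces as an error instead of silently re-sending the request.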
--
To view, visit http://gerrit.cloudera.org:8080/7063

Gerrit-MessageType: comment
Gerrit-Change-Id: I176975f2aa521d5be8a40de51067b1497923d09b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
Gerrit-HasComments: Yes
