Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/7229
Change subject: IMPALA-5537: Retry RpcRecv on recv timeout exception for SSL connection
......................................................................

IMPALA-5537: Retry RpcRecv on recv timeout exception for SSL connection

After the fix for IMPALA-5388, every TSSLException thrown is treated as
a fatal error and the query fails. This turns out to be too strict: in
a secure cluster under load, queries can easily hit a timeout while
waiting for an RPC response. When running without SSL, we call
RetryRpcRecv() to retry the recv part of an RPC if the recv() called by
the TSocket underlying the RPC returns EAGAIN. This change extends that
logic to cover secure connections. In particular, we pattern match
against the exception string "SSL_read: Resource temporarily unavailable",
which corresponds to the EAGAIN error code being thrown in the
SSL_read() path.

The fault injection utility has also been updated to distinguish
between a timeout and a lost connection, so that the different error
handling paths in the send and recv paths can be exercised.

After this change, we will still fail the RPC if there is any error in
the send part of an RPC over a secure connection.

Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
---
M be/src/rpc/thrift-util.cc
M be/src/testutil/fault-injection-util.cc
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_exception.py
4 files changed, 63 insertions(+), 28 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7229/1
--
To view, visit http://gerrit.cloudera.org:8080/7229
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
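
For readers without the diff at hand, the retry policy described in the change message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in be/src/rpc/thrift-util.cc: FakeSslException, IsSslRecvTimeout, RpcOutcome, DoRpcWithRecvRetry and the max_recv_retries parameter are hypothetical names, and a std::runtime_error subclass stands in for Thrift's TSSLException so that the snippet compiles without Thrift. Only the matched string is taken from the message above.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Thrift's TSSLException (assumption, not the real class).
struct FakeSslException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Returns true if the exception text matches the EAGAIN case raised from the
// SSL_read() path. The substring is the one quoted in the change message.
inline bool IsSslRecvTimeout(const std::string& msg) {
  return msg.find("SSL_read: Resource temporarily unavailable") !=
         std::string::npos;
}

enum class RpcOutcome { kOk, kRecvRetried, kFailed };

// Hypothetical driver: any error in the send part fails the RPC; a recv
// timeout retries only the recv part, up to max_recv_retries extra attempts.
// Any other recv error is treated as fatal.
inline RpcOutcome DoRpcWithRecvRetry(const std::function<void()>& send,
                                     const std::function<void()>& recv,
                                     int max_recv_retries) {
  try {
    send();
  } catch (const FakeSslException&) {
    return RpcOutcome::kFailed;  // send errors are never retried
  }
  for (int attempt = 0; attempt <= max_recv_retries; ++attempt) {
    try {
      recv();
      return attempt == 0 ? RpcOutcome::kOk : RpcOutcome::kRecvRetried;
    } catch (const FakeSslException& e) {
      if (!IsSslRecvTimeout(e.what())) return RpcOutcome::kFailed;
      // Timeout on recv: fall through and retry the recv part only.
    }
  }
  return RpcOutcome::kFailed;  // retries exhausted
}
```

Matching on the exception string is fragile if the wording ever changes, which is presumably why the message spells out the exact string being matched; how many times the real code retries is not stated in the message, so the retry count here is purely a parameter of the sketch.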
