Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5558/IMPALA-5576: Reopen stale client connection ......................................................................
IMPALA-5558/IMPALA-5576: Reopen stale client connection Previously, the retry logic in DoRpc() only allows retry to happen if send() didn't complete successfully and the exception indicates a closed connection. However, send() returning successfully doesn't guarantee that the bytes have actually reached the remote peer. According to the man page of send(), when the message does not fit into the send buffer of the socket, send() normally blocks. So the payload of RPC may be buffered in the kernel if there is room for it. TCP allows a connection to be half-open. If an Impalad node is restarted, a stale client connection to that node may still allow send() to appear to succeed even though the payload wasn't sent. However, upon calling recv() in the RPC call to fetch the response, the client will get a return value of 0. In which case, thrift will throw an exception as the connection to the remote peer is closed already. Apparently, the existing retry logic doesn't quite handle this case. One can consistently reproduce the problem by warming the client cache followed by restarting one of the Impalad nodes. It will result a series of query failures due to stale connections. This change augments the retry logic to also retry the entire RPC if the exception string contains the messages "No more data to read." or "SSL_read: Connection reset by peer" to capture the case of stale connections. Our usage of thrift doesn't involve half-open TCP connection so having a broken connection in recv() indicates the remote end has closed the socket already. The generated thrift code doesn't knowingly close the socket before an RPC completes unless the process crashes, the connection is stale (e.g. the remote node was restarted) or the remote end fails to read from the client. In either cases, the entire RPC should just be retried with a new connection. This change also fixes QueryState::ReportExecStatusAux() to unconditionally for up to 3 times when reporting exec status of a fragment instance. Previously, it may break out of the loop early if RPC fails with 'retry_is_safe' == true (e.g. due to recv() timeout) or if the connection to coordinator fails (IMPALA-5576). Declaring the RPC to have failed may cause all fragment instances of a query to be cancelled locally, triggering query hang due to IMPALA-2990. Similarly, the cancellation RPC is also idempotent so it should be unconditionally retried up to 3 times with 100ms sleep time in between. The status reporting is idempotent as the handler simply ignores RPC if it determines that all fragment instances on a given backend is done so it should be safe to retry the RPC. This change updates ApplyExecStatusReport() to handle duplicated status reports with done bit set. Previously we would drop some other fragment instances' statuses if we received duplicate 'done' statuses from the same fragment instance(s). Testing done: Warmed up client cache by running stress test followed by restarting some Impalad nodes. Running queries used to fail or hang consistently in the past. It now works with patch. Also ran CSL enduranace tests and exhaustive builds. Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 Reviewed-on: http://gerrit.cloudera.org:8080/7284 Reviewed-by: Michael Ho <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/runtime/backend-client.h M be/src/runtime/client-cache.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/query-state.cc M be/src/testutil/fault-injection-util.cc M be/src/testutil/fault-injection-util.h M tests/custom_cluster/test_rpc_exception.py 10 files changed, 158 insertions(+), 142 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]>
