Henry Robinson has posted comments on this change. Change subject: IMPALA-5388: Only retry RPC on lost connection in send call ......................................................................
Patch Set 2: (4 comments) Were you able to test this at all? The idea behind retrying in DoRpc() was to insulate callers from having to detect and implement re-open / retry loops themselves. Because the client-cache keeps connections around for a long time, they would sometimes get closed by the remote end. That expanded to include the logic around retrying the recv part of an RPC if the socket read timeout had expired, since TransmitData() can block for a long time before responding. Arguably it might be better to set the timeouts for the individual RPC (that's what happens in KRPC). Although I think this is brittle, I feel like it's relatively correct _if_ our assumption that the socket-closed exceptions we detect only get thrown on the write path is accurate. It seems to be the case for non-secure clusters; for Kerberos or TLS it's less clear (but it seems that receiving the response only involves read operations). Can you see a situation with this patch where an RPC will be spuriously retried? http://gerrit.cloudera.org:8080/#/c/7063/2/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: PS2, Line 252: else if nit: avoid 'else' since previous block will return. Here and below. PS2, Line 253: <F, Request, Response> can the compiler not figure out the specialization from the arguments? PS2, Line 262: Status(TErrorCode::RPC_GENERAL_ERROR, e.what()); > Please note that this will start returning "RPC: SSL resource temporarily u I think it's ok - it's actually hard to think of many cases where retrying an RPC was a good idea (cancellation, maybe profile reporting...). I'd rather get error messages than incorrect behaviour. PS2, Line 285: // If it's not timeout exception, then the connection is broken, stop retrying. Remove? -- To view, visit http://gerrit.cloudera.org:8080/7063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I176975f2aa521d5be8a40de51067b1497923d09b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Alan Choi <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
