Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5558/IMPALA-5576: Reopen stale client connection ......................................................................
Patch Set 9: (11 comments) http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: PS9, Line 200: nit: extra space http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: PS9, Line 243: ) { not suggesting we do this now, but why do we not handle recv cxn closed errors in a similar way as we do here, i.e. not resulting in a query failure as RPC_GENERAL_ERROR does. This question comes from looking at the query failure cases in test_rpc_exception.py PS9, Line 301: rpc send $3 done nit the rpc send wasn't done is a bit awkward, can you reword? "... rpc send completed: true/false" http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS9, Line 247: duplicated nit: duplicate http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: PS9, Line 138: duplicated nit: duplicate PS9, Line 138: is done can you make this more specific here? e.g. maybe "true if the final report has been received for the fragment instance." http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS9, Line 227: d nit: duplicate http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/testutil/fault-injection-util.h File be/src/testutil/fault-injection-util.h: PS9, Line 64: /// not clear this is the mod PS9, Line 66: freq maybe this should be every_nth_rpc or such PS9, Line 75: FAULT_INJECTION_RPC_EXCEPTION it'd be nice to have FAULT_INJECTION_SEND_RPC_EX and FAULT_INJECTION_RECV_RPC_EX rather than having to pass a bool. It's fine if you don't wanna do that in this patch, but leave a TODO at least http://gerrit.cloudera.org:8080/#/c/7284/9/tests/custom_cluster/test_rpc_exception.py File tests/custom_cluster/test_rpc_exception.py: PS9, Line 72: execute_test_query("Called read on non-open socket" why does this result in a query failure whereas recv_timed_out does not ? I see in the code where the handling is different, but not sure if there's a reason we should be failing the query here and not in recv_timed_out? I'm not suggesting we should change this now either way. -- To view, visit http://gerrit.cloudera.org:8080/7284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
