Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 )
Change subject: IMPALA-2990: timeout unresponsive queries in coordinator ...................................................................... Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG@34 PS8, Line 34: 'REPORT_EXEC_STATUS_SEND:FAIL@0.1|REPORT_EXEC_STATUS_RECV:FAIL@0.1' Stale ? http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@400 PS8, Line 400: FLAGS_backend_client_rpc_timeout_ms Can you please leave a TODO on rethinking the timeout period for RPCs in general. As we convert more of the RPCs from Thrift To KRPC, this flag's role has changed. Historically, this is used for setting socket timeout with Thrift RPC so that an Impala thread will pop out of the Thrift RPC stack and check for cancellation. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@414 PS8, Line 414: spend spent http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@217 PS8, Line 217: "reporting is disabled."); and only the final report is sent. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@218 PS8, Line 218: 180 To avoid regression, let's be more conservative and set it to be at least 300s, if not larger. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@221 PS8, Line 221: The percent padding that a " : "coordinator should wait to recieve a report after --status_report_max_retry_s " : "before deciding that a backend is unresponsive and the query should be cancelled. May be clearer to include the formula you have in the commit message. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@421 PS8, Line 421: } May wanna check if --status_report_cancellation_padding is set to something sane. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@2248 PS8, Line 2248: * (1 + FLAGS_status_report_cancellation_padding / 100.0); DCHECK_GT(max_lag_ms, 0); -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 23 Apr 2019 22:34:25 +0000 Gerrit-HasComments: Yes