Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 )
Change subject: IMPALA-2990: timeout unresponsive queries in coordinator ...................................................................... Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/12299/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/4//COMMIT_MSG@30 PS4, Line 30: - Ran the stress test on tpch 500 on a 10 node cluster for 1000 > not too concerned about this -- average query concurrency is going to be in Done http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/runtime/coordinator.h@140 PS4, Line 140: if > nit: lower case 'i' Done http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/runtime/coordinator.cc@750 PS4, Line 750: return 0; > this is racy since exec_rpcs_complete_barrier_ is changed from nullptr to a Done http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@215 PS4, Line 215: DEFINE_int32(status_report_interval_ms, 5000, > instead of introducing a new flag, can we just set this based on the config I think you're saying to set this to 1.1 * GetMaxReportRetryMs()? http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398 PS4, Line 398: > unrelated to this patch, but why is this even an option? shouldn't the back Of course, even in this configuration, backends will report their final status. This allows users to significantly reduce the load on the coordinator at the cost of not being able to monitor the progress of in-flight queries. I don't have strong feelings about whether or not this should be deprecated. I'm not aware of any users who rely on this. I spent a little time in 'git blame' and its been an option for years but there doesn't seem to be a documented reason for it. http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2226 PS4, Line 2226: } > Instead of forcing that the coordinator and executors all agree on this con Sure, that's pretty easy. http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2227 PS4, Line 2227: } : > don't we already log all the flags at startup? I think it'd be better to av Done http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2237 PS4, Line 2237: ate-> > I noticed that 'Offer' seems to call to BlockingPut under the covers, which Done http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288 PS4, Line 2288: return Status::OK(); : } : : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const Status& status) { : DCHECK(!status.ok()); : cancellation_thread_pool_->Offer( > are these flags documented and user-settable? Is it too late to change the Yes, all of these flags are documented and settable. The flag 'status_report_max_retries' shipped in 3.2, though it could of course be deprecated if we feel a max retry time is a significant improvement. http://gerrit.cloudera.org:8080/#/c/12299/4/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/12299/4/common/thrift/generate_error_codes.py@403 PS4, Line 403: 3m > ms? Done -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Mon, 25 Mar 2019 23:00:27 +0000 Gerrit-HasComments: Yes