Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )
Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever ...................................................................... Patch Set 20: (13 comments) Address all comments except for the one that runs CREATE TABLE AS SELECT in async. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720 PS15, Line 720: : // Set the results to be reported to the client. : SetResultSet(catalog_op_executor_->ddl_exec_response()); : return Status::OK(); : } : : void ClientRequestState::ExecDdlRequestImplAsync() { : string op_type = catalog_op_type() == TCatalogOpType::DDL ? : > I hacked a bit on this, and the modifications are not that big. This is wha Thanks a lot Joe for this. I'll integrate your change in a separate patch set for this JIRA. http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@643 PS17, Line 643: DdlRequ > Nit: We have an unrelated feature called "sync_ddl". If we can avoid simila Done. Change the naming slightly to make it clearer. Impl for implementation ExecDdlRequestImplAsync() ExecDdlRequestImplSync() ShouldRunExecDdlAsync() http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@688 PS17, Line 688: Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request); : { > Same as other DebugAction location. Same fixes. This debug action was removed since it is in the sync path. http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@728 PS17, Line 728: > For this async code, I think we are better off with a void return value, be Done http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@735 PS17, Line 735: Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request); : { > For here, I think you can use DebugActionNoFail(), since we are using this Done http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@744 PS17, Line 744: exec_request_->query_options.sync_ddl); : { : lock_guard<mutex> l(lock_); > An important thing here is that nothing is actually using the return value Good point. Done http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@778 PS17, Line 778: RETURN_IF_ERROR(Thread::Create("impala-server", "async_exec_thread_", : &ClientRequestState::ExecDdlRequestImplAsync, this, &async_exec_thread > Please go look at the comment that I made on the previous upload about ABOR Done http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@781 PS17, Line 781: return Status::OK(); > Do this before spawning the thread (similar to how ExecAsyncQueryOrDmlReque Okay. Changed the order and set the state to PENDING to account for the possibility of failure to fork the thread. http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@643 PS19, Line 643: Status ClientRequestState::ExecDdlRequestImplSync() { > LOAD DATA might also reset metadata through CatalogOpExecutor::Exec which m Interesting. By looking at the code, it seems the real load data is done in java (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/Frontend.java#L748). After that, the meta-data refresh is done with Catalog Op Exec call (https://github.com/apache/impala/blob/master/be/src/service/client-request-state.cc#L259). Looks like some more research is needed to correctly handle AWS NLB-type timeout for data loading in general. My guess is that frontend_->LoadData() probably should be running in a thread too. For this JIRA, I will document the limitation in the commit message and file a new JIRA. http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@776 PS19, Line 776: { > I think earlier this was not being counted for 'EXEC_TIME_LIMIT_S', but aft Did a test with CREAT TABLE with execution time limit and found the limit is tracked by admission control regardless of how the ddl is executed. See https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L1336, where queries_by_timestamp_ is updated with the DDL query. Thus I think we are OK here. http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@779 PS19, Line 779: &ClientRequestState::ExecDdlRequestImplAsync, this, &async_exec_thread_)); > Thread spawned just above (Line 776) might finish off before the execution Done. See my reply to Joe's comment in this area. http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@913 PS19, Line 913: cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension()) : cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension( : sync_ddl=[0])) > Test dimensions run tests multiple times with different configurations. If Good to know! Done. http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@921 PS19, Line 921: dding/dropping of .jar and .so in the lib cache. > You can omit multiple_impalad. Done -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Comment-Date: Tue, 12 Oct 2021 15:49:48 +0000 Gerrit-HasComments: Yes
