Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )
Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever ...................................................................... Patch Set 15: (7 comments) http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@256 PS15, Line 256: // Specifically run most DDLs asynchronously in async_exec_thread_ except : // for CREATE_TABLE_AS_SELECT which runs the execution step asynchronously. : if (catalog_op_type() == TCatalogOpType::DDL && : ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) { : LOG_AND_RETURN_IF_ERROR(ExecDdlRequest()); : } else { : RunExecDdlRequestAsynchronously(); : } Style point: Elsewhere I'm going to be making the decision to be async more complicated, so I'm thinking we can push the decision to go async into ExecDdlRequest() and the call would be like it was. Inside ExecDdlRequest() we'll decide whether to go async. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@366 PS15, Line 366: case TCatalogOpType::USE: { : lock_guard<mutex> l(session_->lock); : session_->database = exec_request_->catalog_op_request.use_db_params.db; : return Status::OK(); : } Most of the actions in this function could require metadata operations, but this one is purely a session state update. It would be nice not to use the async logic for this. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@655 PS15, Line 655: if (catalog_op_type() != TCatalogOpType::DDL && : catalog_op_type() != TCatalogOpType::RESET_METADATA) { : Status status = ExecLocalCatalogOp(exec_request_->catalog_op_request); : lock_guard<mutex> l(lock_); : return UpdateQueryStatus(status); : } See my comment in ExecLocalCatalogOp() about USE. It would be nice to avoid async for that case. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@662 PS15, Line 662: if (ddl_type() == TDdlType::COMPUTE_STATS) { : TComputeStatsParams& compute_stats_params = : exec_request_->catalog_op_request.ddl_params.compute_stats_params; : RuntimeProfile* child_profile = : RuntimeProfile::Create(&profile_pool_, "Child Queries"); : profile_->AddChild(child_profile); : // Add child queries for computing table and column stats. : vector<ChildQuery> child_queries; : if (compute_stats_params.__isset.tbl_stats_query) { : RuntimeProfile* profile = : RuntimeProfile::Create(&profile_pool_, "Table Stats Query"); : child_profile->AddChild(profile); : child_queries.emplace_back(compute_stats_params.tbl_stats_query, this, : parent_server_, profile, &profile_pool_); : } : if (compute_stats_params.__isset.col_stats_query) { : RuntimeProfile* profile = : RuntimeProfile::Create(&profile_pool_, "Column Stats Query"); : child_profile->AddChild(profile); : child_queries.emplace_back(compute_stats_params.col_stats_query, this, : parent_server_, profile, &profile_pool_); : } : : if (child_queries.size() > 0) { : RETURN_IF_ERROR(child_query_executor_->ExecAsync(move(child_queries))); : } else { : SetResultSet({"No partitions selected for incremental stats update."}); : } : return Status::OK(); : } I think we can skip going async for the compute stats case. It is starting a couple child queries and returning quickly. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720 PS15, Line 720: if (catalog_op_type() == TCatalogOpType::DDL && : ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) { : // At this point, the remainder of the CTAS request executes : // like a normal DML request. As with other DML requests, it will : // wait for another catalog update if any partitions were altered as a result : // of the operation. : DCHECK(exec_request_->__isset.query_exec_request); : RETURN_IF_ERROR(ExecAsyncQueryOrDmlRequest(exec_request_->query_exec_request)); : } To handle CTAS, one option is to have ExecAsyncQueryOrDmlRequest() have a mode that stays executing in the same thread. In other words, ExecDdlRequest() spawns a thread for the part of this function that does the catalog op + everything after, and that thread ends up calling ExecAsyncQueryOrDmlRequest() with a parameter telling it to stay in this thread. (Naming of these functions would start to be an issue.) I think that would be doable but we need to think through the state transitions. ExecAsyncQueryOrDmlRequest() currently starts with ExecState::INITIALIZED and transitions to ExecState::PENDING. We'd have to decide how to handle that. Either ExecDdlRequest() would return with the status still at INITIALIZED and then ExecAsyncQueryOrDmlRequest() would behave like normal or ExecDddlRequest() would return with status PENDING and ExecAsyncQueryOrDmlRequest() would need to be able to handle that case. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@738 PS15, Line 738: ABORT_IF_ERROR( ABORT_IF_ERROR will crash Impala if the Status is not OK. not-OK status for queries should be handled and returned to the client. This should be RETURN_IF_ERROR and the function around it should have a return type of Status. http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@743 PS15, Line 743: UpdateNonErrorExecState(ExecState::PENDING); For DDLs, today we skip the PENDING state, so I think it makes sense to go directly to RUNNING. I don't think clients make a major distinction. We use PENDING in the query case to correspond to waiting for admission control, which doesn't apply here. -- To view, visit http://gerrit.cloudera.org:8080/17872 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e Gerrit-Change-Number: 17872 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Comment-Date: Fri, 08 Oct 2021 19:44:54 +0000 Gerrit-HasComments: Yes
