Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
......................................................................


Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@256
PS15, Line 256:       // Specifically run most DDLs asynchronously in 
async_exec_thread_ except
              :       // for CREATE_TABLE_AS_SELECT which runs the execution 
step asynchronously.
              :       if (catalog_op_type() == TCatalogOpType::DDL &&
              :           ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) {
              :         LOG_AND_RETURN_IF_ERROR(ExecDdlRequest());
              :       } else {
              :         RunExecDdlRequestAsynchronously();
              :       }
Style point:
Elsewhere I'm going to be making the decision to be async more complicated, so 
I'm thinking we can push the decision to go async into ExecDdlRequest() and the 
call would be like it was. Inside ExecDdlRequest() we'll decide whether to go 
async.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@366
PS15, Line 366:     case TCatalogOpType::USE: {
              :       lock_guard<mutex> l(session_->lock);
              :       session_->database = 
exec_request_->catalog_op_request.use_db_params.db;
              :       return Status::OK();
              :     }
Most of the actions in this function could require metadata operations, but 
this one is purely a session state update. It would be nice not to use the 
async logic for this.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@655
PS15, Line 655:   if (catalog_op_type() != TCatalogOpType::DDL &&
              :       catalog_op_type() != TCatalogOpType::RESET_METADATA) {
              :     Status status = 
ExecLocalCatalogOp(exec_request_->catalog_op_request);
              :     lock_guard<mutex> l(lock_);
              :     return UpdateQueryStatus(status);
              :   }
See my comment in ExecLocalCatalogOp() about USE. It would be nice to avoid 
async for that case.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@662
PS15, Line 662:   if (ddl_type() == TDdlType::COMPUTE_STATS) {
              :     TComputeStatsParams& compute_stats_params =
              :         
exec_request_->catalog_op_request.ddl_params.compute_stats_params;
              :     RuntimeProfile* child_profile =
              :         RuntimeProfile::Create(&profile_pool_, "Child Queries");
              :     profile_->AddChild(child_profile);
              :     // Add child queries for computing table and column stats.
              :     vector<ChildQuery> child_queries;
              :     if (compute_stats_params.__isset.tbl_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Table Stats 
Query");
              :       child_profile->AddChild(profile);
              :       
child_queries.emplace_back(compute_stats_params.tbl_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              :     if (compute_stats_params.__isset.col_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Column Stats 
Query");
              :       child_profile->AddChild(profile);
              :       
child_queries.emplace_back(compute_stats_params.col_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              :
              :     if (child_queries.size() > 0) {
              :       
RETURN_IF_ERROR(child_query_executor_->ExecAsync(move(child_queries)));
              :     } else {
              :       SetResultSet({"No partitions selected for incremental 
stats update."});
              :     }
              :     return Status::OK();
              :   }
I think we can skip going async for the compute stats case. It is starting a 
couple child queries and returning quickly.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720:   if (catalog_op_type() == TCatalogOpType::DDL &&
              :       ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) {
              :     // At this point, the remainder of the CTAS request executes
              :     // like a normal DML request. As with other DML requests, 
it will
              :     // wait for another catalog update if any partitions were 
altered as a result
              :     // of the operation.
              :     DCHECK(exec_request_->__isset.query_exec_request);
              :     
RETURN_IF_ERROR(ExecAsyncQueryOrDmlRequest(exec_request_->query_exec_request));
              :   }
To handle CTAS, one option is to have ExecAsyncQueryOrDmlRequest() have a mode 
that stays executing in the same thread. In other words, ExecDdlRequest() 
spawns a thread for the part of this function that does the catalog op + 
everything after, and that thread ends up calling ExecAsyncQueryOrDmlRequest() 
with a parameter telling it to stay in this thread. (Naming of these functions 
would start to be an issue.)

I think that would be doable but we need to think through the state 
transitions. ExecAsyncQueryOrDmlRequest() currently starts with 
ExecState::INITIALIZED and transitions to ExecState::PENDING. We'd have to 
decide how to handle that. Either ExecDdlRequest() would return with the status 
still at INITIALIZED and then ExecAsyncQueryOrDmlRequest() would behave like 
normal or ExecDddlRequest() would return with status PENDING and 
ExecAsyncQueryOrDmlRequest() would need to be able to handle that case.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@738
PS15, Line 738:     ABORT_IF_ERROR(
ABORT_IF_ERROR will crash Impala if the Status is not OK. not-OK status for 
queries should be handled and returned to the client. This should be 
RETURN_IF_ERROR and the function around it should have a return type of Status.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@743
PS15, Line 743:   UpdateNonErrorExecState(ExecState::PENDING);
For DDLs, today we skip the PENDING state, so I think it makes sense to go 
directly to RUNNING. I don't think clients make a major distinction. We use 
PENDING in the query case to correspond to waiting for admission control, which 
doesn't apply here.



--
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Comment-Date: Fri, 08 Oct 2021 19:44:54 +0000
Gerrit-HasComments: Yes

Reply via email to