Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
......................................................................


Patch Set 19:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720:
              :   // Set the results to be reported to the client.
              :   SetResultSet(catalog_op_executor_->ddl_exec_response());
              :   return Status::OK();
              : }
              :
              : Status ClientRequestState::ExecAsyncDdlRequest() {
              :   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
              :
> Decided to keep this particular case (CREATE_TABLE_AS_SELECT) to run (the e
I hacked a bit on this, and the modifications are not that big. This is what 
that would look like:

http://gerrit.cloudera.org:8080/17916


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@643
PS17, Line 643: SyncDdl
Nit: We have an unrelated feature called "sync_ddl". If we can avoid similar 
names, we should.

One naming scheme would be:
ExecDdlRequest() - entrypoint that picks between sync/async
ShouldRunDdlAsync() - decision to go sync/async
ExecDdlRequestInt() - synchronous version
ExecDdlRequestIntAsync() - asynchronous version


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@688
PS17, Line 688:   Status status = 
catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
Same as other DebugAction location. Same fixes.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@728
PS17, Line 728:  
For this async code, I think we are better off with a void return value, 
because nothing is actually consuming the return value. Status would be 
conveyed through UpdateQueryStatus().


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@735
PS17, Line 735:   Status status = 
catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
For here, I think you can use DebugActionNoFail(), since we are using this to 
add sleeps rather than actually failing.

Separately, a small nit: let's make the location strings similar to our other 
location strings in this file. Something like "CRS_BEFORE_CATALOG_OP_EXEC"


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@744
PS17, Line 744:       exec_request_->query_options.sync_ddl));
              :
              :   // Set the results to be reported to the cli
An important thing here is that nothing is actually using the return value of 
this function. So, I think we need to change this to use UpdateQueryStatus()

Status catalog_update_status = parent_server_->ProcessCatalogUpdateResult(...)
{
  lock_guard<mutex> l(lock_);
  RETURN_IF_ERROR(UpdateQueryStatus(catalog_update_status));
}


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@778
PS17, Line 778:     }
              :     UpdateNonErrorExecState(ExecState::RUNNING);
Please go look at the comment that I made on the previous upload about 
ABORT_IF_ERROR.

Separately, we don't need to be holding the lock when executing this.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@781
PS17, Line 781:   }
Do this before spawning the thread (similar to how ExecAsyncQueryOrDmlRequest() 
does).


http://gerrit.cloudera.org:8080/#/c/17872/19/testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
File 
testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test:

http://gerrit.cloudera.org:8080/#/c/17872/19/testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test@7
PS19, Line 7: drop table if exists alltypes_clone;
            : create external table alltypes_clone like 
functional_parquet.alltypes
            : location '$FILESYSTEM_PREFIX/test-warehouse/alltypes_parquet';
            : set debug_action="TIMED_WAIT_BEFORE_CATALOG_OP_EXEC:SLEEP@15000";
            : alter table alltypes_clone recover partitions;
Ideally, a test would fail if you ran it without the corresponding code 
changes. Timing changes are harder to test, because we aren't really changing 
the behavior of a SQL statement.

If you were doing the individual exec / getoperationstatus calls, you could 
verify that the exec is fast even with 
TIMED_WAIT_BEFORE_CATALOG_OP_EXEC:SLEEP@15000.

I think the HS2 tests can test this. This test seems to have useful pieces:
https://github.com/apache/impala/blob/master/tests/hs2/test_hs2.py#L282


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@913
PS19, Line 913:     
cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_constraint(lambda v:
              :         v.get_value('protocol') == 'hs2')
Test dimensions run tests multiple times with different configurations. If you 
remove the add_constraint(), it would run this test with each of hs2, beeswax, 
and hs2-http. That seems reasonable, and then you don't need 
TestAlterTableRecoverWithBeeswax.


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@921
PS19, Line 921:  multiple_impalad=self._use_multiple_impalad(vector)
You can omit multiple_impalad.

When sync_ddl=0, _use_multiple_impalad is false, and that is the default value.



--
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Comment-Date: Tue, 12 Oct 2021 02:23:08 +0000
Gerrit-HasComments: Yes

Reply via email to