[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

Csaba Ringhofer (Code Review) Tue, 06 Feb 2024 09:32:22 -0800

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )


Change subject: IMPALA-12533: Support row materialization outside of fetch lock
......................................................................


Patch Set 13:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21
PS11, Line 21: (default 0.2).
> This is effectively inactive without setting options in the Hive JDBC drive
Ack, I agree that it is better if clients don't need to set this.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
             : the rows fetched for the current query
> The Hive JDBC driver ramps up threads slowly as row batches are fetched. Th
Thanks for the info, I agree that 2 is solved by gradual ramping on client side.

How does the JDBC client control maximum stream count? I am still concerned 
about 1, so eating to many cores with query that has a large materialization 
cost.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24
PS11, Line 24:  It is expected that the default
             : threshold will never be reached with a single fetch stream.
>There are no parallel fetches as they are serialized by the fetch lock

I meant counting this before taking the fetch lock to track the number of times 
fetch is called in parallel, for example at 
https://github.com/apache/impala/blob/0d0a410cf65951d634f81ec14b474d663f9cf587/be/src/service/impala-hs2-server.cc#L987

> Using the fetch lock acquisition latency is superior to counting

I agree that using latency is better to finetune this. My goal with the counter 
is to be 100% sure that Impala will not enter the new code path unless there 
are parallel fetches. So I think that either this counter or the default change 
suggested in line 21
would make this patch safer. Besides safety, accidentally delaying 
materialization for a single threaded client would be also confusing in the 
profile, as non 0 ResultFlushTimer usually indicates multithread client.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@186
PS13, Line 186: Initialize
This is only done in BufferedPlanRootSink, but not in class 
BlockingPlanRootSink. It could be mentioned in the commit message parallel 
fetches are only supported in the result spooling case.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@187
PS13, Line 187: ,
nit: .


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@188
PS13, Line 188:       
RETURN_IF_ERROR(results->InitDelayedMaterialization(*row_desc_,
If I understand correctly this will create a new RowBatchContext on each fetch 
RPC. I am not sure about the cost of this, but as it calls Prepare() for each 
expression with THREAD_LOCAL scope, it may have some nontrivial CPU and memory 
overhead in complex expressions.
I wouldn't worry about this with default fetch size of 8192, but in the long 
term it would probably better to have a pool of RowBatchContext's that can be 
reused.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254
PS11, Line 254:     RETURN_IF_ERROR(result_set->Flush());
              :     result_flush_timer.Stop();
              :     
query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime());
> This is not necessary since we would not want to report flush time in the c
ack


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc@251
PS13, Line 251:   if (result_set->DelayedMaterializationEnabled()) {
If I understand correctly at this point the coordinator may have already hit 
EOS and started releasing its resources. A comment could be added about this + 
a debug action to be able to force doing the Flush() with released coordinator 
resources.

I am worried about RowBatchContext's validity at this point (in case there was 
an EOS) - e.g. what guarantees that its RuntimeState* state_ is not already 
deleted? Or some mechanism guarantees that the coordinator's fragment resources 
are kept alive until the fetch completes?


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc@477
PS13, Line 477:
nit: extra space



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Tue, 06 Feb 2024 17:31:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

Reply via email to