Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21
PS11, Line 21: (default 0.2).
> As there is only minimal testing in the patch, it may better to disable thi
This is effectively inactive without setting options in the Hive JDBC driver or 
setting the threshold to 0, and that potential performance regressions are 
minimal (see comments below). I would rather leave this with the threshold set 
as-is so that it is not required to set additional options from JDBC to benefit 
from this enhancement.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
             : the rows fetched for the current query
> Just brain dump, I don't think that the patch needs to be changed:
The Hive JDBC driver ramps up threads slowly as row batches are fetched. That 
should avoid most of these cases.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24
PS11, Line 24:  It is expected that the default
             : threshold will never be reached with a single fetch stream.
> It seems safer to guarantee this by detecting parallel fetches and only ena
There are no parallel fetches as they are serialized by the fetch lock. Using 
the fetch lock acquisition latency is superior to counting how many streams are 
referencing the statement since we can avoid delayed materialization when the 
benefits would be negligible. The current threshold (20%) will typically only 
delay when there are more than 2 streams.
I also measured performance with narrow and wide table and the performance 
difference was very negligible, <5% for degenerate wide table cases and <1% for 
narrow tables.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@31
PS11, Line 31: -Manual testing with Multi-stream Hive JDBC driver. This showed 
a 20%
> A follow up Jira would be nice about creating some simple multi thread clie
Possibly once the Hive driver changes are available in a release. I did test 
manually with asan and tsan and the multi-stream driver.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h@329
PS11, Line 329: ror
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254
PS11, Line 254:     RETURN_IF_ERROR(result_set->Flush());
              :     result_flush_timer.Stop();
              :     
query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime());
> the timer is not stopped in case line 254 returns an error
This is not necessary since we would not want to report flush time in the case 
of an error.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc@56
PS11, Line 56: class RowBatchContext {
> Can you add some comment about the role of this class?
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Mon, 05 Feb 2024 19:23:25 +0000
Gerrit-HasComments: Yes

Reply via email to