Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21
PS11, Line 21: (default 0.2).
As there is only minimal testing in the patch, it may better to disable this by 
default. If someone can test this properly than they could also enable it in 
their environment.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
             : the rows fetched for the current query
Just brain dump, I don't think that the patch needs to be changed:
It would be nice to have a more sophisticated heuristic in the future. I can 
imagine 2 scenarios where the current one leads to regression:
1. the coordinator already employs all cores in the CPU
2. while the client starts with parallel fetches, later it becomes practically 
single threaded

In both cases the overhead of the extra copy is added without real gains.

2. seems typical with a client that does something expensive with the fetched 
data, like writing to file. At the beginning it starts with many threads, but 
it won't be able to process the results returned by all these threads


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24
PS11, Line 24:  It is expected that the default
             : threshold will never be reached with a single fetch stream.
It seems safer to guarantee this by detecting parallel fetches and only enable 
delayed materialization for a query if there were parallel fetches at least 
once.

Theoretically unlucky GC or CPU scheduling could lead to hitting the threshold, 
especially if row materialization time is very small.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@31
PS11, Line 31: -Manual testing with Multi-stream Hive JDBC driver. This showed 
a 20%
A follow up Jira would be nice about creating some simple multi thread client 
that can be executed in the tests. Automatic tests would be great as they would 
be also exercised with ASAN and TSAN builds.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h@329
PS11, Line 329: ror
typo


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254
PS11, Line 254:     RETURN_IF_ERROR(result_set->Flush());
              :     result_flush_timer.Stop();
              :     
query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime());
the timer is not stopped in case line 254 returns an error


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc@56
PS11, Line 56: class RowBatchContext {
Can you add some comment about the role of this class?



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Wed, 31 Jan 2024 17:51:14 +0000
Gerrit-HasComments: Yes

Reply via email to