Quanlong Huang has posted comments on this change. (http://gerrit.cloudera.org:8080/18327)

Change subject: IMPALA-11123: Optimize count(star) for ORC scans
......................................................................


Patch Set 4:

(6 comments)

The patch looks pretty good now!

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@776
PS4, Line 776:   int64_t num_rows = static_cast<int64_t>(reader_->getNumberOfRows());
Could you add a comment saying that only the scanner of the footer split will run in this case? Also mention that we have special logic for this in HdfsScanner::IssueFooterRanges().


http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@806
PS4, Line 806: This is an unoptimized count(*) case.
I think count(*) won't reach this path now. If the count(*) has any conjuncts, we need to materialize some slots, so it's not a zero-slot table scan. I think we should move the comment at line 796 here and change the example to "select 1" over the table.


http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@807
PS4, Line 807:       // Insert 'num_to_commit' template tuples into 'row_batch'.
Could you add a comment saying that only the scanner of the footer split will run in this case? Also mention that we have special logic for this in HdfsScanner::IssueFooterRanges().


http://gerrit.cloudera.org:8080/#/c/18327/2/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/2/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test@4
PS2, Line 4: functional_orc_def.uncomp_src_alltypes
> This table follow schema from functional.alltypes, but without "transaction
I see. I thought managed tables could only be transactional, but that's wrong. I double-checked that the file schema is non-transactional. Thanks for the explanation!


http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test@5
PS4, Line 5: from functional_orc_def.uncomp_src_alltypes
Could you add a test to cover the old optimization (i.e. a zero-slot table scan)? E.g.

  select 1 from functional_orc_def.alltypestiny


http://gerrit.cloudera.org:8080/#/c/18327/2/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/18327/2/tests/query_test/test_aggregation.py@279
PS2, Line 279:     if (vector.get_value('table_format').file_format != 'text' or
             :           vector.get_value('table_format').compression_codec != 'none'):
> Looking again, the core exploration of this test only have single 'text/non
Thanks for looking into this! I think it's worth adding a comment here to save other developers time.



--
To view, visit http://gerrit.cloudera.org:8080/18327

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fafa1182f97323aeb9ee39dd4e8ecd418fa6091
Gerrit-Change-Number: 18327
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Sat, 26 Mar 2022 09:52:35 +0000
Gerrit-HasComments: Yes
