Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18327 )

Change subject: IMPALA-11123: Optimize count(star) for ORC scans
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-orc-scanner.h@435
PS3, Line 435:
> nit. Handle count(*) queries by reading the row count
Done


http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-orc-scanner.cc@780
PS3, Line 780:   *dst_slot = num_rows;
> Can you unify this with the Parquet implementation? If I didn't miss someth
Done


http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@776
PS4, Line 776:   DCHECK_LE(stripe_rows_read_, num_rows);
> Could you comment that only the scanner of the footer split will run in thi
Added the comment in GetNextInternal. Reading it again later and realized it is 
still a little bit confusing.
Might need to reorder the branches.


http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@806
PS4, Line 806: can_node_->IsZeroSlotTableScan()) {
> I think count(*) won't go here now. If there are any conjuncts for the coun
We can still get here for count(*) against full acid table. Added example in 
the comment.


http://gerrit.cloudera.org:8080/#/c/18327/4/be/src/exec/hdfs-orc-scanner.cc@807
PS4, Line 807:       // This is an unoptimized count(*) case.
> Could you comment that only the scanner of the footer split will run in thi
Added the comment in GetNextInternal.


http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-scanner.cc@846
PS3, Line 846: !scan_node->ReadsFileMetadataOnly() || footer_split == split) {
> Having a function like "readsOnlyMetadata()" could be more descriptive.
Done


http://gerrit.cloudera.org:8080/#/c/18327/3/be/src/exec/hdfs-scanner.cc@847
PS3, Line 847:         ScanRangeMetadata* split_metadata =
> Shouldn't we also check row_batches_need_validation_ like in HdfsOrcScanner
We don't need to. The reason is this logic is a more like a prediction rather 
than a guarantee.
row_batches_need_validation_ can not be concluded until we read the file. 
Therefore, it is possible that footer scanner fallback to full scan behavior 
even when we thought we'll read file metadata only here.
Added related logging in hdfs-orc-scanner.cc.


http://gerrit.cloudera.org:8080/#/c/18327/3/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/3/testdata/workloads/functional-planner/queries/PlannerTest/orc-stats-agg.test@82
PS3, Line 82: cardinality=24
> We may need to compute the cardinality differently for count star optimizat
Modified the cardinality count in HdfsScanNode.java.
However, the PlannerTest does not seem to validate the cardinality part of the 
planner. Changed them anyway in patch set 5.


http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test:

http://gerrit.cloudera.org:8080/#/c/18327/4/testdata/workloads/functional-query/queries/QueryTest/orc-stats-agg.test@5
PS4, Line 5: from functional_orc_def.uncomp_src_alltypes
> Could you add a test to cover the old optimization (ie. zero slot table sca
Done



--
To view, visit http://gerrit.cloudera.org:8080/18327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fafa1182f97323aeb9ee39dd4e8ecd418fa6091
Gerrit-Change-Number: 18327
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Sat, 26 Mar 2022 18:43:13 +0000
Gerrit-HasComments: Yes

Reply via email to