Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 )
Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................

Patch Set 25:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15370/25//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15370/25//COMMIT_MSG@24
PS25, Line 24: relies on the backend to divide them as
             : needed.
> I think we can do this in the orc-scanner as well. There are some APIs like
Filed IMPALA-11099 for this.

http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2199
PS25, Line 2199: the current ORC scanner does not have the select
               : // count(*) optimization yet like in Parquet.
> Isn't this the optimization for count(*)? https://github.com/apache/impala/
A select count(*) over a nested column still needs to read an ORC column, for example:

  select count(*) from complextypes_partitioned.int_array

For this kind of query, that code region is not evaluated because IsZeroSlotTableScan() == false (materialized_slots is empty, but tuple_desc()->tuple_path() is not empty). Therefore, we still need to allocate memory to read column int_array in this example.

I checked Parquet, and its select count optimization is not turned on for this example either. The returned columnByteSizes is also empty. I suppose it still reads the column, but not asynchronously. We can do the same if we want: skip async IO if materialized_slots is empty.
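To make the three cases concrete, here is a minimal sketch of the decision described above. It is a toy model, not Impala code: ScanTuple, is_zero_slot_table_scan, needs_column_read and use_async_io are hypothetical names, and the zero-slot condition is assumed from the description in this comment (no materialized slots and an empty tuple path).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanTuple:
    """Hypothetical stand-in for the scan's tuple descriptor."""
    materialized_slots: List[str] = field(default_factory=list)
    tuple_path: List[int] = field(default_factory=list)


def is_zero_slot_table_scan(t: ScanTuple) -> bool:
    # Assumed condition: nothing is materialized and the tuple is top-level.
    return not t.materialized_slots and not t.tuple_path


def needs_column_read(t: ScanTuple) -> bool:
    # Anything that is not a zero-slot table scan still reads a column,
    # e.g. count(*) over a nested collection.
    return not is_zero_slot_table_scan(t)


def use_async_io(t: ScanTuple) -> bool:
    # Rule suggested in this comment: skip async IO when no slot is
    # materialized, even if a column must still be read.
    return bool(t.materialized_slots)


# count(*) over a flat table, count(*) over a nested array, regular select.
flat_count = ScanTuple()
nested_count = ScanTuple(tuple_path=[0])
regular_select = ScanTuple(materialized_slots=["id"])
```

Under these assumptions, nested_count still needs a column read but would not use async IO, while regular_select would.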
-- 
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 25
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 01 Feb 2022 04:40:36 +0000
Gerrit-HasComments: Yes
