Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered 
out rows in Parquet table.
......................................................................


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG@17
PS4, Line 17: Upto 2.5x improvement for non-page indexed and
            : upto 4x improvement in page index seen
Great results! Are these whole query speedups or only scan-time speedups?


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h@928
PS4, Line 928: (vector<ScalarExpr*>
chould be const&


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h@942
PS4, Line 942: 1-20
nit: maybe you could say a few words about how rows 9-10 are filtered out in 
this case.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@252
PS4, Line 252: conjuncts
We could reserve() capacity for this vector.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@258
PS4, Line 258:   unordered_set<SlotId> slot_ids;
             :   slot_ids.insert(std::begin(conjunct_slot_ids), 
std::end(conjunct_slot_ids));
nit: unordered_set has a constructor that takes a two iterators.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@271
PS4, Line 271: vector<SlotId> slot_ids;
We should reserve() capacity for the vector.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2131
PS4, Line 2131: AssembleRowsInternal
Maybe AssembleRowsWithoutLateMaterialization()?


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2193
PS4, Line 2193: /// High-level steps of this function:
This comments needs to be extended with the explanation of late materialization.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2234
PS4, Line 2234: 1
nit: magic number


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2240
PS4, Line 2240: row_end
nit: row_group_end?


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2255
PS4, Line 2255:       Status fill_status;
We already have a 'fill_status' at L2233. Maybe we can just reuse that one.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2329
PS4, Line 2329: ConvertToRange
Could be static, then maybe we could add backend tests for this.



--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 06 Oct 2021 14:14:10 +0000
Gerrit-HasComments: Yes

Reply via email to