Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. ...................................................................... Patch Set 4: (12 comments) http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG@17 PS4, Line 17: Upto 2.5x improvement for non-page indexed and : upto 4x improvement in page index seen Great results! Are these whole query speedups or only scan-time speedups? http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h File be/src/exec/parquet/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h@928 PS4, Line 928: (vector<ScalarExpr*> chould be const& http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.h@942 PS4, Line 942: 1-20 nit: maybe you could say a few words about how rows 9-10 are filtered out in this case. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@252 PS4, Line 252: conjuncts We could reserve() capacity for this vector. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@258 PS4, Line 258: unordered_set<SlotId> slot_ids; : slot_ids.insert(std::begin(conjunct_slot_ids), std::end(conjunct_slot_ids)); nit: unordered_set has a constructor that takes a two iterators. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@271 PS4, Line 271: vector<SlotId> slot_ids; We should reserve() capacity for the vector. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2131 PS4, Line 2131: AssembleRowsInternal Maybe AssembleRowsWithoutLateMaterialization()? http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2193 PS4, Line 2193: /// High-level steps of this function: This comments needs to be extended with the explanation of late materialization. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2234 PS4, Line 2234: 1 nit: magic number http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2240 PS4, Line 2240: row_end nit: row_group_end? http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2255 PS4, Line 2255: Status fill_status; We already have a 'fill_status' at L2233. Maybe we can just reuse that one. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2329 PS4, Line 2329: ConvertToRange Could be static, then maybe we could add backend tests for this. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 06 Oct 2021 14:14:10 +0000 Gerrit-HasComments: Yes
