Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. ...................................................................... Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG@17 PS4, Line 17: on 'parquet_materialization_threshold' is introduced, : which is minimum number of consecutive > Great results! Are these whole query speedups or only scan-time speedups? For high selectivity queries, I measured complete query time speedup. For Low selectivity queries I measured just the scan time as queries for them were not simple select on filter queries. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner.h File be/src/exec/hdfs-columnar-scanner.h: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner.h@60 PS3, Line 60: > I am thinking to push it to ScratchTupleBatch. If not will document it in n Pushed it to ScratchTupleBatch. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2377 PS3, Line 2377: col_reader->Rea > right, missed this one. will fix it in next revision. will keep this open f Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@252 PS4, Line 252: conjuncts > We could reserve() capacity for this vector. Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@258 PS4, Line 258: std::end(scan_node_->filter_exprs())); : auto conjunct_slot_ids = GetSlotIdsForConjuncts(conjuncts); > nit: unordered_set has a constructor that takes a two iterators. not using it anymore. removed it. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@271 PS4, Line 271: ctor<SlotId> HdfsParquet > We should reserve() capacity for the vector. Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2131 PS4, Line 2131: er::Read*ValueBatch( > Maybe AssembleRowsWithoutLateMaterialization()? Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2193 PS4, Line 2193: row_group_rows_read_ += num_rows_rea > This comments needs to be extended with the explanation of late materializa Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2234 PS4, Line 2234: > nit: magic number commented it. http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2240 PS4, Line 2240: !column > nit: row_group_end? Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2255 PS4, Line 2255: num_rows_read += scratch_batch_->num_tuples; > We already have a 'fill_status' at L2233. Maybe we can just reuse that one. Done http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2329 PS4, Line 2329: 0) { > Could be static, then maybe we could add backend tests for this. Have moved it to ScratchTupleBatch as 'GetMicroBatches' and added tests for it. http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538 PS3, Line 1538: } > Hmm. It seems a false returning status means likely the data in the page is ok. bailing out if it returns false. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 08 Oct 2021 16:26:03 +0000 Gerrit-HasComments: Yes
