Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table. ...................................................................... Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/parquet/hdfs-parquet-scanner.cc@253 PS6, Line 253: conjuncts.reserve(scan_node_->conjuncts().size() + > Pass reserve to ctor instead. Avoiding it for the reason mentioned in another similar comment. http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/parquet/hdfs-parquet-scanner.cc@274 PS6, Line 274: const vector<ScalarExpr*>& conjuncts) const { > Pass reserve to ctor instead. Passing size to ctor would not only reserve the memory it would also initialise it with some values (0 for int, default constructor for user defined class etc ). So using push_back and insert after that would give a different result: https://onlinegdb.com/uyvYdBNZ1 http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/parquet/parquet-column-chunk-reader.h File be/src/exec/parquet/parquet-column-chunk-reader.h: http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/parquet/parquet-column-chunk-reader.h@124 PS6, Line 124: Status ReadNextDataPage( > Probably not necessary/good to templatize page-level functions like this un I agree. Since it is at the page level we can avoid it. Removed it now. http://gerrit.cloudera.org:8080/#/c/17860/5/be/src/exec/parquet/parquet-column-chunk-reader.h File be/src/exec/parquet/parquet-column-chunk-reader.h: http://gerrit.cloudera.org:8080/#/c/17860/5/be/src/exec/parquet/parquet-column-chunk-reader.h@125 PS5, Line 125: bool* eos, uint8_t** data, int* data_size, bool read_data = true); > What was the motivation for inlining this (page-level) function? inlining was side effect of using template; since definition for templated function has to be visible everywhere it is used, it was pulled into header. But since I removed template I have removed inlining too. http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/scratch-tuple-batch-test.cc File be/src/exec/scratch-tuple-batch-test.cc: http://gerrit.cloudera.org:8080/#/c/17860/6/be/src/exec/scratch-tuple-batch-test.cc@64 PS6, Line 64: scoped_ptr<ScratchTupleBatch> scratch_batch( > line too long (100 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 12 Oct 2021 13:24:29 +0000 Gerrit-HasComments: Yes
