Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materilization of columns for filtered out 
rows in Parquet table.
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17860/4//COMMIT_MSG@17
PS4, Line 17: on 'parquet_materialization_threshold' is introduced,
            : which is minimum number of consecutive
> Great results! Are these whole query speedups or only scan-time speedups?
For high selectivity queries, I measured complete query time speedup. For Low 
selectivity queries I measured just the scan time as queries for them were not 
simple select on filter queries.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner.h
File be/src/exec/hdfs-columnar-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/hdfs-columnar-scanner.h@60
PS3, Line 60:
> I am thinking to push it to ScratchTupleBatch. If not will document it in n
Pushed it to ScratchTupleBatch.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@2377
PS3, Line 2377:    col_reader->Rea
> right, missed this one. will fix it in next revision. will keep this open f
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@252
PS4, Line 252: conjuncts
> We could reserve() capacity for this vector.
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@258
PS4, Line 258:       std::end(scan_node_->filter_exprs()));
             :   auto conjunct_slot_ids = GetSlotIdsForConjuncts(conjuncts);
> nit: unordered_set has a constructor that takes a two iterators.
not using it anymore. removed it.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@271
PS4, Line 271: ctor<SlotId> HdfsParquet
> We should reserve() capacity for the vector.
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2131
PS4, Line 2131: er::Read*ValueBatch(
> Maybe AssembleRowsWithoutLateMaterialization()?
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2193
PS4, Line 2193:   row_group_rows_read_ += num_rows_rea
> This comments needs to be extended with the explanation of late materializa
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2234
PS4, Line 2234:
> nit: magic number
commented it.


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2240
PS4, Line 2240: !column
> nit: row_group_end?
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2255
PS4, Line 2255:     num_rows_read += scratch_batch_->num_tuples;
> We already have a 'fill_status' at L2233. Maybe we can just reuse that one.
Done


http://gerrit.cloudera.org:8080/#/c/17860/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@2329
PS4, Line 2329:  0) {
> Could be static, then maybe we could add backend tests for this.
Have moved it to ScratchTupleBatch as 'GetMicroBatches' and added tests for it.


http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17860/3/be/src/exec/parquet/parquet-column-readers.cc@1538
PS3, Line 1538: }
> Hmm. It seems a false returning status means likely the data in the page is
ok. bailing out if it returns false.



--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 08 Oct 2021 16:26:03 +0000
Gerrit-HasComments: Yes

Reply via email to