Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )
Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters ...................................................................... Patch Set 19: (8 comments) Did a quick walkthrough, will look into it in detail next week. http://gerrit.cloudera.org:8080/#/c/17075/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17075/19//COMMIT_MSG@19 PS19, Line 19: show_column_minmax_stats Do we need this query option? I mean if we have min/max stats then we'd probably want to show them. http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@447 PS19, Line 447: ( nit: parentheses are not needed as the '.' member access takes precedence over the '&' adress-of operator, http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@492 PS19, Line 492: ( nit: parentheses not needed http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/hdfs-scanner.h@348 PS19, Line 348: uint8_t enabled_for_rowgroup; Why do we need this flag? If enabled_for_rowgroup is false, then the min/max filter is completely turned off, right? In that case we shouldn't even send them to the scanner, or am I missing something? http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/parquet/hdfs-parquet-scanner.cc@662 PS19, Line 662: } nit: EvaluateOverlapForRowGroup() is already quite long, maybe this code could go into a separate function. http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc@950 PS19, Line 950: nit: indentation http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java File fe/src/main/java/org/apache/impala/catalog/ColumnStats.java: http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java@50 PS19, Line 50: LOG seems unused http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/17075/19/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@278 PS19, Line 278: */ It would be good to handle Iceberg tables that use Parquet data files. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 12 Mar 2021 17:02:09 +0000 Gerrit-HasComments: Yes