Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate ...................................................................... Patch Set 45: (2 comments) Couple of comments on the disabling logic that I think are bugs. Haven't done a review pass but thought I should mention while I noticed. http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@652 PS45, Line 652: minmax_filter->DecideAlwaysTrueForOverlap(col_type, min_slot, max_slot, threshold); I think this would disable it for all subsequent row groups across all threads (since the filter object is shared), probably need to store this in the HdfsParquetScanner object and overwrite it for each row group. The HdfsParquetScanner object is local to the thread. http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h@76 PS45, Line 76: always_true_ = !(ComputeOverlapRatio(type, data_min, data_max) < threshold); Filters can be read/evaluated from multiple threads, so this will be flagged as a data race by TSAN afaict. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 45 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 08 Jan 2021 21:12:42 +0000 Gerrit-HasComments: Yes
