Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
......................................................................

Patch Set 45:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@652
PS45, Line 652: minmax_filter->DecideAlwaysTrueForOverlap(col_type, min_slot, max_slot, threshold);
> I think this would disable it for all subsequent row groups across all thre
Done

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@657
PS45, Line 657: << ", columnType=" << col_type.DebugString()
> line has trailing whitespace
Done

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/exec/parquet/hdfs-parquet-scanner.cc@659
PS45, Line 659: << ", data max=" << GetIntTypeValue(col_type, max_slot)
> line has trailing whitespace
Done

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16720/45/be/src/util/min-max-filter.h@76
PS45, Line 76: always_true_ = !(ComputeOverlapRatio(type, data_min, data_max) < threshold);
> Filters can be read/evaluated from multiple threads, so this will be flagge
Good point! Created a local copy in HdfsParquetScanner as suggested.

I plan to keep the modified logic for always_true_ in the (base class) min-max filter so that always_true_ can still be set. Possible use cases in the hash join builder:

1. Too many data values have been inserted (say, over a threshold of 1000);
2. The sub-ranges are not selective enough.
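
The overlap check quoted at min-max-filter.h line 76, combined with the "local copy" approach, could look roughly like the sketch below. This is an illustration only, not the code in the patch: `MinMaxRange`, the integer-only `ComputeOverlapRatio`, and `ShouldTreatAsAlwaysTrue` are hypothetical names, and the exact ratio definition is an assumption. The point is that the decision is returned to the caller, who keeps it in scanner-local state, rather than being written into a filter object shared across threads.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, read-only view of a min/max runtime filter's range.
struct MinMaxRange {
  int64_t min;
  int64_t max;
};

// Fraction of the inclusive data range [data_min, data_max] that is also
// covered by the filter range. Returns 0.0 when the ranges do not intersect.
// (Assumed definition; the real ComputeOverlapRatio may differ.)
double ComputeOverlapRatio(const MinMaxRange& filter, int64_t data_min,
                           int64_t data_max) {
  if (data_max < data_min) return 0.0;
  const int64_t lo = std::max(filter.min, data_min);
  const int64_t hi = std::min(filter.max, data_max);
  if (hi < lo) return 0.0;
  // Convert before subtracting to avoid signed overflow on wide ranges.
  const double data_width =
      static_cast<double>(data_max) - static_cast<double>(data_min) + 1.0;
  const double overlap_width =
      static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
  return overlap_width / data_width;
}

// Same condition as the quoted line, but the result is returned instead of
// being stored in the shared filter. The filter is only read, never mutated.
bool ShouldTreatAsAlwaysTrue(const MinMaxRange& filter, int64_t data_min,
                             int64_t data_max, double threshold) {
  return !(ComputeOverlapRatio(filter, data_min, data_max) < threshold);
}
```

A scanner would call `ShouldTreatAsAlwaysTrue` once per column range and store the result in its own member or local variable, so a decision made for one row group does not affect other threads or later row groups.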

http://gerrit.cloudera.org:8080/#/c/16720/45/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/16720/45/tests/query_test/test_runtime_filters.py@267
PS45, Line 267: @SkipIfLocal.multiple_impalad
> flake8: E302 expected 2 blank lines, found 1
Done

--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 11 Jan 2021 17:41:57 +0000
Gerrit-HasComments: Yes
