Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21377 )
Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
......................................................................

IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column

Impala frontend cannot evaluate a BETWEEN/NOT BETWEEN predicate directly.
It needs to transform a BetweenPredicate into a CompoundPredicate
consisting of upper bound and lower bound BinaryPredicates through
BetweenToCompoundRule.java. The BinaryPredicates can then be pushed down
or rewritten into another form by another expression rewrite rule.
However, the selectivity of the BetweenPredicate or its derivatives remains
unassigned and often collapses with other unknown-selectivity predicates,
so that their collective selectivity equals Expr.DEFAULT_SELECTIVITY (0.1).

This patch adds a narrow optimization of BetweenPredicate selectivity
when the following criteria are met:

1. The BetweenPredicate is bound to a slot reference of a single column
   of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value
   <= upper bound value).
6. The final calculated selectivity is less than or equal to
   Expr.DEFAULT_SELECTIVITY.

If any of these criteria is unmet, the Planner reverts to the old behavior,
which is leaving the selectivity unassigned.

Since this patch only targets a BetweenPredicate over a unique column, the
following query will still have the default scan selectivity (0.1):

  select count(*) from tpch.customer c
  where c.c_custkey >= 1234 and c.c_custkey <= 2345;

While this equivalent query written with a BETWEEN predicate will have a
lower scan selectivity:

  select count(*) from tpch.customer c
  where c.c_custkey between 1234 and 2345;

This patch calculates the BetweenPredicate selectivity during
transformation in BetweenToCompoundRule.java.
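The six criteria above can be illustrated with a small standalone sketch. The commit message does not state the formula the patch uses, so the arithmetic here is an assumption: for a discrete, unique column, the fraction of rows matched by a closed range is taken as (upper - lower + 1) divided by the number of distinct values. The class name, method name, and parameters are hypothetical and are not Impala's API; the figure of 150,000 distinct c_custkey values is likewise assumed for illustration.

```java
import java.util.OptionalDouble;

public class BetweenSelectivitySketch {
  // Mirrors the value of Expr.DEFAULT_SELECTIVITY quoted in the commit message.
  static final double DEFAULT_SELECTIVITY = 0.1;

  /**
   * Hypothetical estimate for "col BETWEEN lower AND upper" on a discrete,
   * unique column. An empty result stands for "selectivity left unassigned".
   * The formula is an assumption, not taken from the patch.
   */
  static OptionalDouble estimate(long lower, long upper, long distinctValues,
      boolean isNotBetween) {
    // Criterion 3: without usable column stats, nothing can be estimated.
    if (distinctValues <= 0) return OptionalDouble.empty();
    // Criterion 5: the bounds must be in good form (lower <= upper).
    if (lower > upper) return OptionalDouble.empty();
    // Number of discrete values in the closed range [lower, upper];
    // computed in double to avoid long overflow on extreme bounds.
    double covered = (double) upper - (double) lower + 1.0;
    double sel = Math.min(1.0, covered / distinctValues);
    if (isNotBetween) sel = 1.0 - sel;
    // Criterion 6: only keep estimates at or below the default selectivity.
    if (sel > DEFAULT_SELECTIVITY) return OptionalDouble.empty();
    return OptionalDouble.of(sel);
  }

  public static void main(String[] args) {
    // Bounds from the example query; 150,000 distinct keys is an assumption.
    System.out.println(estimate(1234, 2345, 150_000, false));
    System.out.println(estimate(1234, 2345, 150_000, true));
  }
}
```

Under these assumptions the BETWEEN form of the example query gets a selectivity of 1112/150000, well below 0.1, while the NOT BETWEEN form exceeds 0.1 and is left unassigned. Criteria 1, 2, and 4 (single-column binding, discrete type, sufficient uniqueness) are taken as already checked by the caller in this sketch.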
The selectivity is piggy-backed into the resulting CompoundPredicate and
BinaryPredicate as betweenSelectivity_ field, separate from the
selectivity_ field. Analyzer.getBoundPredicates() is modified to
prioritize the derived BinaryPredicate over ordinary BinaryPredicate in
its return value to prevent the derived BinaryPredicate from being
eliminated by a matching ordinary BinaryPredicate.

Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity,
  ExprCardinalityTest#testNotBetweenSelectivity, and
  PlannerTest#testScanCardinality.
- Pass core tests.

Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Reviewed-on: http://gerrit.cloudera.org:8080/21377
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
27 files changed, 3,908 insertions(+), 3,502 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Gerrit-Change-Number: 21377
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
