wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
......................................................................

IMPALA-7942: Add query hints for cardinalities and selectivities

Currently, Impala uses only simple estimation to compute selectivity
for some predicates and cannot compute it at all for others, which can
lead the CBO to choose a worse query plan. This patch therefore adds
hints that set these stats manually in the query to help the CBO.
In the future, histograms may allow more precise query plans.

This patch adds two query hints: 'HDFS_NUM_ROWS' and 'SELECTIVITY'.

'HDFS_NUM_ROWS' can be added after an HDFS table in a query like this:
  select col from t /* +HDFS_NUM_ROWS(1000) */;
If set, Impala uses this value as the number of rows scanned from the
table. The hint value only takes effect when the table has no stats or
its stats are corrupt. Otherwise, Impala uses the table's original
stats.

The 'SELECTIVITY' hint can be used on these predicates:
* BinaryPredicate
* InPredicate
* IsNullPredicate
* LikePredicate, including the 'not like' syntax
* BetweenPredicate, including the 'not between and' syntax

The format is:
  select col from t where a=1 /* +SELECTIVITY(0.5) */;
The hint value replaces the originally computed selectivity.

These forms are not allowed:
  select col from t where (a=1) /* +SELECTIVITY(0.5) */;
  select col from t where (a=1 and b<2) /* +SELECTIVITY(0.5) */;
  select col from t1 where exists (...) /* +SELECTIVITY(0.5) */;

Note that if the selectivity hint is set like this:
  select col from t where (a=1 /* +SELECTIVITY(0.5) */ and b>2);
Impala sets 0.5 for the first binary predicate, but the selectivity of
the second is -1 because Impala cannot compute it. The selectivity of
the whole compound predicate is therefore still unavailable. Hence,
for a compound predicate, each child's selectivity must either be set
by a hint or be computable. Otherwise, the hint may not take effect as
expected.

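The compound-predicate caveat above can be illustrated with a small sketch. It is not the code in this patch: the function names are hypothetical, and combining the children of an AND by multiplying their selectivities (an independence assumption) is an assumption made for illustration. The only behaviour taken from the description is that -1 marks an unavailable selectivity, that a hint replaces the computed value, and that one unavailable child makes the whole compound predicate unavailable.

```python
from typing import Optional, Sequence

UNKNOWN = -1.0  # marker for "selectivity not available", as in the text above


def effective_selectivity(hint: Optional[float], estimate: float) -> float:
    """Hypothetical helper: a SELECTIVITY hint, if present, replaces the estimate."""
    return estimate if hint is None else hint


def and_selectivity(children: Sequence[float]) -> float:
    """Hypothetical helper: selectivity of an AND of child predicates.

    Assumes independent children (product rule). If any child is
    UNKNOWN, the whole conjunction is UNKNOWN.
    """
    result = 1.0
    for s in children:
        if s < 0:
            return UNKNOWN
        result *= s
    return result


# where (a=1 /* +SELECTIVITY(0.5) */ and b>2)
# b>2 has neither a hint nor a computable estimate in this scenario.
a_eq_1 = effective_selectivity(0.5, UNKNOWN)
b_gt_2 = effective_selectivity(None, UNKNOWN)
partially_hinted = and_selectivity([a_eq_1, b_gt_2])

# Hinting both children (0.4 is an arbitrary example value) makes the
# conjunction computable.
fully_hinted = and_selectivity([0.5, effective_selectivity(0.4, UNKNOWN)])
```

In this sketch the partially hinted predicate stays at -1, while hinting both children yields a usable value, which matches the advice to hint every child whose selectivity is not computable.
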
In addition, Impala transforms a 'BetweenPredicate' into a
'CompoundPredicate' with two 'BinaryPredicate' children. If a hint is
set on a 'BetweenPredicate' in the query, the hint value is split
between the two 'BinaryPredicate' children.

Testing:
- Added new fe tests in 'PlannerTest'
- Added new fe tests in 'AnalyzeStmtsTest' for negative cases

Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test
A testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test
13 files changed, 1,445 insertions(+), 18 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/18023/1
--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng <[email protected]>
