wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18023 )


Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
......................................................................

IMPALA-7942: Add query hints for cardinalities and selectivities

Currently, Impala uses only simple estimation to compute selectivity
for some predicates and cannot estimate others at all, which may lead
the CBO to choose a worse query plan. Hence, we add new hints that set
these stats manually in the query to help the CBO. In the future, we
may be able to use histograms to get more precise query plans.

This patch adds two query hints: 'HDFS_NUM_ROWS' and 'SELECTIVITY'.
'HDFS_NUM_ROWS' can be added after an HDFS table in a query like this:

  select col from t /* +HDFS_NUM_ROWS(1000) */;

If set, Impala uses this value as the number of rows scanned from the
table. The hint value is only honored when the table has no stats or
its stats are corrupt. Otherwise, Impala uses the table's original
stats.
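
The fallback rule described above can be sketched as follows. This is
only an illustration, not the code in HdfsScanNode.java: the class and
method names are hypothetical, the use of -1 to mean "no stats" or "no
hint" is an assumption, and corrupt stats are modeled as a plain
boolean flag.

```java
// Hypothetical sketch of the HDFS_NUM_ROWS fallback rule; names are illustrative only.
final class NumRowsHintSketch {
    // Assumption: -1 marks a missing row count or an absent hint.
    static final long UNKNOWN = -1L;

    /**
     * Returns the row count to plan with under the rule in the text:
     * usable table stats win; the hint is consulted only when stats
     * are missing or corrupt.
     */
    static long effectiveNumRows(long statsNumRows, boolean statsCorrupt, long hintNumRows) {
        boolean statsUsable = statsNumRows != UNKNOWN && !statsCorrupt;
        if (statsUsable) {
            return statsNumRows;
        }
        // Stats are missing or corrupt: fall back to the hint.
        // The result is still UNKNOWN when no hint was given.
        return hintNumRows;
    }
}
```

With valid stats of 5000 rows, a hint of 1000 is ignored; with missing
or corrupt stats, the same hint yields 1000.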

The 'SELECTIVITY' hint can be used with these predicates:
* BinaryPredicate
* InPredicate
* IsNullPredicate
* LikePredicate, including 'not like' syntax
* BetweenPredicate, including 'not between and' syntax
The format looks like this:

  select col from t where a=1 /* +SELECTIVITY(0.5) */;

This value replaces the originally computed selectivity. These
formats are not allowed:

  select col from t where (a=1) /* +SELECTIVITY(0.5) */;
  select col from t where (a=1 and b<2) /* +SELECTIVITY(0.5) */;
  select col from t1 where exists (...) /* +SELECTIVITY(0.5) */;

Note that if you set the selectivity hint like this:

  select col from t where (a=1 /* +SELECTIVITY(0.5) */ and b>2);

Impala sets 0.5 for the first binary predicate, but the second one
stays at -1, meaning Impala cannot compute its selectivity. As a
result, the selectivity of the whole compound predicate is still
unavailable. Hence, for a compound predicate, we need to ensure that
each child's selectivity is either set by a hint or computable.
Otherwise, the hint may not take effect as expected.
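
The propagation of an unavailable child selectivity can be sketched as
below. This is a simplified illustration, not the actual Predicate or
CompoundPredicate classes: the class and method names are
hypothetical, and multiplying the children of an AND assumes they are
independent.

```java
// Hypothetical sketch; not the actual Impala Predicate/CompoundPredicate classes.
final class CompoundSelectivitySketch {
    // -1 marks a selectivity that is neither hinted nor computable (as in the text).
    static final double UNAVAILABLE = -1.0;

    /** A hint, when present, replaces the computed estimate of a leaf predicate. */
    static double leafSelectivity(double computed, double hint) {
        return hint != UNAVAILABLE ? hint : computed;
    }

    /** AND of two children: unavailable as soon as either child is unavailable. */
    static double andSelectivity(double left, double right) {
        if (left == UNAVAILABLE || right == UNAVAILABLE) {
            return UNAVAILABLE;
        }
        // Assumption: the children are independent, so their selectivities multiply.
        return left * right;
    }
}
```

For the query above, the first child resolves to 0.5 through its hint
while the second stays at -1, so andSelectivity returns -1 for the
whole predicate. Hinting both children gives a usable value.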
Also, Impala transforms a 'BetweenPredicate' into a
'CompoundPredicate' with two 'BinaryPredicate' children. If a hint is
set on a 'BetweenPredicate' in the query, the hint value is split
between the two 'BinaryPredicate' children.
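
The message does not say how the hint value is divided between the two
children. One plausible rule, assumed here purely for illustration, is
to give each child the square root of the hint, so that an AND of two
independent children multiplies back to the hinted value. The class
and method names and the [0, 1] range check are hypothetical and are
not taken from BetweenToCompoundRule.java.

```java
// Hypothetical sketch of one way to split a BETWEEN hint; the real rule may differ.
final class BetweenHintSplitSketch {
    /**
     * Splits a hinted selectivity across the two children of the AND
     * produced from "a BETWEEN lo AND hi" (a >= lo AND a <= hi).
     * Assumption: each child gets sqrt(hint), so child * child == hint.
     */
    static double[] splitForAnd(double hint) {
        // Assumption: a selectivity hint is a fraction in [0, 1].
        if (hint < 0.0 || hint > 1.0) {
            throw new IllegalArgumentException("selectivity must be in [0, 1]: " + hint);
        }
        double child = Math.sqrt(hint);
        return new double[] {child, child};
    }
}
```

Under this assumed rule, a hint of 0.25 on the BETWEEN gives each
binary predicate 0.5. A 'not between' rewrite would produce a
different compound shape and would need its own rule, which this
sketch does not cover.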

Testing:
- Added new fe tests in 'PlannerTest'
- Added new fe tests in 'AnalyzeStmtsTest' for negative cases

Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test
A testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test
13 files changed, 1,445 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/18023/1
--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng <[email protected]>
