weixiuli opened a new pull request #35605:
URL: https://github.com/apache/spark/pull/35605
### What changes were proposed in this pull request?
A query with a rank window function fails when no ordering is specified in the window. With this change, we add the partition expressions as order expressions in that case.
Hive already does this:
https://github.com/apache/hive/blob/f15de94c617c4566c87293479463cd90437beed5/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java#L494-L503
```java
/*
 * When there is no Order specified, we add the Partition expressions as
 * Order expressions. This is an implementation artifact. For UDAFS that
 * imply order (like rank, dense_rank) depend on the Order Expressions to
 * work. Internally we pass the Order Expressions as Args to these functions.
 * We could change the translation so that the Functions are setup with
 * Partition expressions when the OrderSpec is null; but for now we are setting up
 * an OrderSpec that copies the Partition expressions.
 */
protected void ensureOrderSpec(WindowFunctionSpec wFn) throws SemanticException {
```
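The idea can be sketched outside of Spark's analyzer. The snippet below is a minimal, self-contained model — `WindowSpec`, `EnsureOrderSpec`, and the set of order-sensitive functions are illustrative stand-ins, not Spark's actual Catalyst types: when an order-sensitive function such as `rank` has an empty order spec, the partition expressions are copied into it.

```scala
// Minimal sketch of the proposed rewrite. These are simplified
// stand-in types for illustration, not Spark's Catalyst classes.
object EnsureOrderSpec {
  case class WindowSpec(partitionBy: Seq[String], orderBy: Seq[String])

  // Functions whose semantics depend on an ordering (illustrative set).
  private val orderSensitive = Set("rank", "dense_rank", "percent_rank")

  def apply(function: String, spec: WindowSpec): WindowSpec =
    if (orderSensitive.contains(function) && spec.orderBy.isEmpty) {
      // No ORDER BY given: reuse the partition expressions as the ordering,
      // mirroring Hive's ensureOrderSpec.
      spec.copy(orderBy = spec.partitionBy)
    } else {
      spec
    }
}
```

Under this model, `EnsureOrderSpec("rank", WindowSpec(Seq("a"), Nil))` yields a spec ordered by `a`, which is what lets the query below resolve.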
### Why are the changes needed?
A query example:
```scala
sql("SELECT a, b, Rank(b) OVER (PARTITION BY a) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) as tbl(a, b)").show
```
Before:
```log
Window function rank(b#12) requires window to be ordered, please add ORDER BY clause. For example SELECT rank(b#12)(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
org.apache.spark.sql.AnalysisException: Window function rank(b#12) requires window to be ordered, please add ORDER BY clause. For example SELECT rank(b#12)(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
	at org.apache.spark.sql.errors.QueryCompilationErrors$.windowFunctionWithWindowFrameNotOrderedError(QueryCompilationErrors.scala:380)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder$$anonfun$apply$49.applyOrElse(Analyzer.scala:3263)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder$$anonfun$apply$49.applyOrElse(Analyzer.scala:3260)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1125)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1124)
	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:503)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:159)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
```
After:
```log
+---+---+-----------------------------------------------------------------------------+
|  a|  b|RANK() OVER (PARTITION BY a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---+---+-----------------------------------------------------------------------------+
| A1|  2|                                                                            1|
| A1|  1|                                                                            1|
| A1|  1|                                                                            1|
| A2|  3|                                                                            1|
+---+---+-----------------------------------------------------------------------------+
```
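Note that every rank in the result is 1. That is expected: ordering by the partition expressions makes all rows within a partition compare equal, so they all tie at rank 1. A tiny model of rank (1 plus the number of rows with a strictly smaller order key) shows this; `RankDemo` and `ranks` are illustrative names, not Spark APIs.

```scala
object RankDemo {
  // rank of each row = 1 + number of rows with a strictly smaller order key
  def ranks(orderKeys: Seq[String]): Seq[Int] =
    orderKeys.map(k => 1 + orderKeys.count(_ < k))
}
```

Within partition `A1` the order keys are all `"A1"`, so `RankDemo.ranks(Seq("A1", "A1", "A1"))` returns `Seq(1, 1, 1)`, matching the output above.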
### Does this PR introduce _any_ user-facing change?
Yes. Users can now use rank window functions without an ORDER BY clause; the partition expressions are used as the ordering.
### How was this patch tested?
Added unit tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]