weixiuli opened a new pull request #35605:
URL: https://github.com/apache/spark/pull/35605
### What changes were proposed in this pull request?
A query with a rank window function fails when no ordering is specified in the window. With this change, we add the partition expressions as order expressions in that case.
Hive already does this:
https://github.com/apache/hive/blob/f15de94c617c4566c87293479463cd90437beed5/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java#L494-L503
```java
/*
 * When there is no Order specified, we add the Partition expressions as
 * Order expressions. This is an implementation artifact. For UDAFS that
 * imply order (like rank, dense_rank) depend on the Order Expressions to
 * work. Internally we pass the Order Expressions as Args to these functions.
 * We could change the translation so that the Functions are setup with
 * Partition expressions when the OrderSpec is null; but for now we are setting up
 * an OrderSpec that copies the Partition expressions.
 */
protected void ensureOrderSpec(WindowFunctionSpec wFn) throws SemanticException {
```
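The idea can be sketched outside of Spark's analyzer. The snippet below is a minimal, self-contained model — `WindowSpec`, `EnsureOrderSpec`, and the set of order-sensitive functions are illustrative stand-ins, not Spark's actual Catalyst types: when an order-sensitive function such as `rank` has an empty order spec, the partition expressions are copied into it.

```scala
// Minimal sketch of the proposed rewrite. These are simplified
// stand-in types for illustration, not Spark's Catalyst classes.
object EnsureOrderSpec {
  case class WindowSpec(partitionBy: Seq[String], orderBy: Seq[String])

  // Functions whose semantics depend on an ordering (illustrative set).
  private val orderSensitive = Set("rank", "dense_rank", "percent_rank")

  def apply(function: String, spec: WindowSpec): WindowSpec =
    if (orderSensitive.contains(function) && spec.orderBy.isEmpty) {
      // No ORDER BY given: reuse the partition expressions as the ordering,
      // mirroring Hive's ensureOrderSpec.
      spec.copy(orderBy = spec.partitionBy)
    } else {
      spec
    }
}
```

Under this model, `EnsureOrderSpec("rank", WindowSpec(Seq("a"), Nil))` yields a spec ordered by `a`, which is what lets the query below resolve.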
### Why are the changes needed?
A query example:
```scala
sql("SELECT a, b, Rank(b) OVER (PARTITION BY a) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) as tbl(a, b)").show
```
Before:
```log
Window function rank(b#12) requires window to be ordered, please add ORDER BY clause. For example SELECT rank(b#12)(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
org.apache.spark.sql.AnalysisException: Window function rank(b#12) requires window to be ordered, please add ORDER BY clause. For example SELECT rank(b#12)(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
	at org.apache.spark.sql.errors.QueryCompilationErrors$.windowFunctionWithWindowFrameNotOrderedError(QueryCompilationErrors.scala:380)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder$$anonfun$apply$49.applyOrElse(Analyzer.scala:3263)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder$$anonfun$apply$49.applyOrElse(Analyzer.scala:3260)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1125)
	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1124)
	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:503)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:159)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
```
After:
```log
+---+---+-----------------------------------------------------------------------------+
|  a|  b|RANK() OVER (PARTITION BY a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---+---+-----------------------------------------------------------------------------+
| A1|  2|                                                                            1|
| A1|  1|                                                                            1|
| A1|  1|                                                                            1|
| A2|  3|                                                                            1|
+---+---+-----------------------------------------------------------------------------+
```
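Note that every rank in the result is 1. That is expected: ordering by the partition expressions makes all rows within a partition compare equal, so they all tie at rank 1. A tiny model of rank (1 plus the number of rows with a strictly smaller order key) shows this; `RankDemo` and `ranks` are illustrative names, not Spark APIs.

```scala
object RankDemo {
  // rank of each row = 1 + number of rows with a strictly smaller order key
  def ranks(orderKeys: Seq[String]): Seq[Int] =
    orderKeys.map(k => 1 + orderKeys.count(_ < k))
}
```

Within partition `A1` the order keys are all `"A1"`, so `RankDemo.ranks(Seq("A1", "A1", "A1"))` returns `Seq(1, 1, 1)`, matching the output above.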
### Does this PR introduce _any_ user-facing change?
Yes. Users can now use rank window functions without an ORDER BY clause; the partition expressions are used as the ordering.
### How was this patch tested?
Added unit tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]