AngersZhuuuu commented on a change in pull request #27861:
URL: https://github.com/apache/spark/pull/27861#discussion_r413465498
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+      ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
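To make the three branches of the proposed change easier to follow, here is a dependency-free Scala sketch of the same decision. It is not Spark code: `FrameType`, `RowsFrame`, `RangeFrame`, `SortSpec` and `DefaultWindowOrder.defaultOrderSpec` are hypothetical stand-ins for the parser contexts and catalyst `SortOrder`, and null ordering is left out for brevity.

```scala
// Hypothetical, simplified stand-ins for the parser contexts in the diff;
// these are NOT Spark's catalyst or parser classes.
sealed trait FrameType
case object RowsFrame extends FrameType
case object RangeFrame extends FrameType

// Simplified sort spec: expression name plus direction (null ordering omitted).
final case class SortSpec(expr: String, ascending: Boolean = true)

object DefaultWindowOrder {
  // Mirrors the three branches of the proposed visitWindowDef change.
  def defaultOrderSpec(
      partition: Seq[String],
      sortItems: Seq[SortSpec],
      frameType: Option[FrameType]): Seq[SortSpec] =
    if (sortItems.nonEmpty) {
      // An explicit ORDER BY is always used as-is.
      sortItems
    } else if (frameType.contains(RangeFrame)) {
      // RANGE frame without ORDER BY: no default order spec is added.
      Seq.empty
    } else {
      // Otherwise fall back to the partition expressions, ascending.
      partition.map(p => SortSpec(p))
    }

  def main(args: Array[String]): Unit = {
    println(defaultOrderSpec(Seq("num"), Seq.empty, None))
    println(defaultOrderSpec(Seq("num"), Seq(SortSpec("id")), None))
    println(defaultOrderSpec(Seq("num"), Seq.empty, Some(RangeFrame)))
  }
}
```

Only the RANGE case keeps an empty order spec; every other window without ORDER BY ends up ordered by its partition expressions.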
Review comment:
> But the results will be useless. When can it be useful if the order is indeterministic for the functions dependent on the order .. ?
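In PostgreSQL, if we don't specify an ORDER BY column, the rows are sorted by the partition column's default sort order only: the first plan below has `Sort Key: s4.num`, while adding `order by id` in the second plan gives `Sort Key: s4.num, s4.id`.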
```
angerszhu=# explain analyze verbose select id, num, lead(id) over (partition by num) from s4;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 WindowAgg (cost=158.51..198.06 rows=2260 width=12) (actual time=0.107..0.122 rows=6 loops=1)
   Output: id, num, lead(id) OVER (?)
   -> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.079..0.081 rows=6 loops=1)
         Output: num, id
         Sort Key: s4.num
         Sort Method: quicksort Memory: 25kB
         -> Seq Scan on public.s4 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.057..0.061 rows=6 loops=1)
               Output: num, id
 Planning Time: 0.114 ms
 Execution Time: 0.214 ms
angerszhu=# explain analyze verbose select id, num, lead(id) over (partition by num order by id) from s4;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 WindowAgg (cost=158.51..203.71 rows=2260 width=12) (actual time=0.976..1.017 rows=6 loops=1)
   Output: id, num, lead(id) OVER (?)
   -> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.067..0.070 rows=6 loops=1)
         Output: id, num
         Sort Key: s4.num, s4.id
         Sort Method: quicksort Memory: 25kB
         -> Seq Scan on public.s4 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.042..0.045 rows=6 loops=1)
               Output: id, num
 Planning Time: 0.155 ms
 Execution Time: 1.208 ms
(10 rows)
```
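The first plan boils down to "sort on the partition key, then walk each partition once". A minimal in-memory Scala sketch of that pipeline for `lead(id) over (partition by num)` might look like this; `LeadOverPartition`, `Row`, `leadByPartition` and the sample rows are hypothetical, since the actual contents of `s4` are not shown.

```scala
object LeadOverPartition {
  // Hypothetical row shape for illustration; s4's real contents are not shown.
  final case class Row(id: Int, num: Int)

  // Emulates the first plan: Sort on the partition key only, then a single
  // pass computing lead(id) within each partition (None stands for SQL NULL).
  def leadByPartition(rows: Seq[Row]): Seq[(Row, Option[Int])] = {
    // sortBy is stable: rows with equal num keep their input order.
    val sorted = rows.sortBy(_.num).toVector
    sorted.zipWithIndex.map { case (row, i) =>
      // The next row counts only if it belongs to the same partition.
      val next = sorted.lift(i + 1).filter(_.num == row.num).map(_.id)
      (row, next)
    }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(Row(1, 10), Row(2, 20), Row(3, 10), Row(4, 20))
    leadByPartition(sample).foreach { case (r, lead) =>
      println(s"id=${r.id} num=${r.num} lead=${lead.map(_.toString).getOrElse("NULL")}")
    }
  }
}
```

Because `sortBy` is stable, rows that share a `num` keep their input order, which is the only tie-breaker when no ORDER BY column is given.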
----------------------------------------------------------------
This is an automated message from the Apache Git Service.