GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21330#discussion_r212800158
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1883,7 +1883,19 @@ class Analyzer(
// Second, we group extractedWindowExprBuffer based on their Partition and Order Specs.
val groupedWindowExpressions = extractedWindowExprBuffer.groupBy { expr =>
val distinctWindowSpec = expr.collect {
- case window: WindowExpression => window.windowSpec
+ case window: WindowExpression =>
+ val winExpr = window.windowFunction
+ val distinctOpt = winExpr.find (expr => expr.isInstanceOf[AggregateExpression]
+ && expr.asInstanceOf[AggregateExpression].isDistinct)
+ if (distinctOpt.nonEmpty && window.windowSpec.orderSpec.nonEmpty) {
+ failAnalysis(s"ORDER BY cannot be used with DISTINCT: $window")
--- End diff --
Just out of curiosity, does Hive have the same limitation? If so, the
current approach (roughly ordering the rows and checking the previous row
for distinct windows) makes sense to me.
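
For readers following the thread without a Spark checkout, the check added in the diff can be sketched in isolation. This is only a toy model: `Expr`, `Column`, `Aggregate`, `Window`, `WindowSpec` and `DistinctWindowCheck.validate` are hypothetical stand-ins, not Catalyst's real classes, and the sketch returns an `Either` where the PR calls `failAnalysis`. It also uses a pattern match where the diff uses `isInstanceOf`/`asInstanceOf`; that is a stylistic substitution, not what the PR does.

```scala
// Toy stand-ins for Catalyst's expression tree (hypothetical, not Spark's API).
sealed trait Expr {
  def children: Seq[Expr]

  // Depth-first search for the first node that satisfies the predicate.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}

case class Column(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}

case class Aggregate(fn: String, child: Expr, isDistinct: Boolean) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

case class WindowSpec(partitionBy: Seq[String], orderBy: Seq[String])

case class Window(function: Expr, spec: WindowSpec) extends Expr {
  val children: Seq[Expr] = Seq(function)
}

object DistinctWindowCheck {
  // Left(message) when a DISTINCT aggregate is combined with ORDER BY,
  // Right(spec) otherwise. The Left branch mirrors the failAnalysis call.
  def validate(w: Window): Either[String, WindowSpec] = {
    val hasDistinct = w.function.find {
      case a: Aggregate => a.isDistinct
      case _            => false
    }.nonEmpty

    if (hasDistinct && w.spec.orderBy.nonEmpty)
      Left(s"ORDER BY cannot be used with DISTINCT: $w")
    else
      Right(w.spec)
  }
}
```

Under these assumptions, a window such as `Window(Aggregate("count", Column("a"), isDistinct = true), WindowSpec(Seq("k"), Seq("ts")))` is rejected, while the same window with an empty `orderBy` passes through and yields its spec for grouping.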
---