Angryrou commented on code in PR #48649:
URL: https://github.com/apache/spark/pull/48649#discussion_r1836865790
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -1260,8 +1264,32 @@ class AstBuilder extends DataTypeAstBuilder
def createProject() = if (namedExpressions.nonEmpty) {
val newProjectList: Seq[NamedExpression] = if (isPipeOperatorSelect) {
- // If this is a pipe operator |> SELECT clause, add a [[PipeSelect]] expression wrapping
- // each alias in the project list, so the analyzer can check invariants later.
+ // If this is a pipe operator |> SELECT clause,
+ // (1) validate all the window references after OVER are valid, and
+ val windowDefs = Option(windowClause)
+ .map(_.namedWindow.asScala.map(_.name.getText).toSet)
+ .getOrElse(collection.immutable.Set.empty[String])
+ // Collect all window names from UnresolvedWindowExpressions
+ val unresolvedWindowNames = namedExpressions.collect {
+ case Alias(wExpr: UnresolvedWindowExpression, _) => wExpr.windowSpec.name
+ case UnresolvedAlias(wExpr: UnresolvedWindowExpression, _) => wExpr.windowSpec.name
+ }
+ if (unresolvedWindowNames.nonEmpty) {
+ if (windowDefs.isEmpty) {
+ // No window definitions provided, throw error for the first unresolved window
+ throw QueryParsingErrors.cannotFindWindowReferenceError(unresolvedWindowNames.head, ctx)
+ } else {
+ // Find any unresolved window names not defined in windowDefs
+ unresolvedWindowNames.find(!windowDefs.contains(_)) match {
Review Comment:
@cloud-fan The parsed plans for the two syntaxes differ in two ways:
1. With the pipe syntax, the `unresolvedwindowexpression` is wrapped in an additional `pipeselect` expression.
2. With the classic syntax, the subquery gets a `SubqueryAlias` with an auto-generated name.

In the pipe syntax, the WINDOW clause after the 2nd `|> SELECT` leads to a `WithWindowDefinition` for `w` at the top of the plan tree. The 1st `|> SELECT` sits below that node, so it seems to be able to see the definition as well.

In the classic syntax, the outer query defines `w` and also gets a `WithWindowDefinition` at the top of the tree, so the inner subquery is able to see the window definition too.

**SQLs**
```sql
-- pipe syntax
table windowTestData
|> select cate, val, first_value(cate) over w as first_val
|> select cate, val, sum(val) over w as sum_val
window w as (order by val);
-- classic syntax
select cate, val, sum(val) over w as sum_val
from (
select cate, val, first_value(cate) over w as first_val
from windowTestData
)
window w as (order by val);
```
**Parsed Logical Plans** (`parser.parsePlan(...)`)
```sql
-- from pipe syntax
'WithWindowDefinition [w=windowspecdefinition('val ASC NULLS FIRST, unspecifiedframe$())]
+- 'Project ['cate, 'val, pipeselect(unresolvedwindowexpression('sum('val), WindowSpecReference(w))) AS sum_val#3]
   +- 'Project ['cate, 'val, pipeselect(unresolvedwindowexpression('first_value('cate), WindowSpecReference(w))) AS first_val#2]
      +- 'UnresolvedRelation [windowTestData], [], false

-- from classic syntax
'WithWindowDefinition [w=windowspecdefinition('val ASC NULLS FIRST, unspecifiedframe$())]
+- 'Project ['cate, 'val, unresolvedwindowexpression('sum('val), WindowSpecReference(w)) AS sum_val#1]
   +- 'SubqueryAlias __auto_generated_subquery_name
      +- 'Project ['cate, 'val, unresolvedwindowexpression('first_value('cate), WindowSpecReference(w)) AS first_val#0]
         +- 'UnresolvedRelation [windowTestData], [], false
```
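
To make the scoping argument concrete, here is a minimal self-contained sketch of the two tree shapes above. It is a toy model, not Spark code: the case classes and `unresolvedRefs` are hypothetical stand-ins for the Catalyst nodes, and the sketch assumes that a `WithWindowDefinition` makes its names visible to every node beneath it, which is what the parsed plans suggest.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
object WindowScopeSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  // windowRefs: the window names referenced after OVER in this projection.
  case class Project(windowRefs: Seq[String], child: Plan) extends Plan
  case class SubqueryAlias(alias: String, child: Plan) extends Plan
  case class WithWindowDefinition(defs: Map[String, String], child: Plan) extends Plan

  // Walk the tree top-down, carrying the window names currently in scope,
  // and return every referenced name that has no definition above it.
  // Assumption: a definition is visible to the whole subtree below it.
  def unresolvedRefs(plan: Plan, inScope: Set[String] = Set.empty): Seq[String] =
    plan match {
      case WithWindowDefinition(defs, child) =>
        unresolvedRefs(child, inScope ++ defs.keySet)
      case Project(refs, child) =>
        refs.filterNot(inScope.contains) ++ unresolvedRefs(child, inScope)
      case SubqueryAlias(_, child) =>
        unresolvedRefs(child, inScope)
      case Relation(_) =>
        Seq.empty
    }

  private val windowDefs = Map("w" -> "order by val")

  // Shape of the parsed plan from the pipe syntax: two stacked projections.
  val pipePlan: Plan = WithWindowDefinition(windowDefs,
    Project(Seq("w"), Project(Seq("w"), Relation("windowTestData"))))

  // Shape of the parsed plan from the classic syntax: same, plus a SubqueryAlias.
  val classicPlan: Plan = WithWindowDefinition(windowDefs,
    Project(Seq("w"),
      SubqueryAlias("__auto_generated_subquery_name",
        Project(Seq("w"), Relation("windowTestData")))))

  // A projection that references `w` with no WINDOW clause anywhere above it.
  val missingDefPlan: Plan = Project(Seq("w"), Relation("windowTestData"))

  def main(args: Array[String]): Unit = {
    println(unresolvedRefs(pipePlan))
    println(unresolvedRefs(classicPlan))
    println(unresolvedRefs(missingDefPlan))
  }
}
```

Under that assumption both the pipe and the classic shapes leave no dangling reference to `w`, and only the plan without any WINDOW clause reports one. The extra `pipeselect` wrapper and the `SubqueryAlias` do not change the outcome in this model.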
**Analyzed Logical Plans**
https://docs.google.com/document/d/1qa1jUAoWa0aS5037oazydJRYQSAFFklUlPmnwLxOZDM/edit?usp=sharing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]