dtenedor commented on code in PR #48529:
URL: https://github.com/apache/spark/pull/48529#discussion_r1813268872
##########
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##########
@@ -765,7 +765,7 @@ temporalClause
aggregationClause
: GROUP BY groupingExpressionsWithGroupingAnalytics+=groupByClause
(COMMA groupingExpressionsWithGroupingAnalytics+=groupByClause)*
- | GROUP BY groupingExpressions+=expression (COMMA
groupingExpressions+=expression)* (
+ | GROUP BY groupingExpressions+=namedExpression (COMMA
groupingExpressions+=namedExpression)* (
Review Comment:
Sure, good question. So the answer is that with SQL pipe semantics, the
AGGREGATE operator's output attribute comprise all the grouping expressions
first (in order), then all the aggregate functions next (in order). This is the
specification from the original research paper [1].
The idea is that the list of expressions after the AGGREGATE keyword do not
need to include any of the grouping keys; they may only comprise aggregate
function calls. Then following operators can refer to the grouping keys
(possibly using aliases) and/or aggregate functions as needed, including just
`|> SELECT`ing a subset.
So this example:
```
| AGGREGATE max(i) GROUP BY j + 1 as i
```
would be equivalent to this using classic SQL:
```
SELECT j + 1 as i, max(i) FROM t
GROUP BY j + 1
```
<img width="905" alt="image"
src="https://github.com/user-attachments/assets/de8c0bf2-7c33-4ac7-a946-d4f93162fc71">
[1]
https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]