peter-toth opened a new pull request, #56570:
URL: https://github.com/apache/spark/pull/56570

   ### What changes were proposed in this pull request?
   
   `ColumnNodeToExpressionConverter` (sql/core 
.../classic/columnNodeSupport.scala) converts an expression-backed `Column` (an 
`ExpressionColumnNode`) into a Catalyst `Expression` and, if the wrapped 
expression is an `AggregateFunction`, wraps it in an `AggregateExpression` via 
`toAggregateExpression()`. Because `AggregateWindowFunction extends 
DeclarativeAggregate (which extends AggregateFunction) with WindowFunction`, a 
window function also matched this branch and got wrapped, producing 
`WindowExpression(AggregateExpression(windowFunc), spec)` instead of 
`WindowExpression(windowFunc, spec)`. This PR adds a guard so that an 
`AggregateFunction` that is also a `WindowFunction` is left untouched and used 
directly as the child of the `WindowExpression`. Regular aggregate functions 
used as window functions (e.g. `sum`) are unaffected because wrapping them in 
an `AggregateExpression` is correct.
   
   ### Why are the changes needed?
   
   Wrapping a window function in an `AggregateExpression` makes analysis fail: 
`CheckAnalysis` sees an `AggregateExpression` whose child is a `WindowFunction` 
but which is not itself a `WindowExpression`, and throws 
`WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`. As a result, a user-defined 
`AggregateWindowFunction` (or any built-in window function expression) wrapped 
into a `Column` through the `ClassicConversions` / `ColumnConversions` 
DeveloperApi and used with `over(...)` could not be analyzed. This blocks 
extension developers from plugging custom window functions in via the Column 
API, a pattern that worked before the Column API was decoupled from Catalyst.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Before this change, wrapping a window function expression into a 
`Column` and calling `.over(window)` failed analysis with 
`WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`; now it analyzes and executes correctly, 
matching the behavior of the equivalent by-name window function. This is a fix 
within the unreleased master branch relative to the existing Column/Catalyst 
conversion behavior.
   
   ### How was this patch tested?
   
   Added two regression tests to `DataFrameWindowFunctionsSuite`: one wraps the 
built-in `RowNumber()` expression into a `Column` and checks it matches by-name 
`row_number()`, and one defines a minimal custom `AggregateWindowFunction` 
(`NonNullRunningCount`) and checks it produces correct results when used with 
`over(...)`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.8
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to