peter-toth opened a new pull request, #56570: URL: https://github.com/apache/spark/pull/56570
### What changes were proposed in this pull request? `ColumnNodeToExpressionConverter` (sql/core .../classic/columnNodeSupport.scala) converts an expression-backed `Column` (an `ExpressionColumnNode`) into a Catalyst `Expression` and, if the wrapped expression is an `AggregateFunction`, wraps it in an `AggregateExpression` via `toAggregateExpression()`. Because `AggregateWindowFunction extends DeclarativeAggregate (which extends AggregateFunction) with WindowFunction`, a window function also matched this branch and got wrapped, producing `WindowExpression(AggregateExpression(windowFunc), spec)` instead of `WindowExpression(windowFunc, spec)`. This PR adds a guard so that an `AggregateFunction` that is also a `WindowFunction` is left untouched and used directly as the child of the `WindowExpression`. Regular aggregate functions used as window functions (e.g. `sum`) are unaffected because wrapping them in an `AggregateExpression` is correct. ### Why are the changes needed? Wrapping a window function in an `AggregateExpression` makes analysis fail: `CheckAnalysis` sees an `AggregateExpression` whose child is a `WindowFunction` but which is not itself a `WindowExpression`, and throws `WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`. As a result, a user-defined `AggregateWindowFunction` (or any built-in window function expression) wrapped into a `Column` through the `ClassicConversions` / `ColumnConversions` DeveloperApi and used with `over(...)` could not be analyzed. This blocks extension developers from plugging custom window functions in via the Column API, a pattern that worked before the Column API was decoupled from Catalyst. ### Does this PR introduce _any_ user-facing change? Yes. Before this change, wrapping a window function expression into a `Column` and calling `.over(window)` failed analysis with `WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`; now it analyzes and executes correctly, matching the behavior of the equivalent by-name window function. This is a fix within the unreleased master branch relative to the existing Column/Catalyst conversion behavior. ### How was this patch tested? Added two regression tests to `DataFrameWindowFunctionsSuite`: one wraps the built-in `RowNumber()` expression into a `Column` and checks it matches by-name `row_number()`, and one defines a minimal custom `AggregateWindowFunction` (`NonNullRunningCount`) and checks it produces correct results when used with `over(...)`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
