vranes opened a new pull request, #56545:
URL: https://github.com/apache/spark/pull/56545

   ### What changes were proposed in this pull request?
   
   This PR adds single-pass analyzer (resolver) support for the `BIN BY` 
relation operator. Today `UnresolvedBinBy` is not in the `ResolverGuard` 
allowlist, so a `BIN BY` query falls back to the legacy fixed-point analyzer. 
This PR wires the operator into the single-pass resolver so that the query 
resolves to a `BinBy` node identical to the one the fixed-point `ResolveBinBy` 
rule produces. It builds on the merged parsing/resolution work (#56426) and 
changes analyzer routing only (no execution change).
   
   - New `BinByResolver`: resolves the child, the range/distribute column 
references, and the bin-width/origin expressions, then validates and folds them.
   - New `BinByResolution`: the validation/fold logic (type and foldability 
checks, microsecond folding, default origin, session-zone capture) extracted 
from `ResolveBinBy` into a shared object, so both analyzers produce identical 
results without one invoking the other.
   - `ResolverGuard`: allowlist arm plus `checkBinBy`, so `BIN BY` routes into 
single-pass.
   - `Resolver`: dispatch arm to `BinByResolver`.
   - `ResolutionValidator`: a `BinBy` validation arm (the validator rejects 
unknown operators, so a new node needs one).
   - `ResolveBinBy`: delegates its validation to the shared `BinByResolution`.
   - A missing `BIN BY` column now reports the standard 
`UNRESOLVED_COLUMN.WITH_SUGGESTION` instead of the bespoke 
`BIN_BY_COLUMN_NOT_FOUND`, which is removed. The top-level-only check keeps its 
dedicated `BIN_BY_REQUIRES_TOP_LEVEL_COLUMN`.
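
   The shape of the change listed above (a guard allowlist arm, a resolver 
dispatch arm, and validation/fold logic shared by both analyzers) can be 
sketched with a toy model. Everything below is hypothetical and for 
illustration only: the `Toy*` objects, the simplified `Plan` types, the 
positive-width check, and the default origin of `0` are assumptions, not 
Spark's actual classes, signatures, or rules.

```scala
// Toy model: these types only mimic the shape described in the PR text.
// They are NOT Spark's real classes or signatures.
sealed trait Plan
final case class Relation(columns: Seq[String]) extends Plan
final case class UnresolvedBinBy(
    column: String,
    widthMicros: Long,
    originMicros: Option[Long],
    child: Plan) extends Plan
final case class BinBy(
    column: String,
    widthMicros: Long,
    originMicros: Long,
    child: Plan) extends Plan

// Shared validation/fold logic, owned by neither analyzer.
object ToyBinByResolution {
  val DefaultOriginMicros: Long = 0L // hypothetical default origin

  def validateAndFold(u: UnresolvedBinBy): Either[String, BinBy] =
    if (u.widthMicros <= 0) {
      // Stand-in for the real type and foldability checks.
      Left("bin width must be positive")
    } else {
      val origin = u.originMicros.getOrElse(DefaultOriginMicros)
      Right(BinBy(u.column, u.widthMicros, origin, u.child))
    }
}

// Stand-in for the fixed-point rule: delegates to the shared object.
object ToyFixedPointRule {
  def apply(plan: Plan): Either[String, Plan] = plan match {
    case u: UnresolvedBinBy => ToyBinByResolution.validateAndFold(u)
    case other => Right(other)
  }
}

// Stand-in for the guard allowlist plus the resolver dispatch arm.
object ToySinglePass {
  def guardAccepts(plan: Plan): Boolean = plan match {
    case _: Relation => true
    case u: UnresolvedBinBy => guardAccepts(u.child) // the new allowlist arm
    case _ => false
  }

  def resolve(plan: Plan): Either[String, Plan] = plan match {
    case u: UnresolvedBinBy => ToyBinByResolution.validateAndFold(u)
    case other => Right(other)
  }
}
```

   Because both toy analyzers call the same `validateAndFold`, neither invokes 
the other and their outputs cannot drift apart, which is the property the 
shared `BinByResolution` object is described as providing.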
   
   ### Why are the changes needed?
   
   The single-pass resolver is the upstream replacement for the legacy 
fixed-point analyzer; every operator that still takes the `ResolverGuard` 
fallback keeps the legacy analyzer alive for that operator. Wiring `BIN BY` 
into single-pass 
lets its queries resolve without falling back. Reporting a missing column with 
the standard `UNRESOLVED_COLUMN` (rather than a `BIN BY`-specific error) 
matches how other operators such as `UNPIVOT` and `SORT` behave, gives the user 
column suggestions, and avoids hand-written single-pass code to reproduce a 
non-standard error.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but only on unreleased `master`, and only when the operator is 
explicitly enabled (`BIN BY` is off by default via 
`spark.sql.binByRelationOperator.enabled` and is not in any released version). 
A `BIN BY` query referencing a missing column previously raised 
`BIN_BY_COLUMN_NOT_FOUND` ("The column `<c>` referenced in BIN BY was not found 
in the input relation."); it now raises `UNRESOLVED_COLUMN.WITH_SUGGESTION` ("A 
column ... with name `<c>` cannot be resolved. Did you mean ...?"). The 
resolved plan and all other behavior are unchanged.
   
   ### How was this patch tested?
   
   - `ResolverGuardSuite`: routing test confirming `BIN BY` is accepted into 
single-pass and resolves through the full path including `ResolutionValidator`.
   - `BinBySuite` (sql/core): dual-run parity tests 
(`spark.sql.analyzer.singlePassResolver.dualRunWithLegacy=true`) for LTZ and 
NTZ inputs, which compare the single-pass and fixed-point resolved plans and 
fail on any mismatch; a dual-run test that invalid queries are rejected, 
covering a missing column (`UNRESOLVED_COLUMN`) and a nested column 
(`BIN_BY_REQUIRES_TOP_LEVEL_COLUMN`); and the disabled-operator test, which 
pins the flag off explicitly.
   - `ResolveBinBySuite` (catalyst): existing fixed-point resolution and error 
tests, updated for the `UNRESOLVED_COLUMN` change and explicit flag pinning.
   - `SparkThrowableSuite`: validates the error-condition catalog after 
removing `BIN_BY_COLUMN_NOT_FOUND`.
   - scalastyle passes on the changed modules.
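
   The dual-run parity idea used by the tests above can be sketched 
independently of Spark. This is a minimal illustration under stated 
assumptions: `ToyDualRun` and its signature are hypothetical, and comparing 
error outcomes as well as successful plans is an assumption of the sketch 
rather than a claim about how Spark's dual-run mode is implemented.

```scala
// Hypothetical helper, not Spark's API: run both analyzers on the same
// input and fail loudly if their outcomes differ.
object ToyDualRun {
  def resolve[P](
      plan: P,
      singlePass: P => Either[String, P],
      fixedPoint: P => Either[String, P]): Either[String, P] = {
    val fromSinglePass = singlePass(plan)
    val fromFixedPoint = fixedPoint(plan)
    if (fromSinglePass != fromFixedPoint) {
      // Any divergence (plan or error) is treated as a test failure.
      throw new IllegalStateException(
        s"Analyzer mismatch: single-pass=$fromSinglePass, " +
          s"fixed-point=$fromFixedPoint")
    }
    fromSinglePass
  }
}
```

   With two resolvers that agree, `ToyDualRun.resolve` returns their common 
result; with resolvers that disagree, it throws, which is how a parity test 
would surface a divergence between the two analyzers.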
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes. Generated-by: Claude Code (Claude Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

