vranes opened a new pull request, #56545:
URL: https://github.com/apache/spark/pull/56545
### What changes were proposed in this pull request?
This PR adds single-pass analyzer (resolver) support for the `BIN BY`
relation operator. Today `UnresolvedBinBy` is not in the `ResolverGuard`
allowlist, so it falls through to the legacy fixed-point analyzer. This PR
wires it into the single-pass resolver so that a `BIN BY` query resolves to a
`BinBy` node identical to the one the fixed-point `ResolveBinBy` rule produces.
It builds on the merged parsing/resolution work (#56426) and changes analyzer
routing only (no execution change).
- New `BinByResolver`: resolves the child, the range/distribute column
references, and the bin-width/origin expressions, then validates and folds them.
- New `BinByResolution`: the validation/fold logic (type and foldability
checks, microsecond folding, default origin, session-zone capture) extracted
from `ResolveBinBy` into a shared object, so both analyzers produce identical
results without one invoking the other.
- `ResolverGuard`: allowlist arm plus `checkBinBy`, so `BIN BY` routes into
single-pass.
- `Resolver`: dispatch arm to `BinByResolver`.
- `ResolutionValidator`: a `BinBy` validation arm (the validator rejects
unknown operators, so a new node needs one).
- `ResolveBinBy`: delegates its validation to the shared `BinByResolution`.
- A missing `BIN BY` column now reports the standard
`UNRESOLVED_COLUMN.WITH_SUGGESTION` instead of the bespoke
`BIN_BY_COLUMN_NOT_FOUND`, which is removed. The top-level-only check keeps its
dedicated `BIN_BY_REQUIRES_TOP_LEVEL_COLUMN`.
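
The shared-object arrangement described in the list above can be sketched
roughly as follows. This is a minimal, Spark-free illustration and not the
PR's code: `Expr`, `MicrosLiteral`, `ColumnRef`, the simplified
`UnresolvedBinBy`/`BinBy` case classes, `SharedResolution`, and the placeholder
default origin of `0` are all assumptions made for the example. It only shows
the idea that both analyzers call one validation/fold routine, so neither
invokes the other and their results agree.

```scala
// Hypothetical stand-ins; none of these are Spark's real classes or signatures.
object BinBySketch {
  sealed trait Expr
  final case class MicrosLiteral(value: Long) extends Expr // foldable constant
  final case class ColumnRef(name: String) extends Expr    // not foldable

  final case class UnresolvedBinBy(column: String, width: Expr, origin: Option[Expr])
  final case class BinBy(column: String, widthMicros: Long, originMicros: Long)

  // Shared validation/fold logic, analogous in spirit to `BinByResolution`.
  object SharedResolution {
    // Placeholder value; the real default origin is not stated in this description.
    val PlaceholderDefaultOrigin: Long = 0L

    // Foldability check: only constants can be folded to microseconds here.
    private def fold(e: Expr, what: String): Either[String, Long] = e match {
      case MicrosLiteral(v) => Right(v)
      case ColumnRef(n)     => Left(s"$what must be foldable, got column reference $n")
    }

    def validateAndFold(u: UnresolvedBinBy): Either[String, BinBy] = {
      // Fall back to the default origin when none is given.
      val foldedOrigin: Either[String, Long] = u.origin match {
        case Some(e) => fold(e, "origin")
        case None    => Right(PlaceholderDefaultOrigin)
      }
      for {
        width  <- fold(u.width, "bin width")
        origin <- foldedOrigin
      } yield BinBy(u.column, width, origin)
    }
  }

  // Both entry points delegate to the shared object instead of to each other.
  def fixedPointRule(u: UnresolvedBinBy): Either[String, BinBy] =
    SharedResolution.validateAndFold(u)

  def singlePassResolve(u: UnresolvedBinBy): Either[String, BinBy] =
    SharedResolution.validateAndFold(u)
}
```

Keeping the logic in a standalone object, rather than having the single-pass
resolver call the fixed-point rule, is what lets the two paths stay independent
while still producing the same node.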
### Why are the changes needed?
The single-pass resolver is the upstream analyzer replacing the legacy
fixed-point analyzer; every operator left on the `ResolverGuard` fallback keeps
the legacy analyzer alive for that operator. Wiring `BIN BY` into single-pass
lets its queries resolve without falling back. Reporting a missing column with
the standard `UNRESOLVED_COLUMN` (rather than a `BIN BY`-specific error)
matches how other operators such as `UNPIVOT` and `SORT` behave, gives the user
column suggestions, and avoids hand-written single-pass code to reproduce a
non-standard error.
### Does this PR introduce _any_ user-facing change?
Yes, but only on unreleased `master`, and only when the operator is
explicitly enabled (`BIN BY` is off by default via
`spark.sql.binByRelationOperator.enabled` and is not in any released version).
A `BIN BY` query referencing a missing column previously raised
`BIN_BY_COLUMN_NOT_FOUND` ("The column `<c>` referenced in BIN BY was not found
in the input relation."); it now raises `UNRESOLVED_COLUMN.WITH_SUGGESTION` ("A
column ... with name `<c>` cannot be resolved. Did you mean ...?"). The
resolved plan and all other behavior are unchanged.
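
To illustrate the user-facing difference, here is a small Spark-free sketch of
a column lookup that reports the standard condition name together with
candidate columns. Everything in it is hypothetical: `AnalysisError`,
`resolveColumn`, the case-insensitive match, and the exact message wording are
assumptions for the example, and the real suggestion logic may rank or filter
candidates differently.

```scala
// Hypothetical sketch; not Spark's error framework or its suggestion algorithm.
object ColumnLookupSketch {
  final case class AnalysisError(condition: String, message: String)

  // Look up `name` in the child's output columns (case-insensitively, by assumption).
  def resolveColumn(name: String, childOutput: Seq[String]): Either[AnalysisError, String] =
    childOutput.find(_.equalsIgnoreCase(name)) match {
      case Some(col) => Right(col)
      case None =>
        // Offer the available columns as suggestions instead of a bespoke error.
        val proposal = childOutput.map(c => s"`$c`").mkString(", ")
        Left(AnalysisError(
          condition = "UNRESOLVED_COLUMN.WITH_SUGGESTION",
          message = s"A column with name `$name` cannot be resolved. " +
            s"Did you mean one of the following? [$proposal]"
        ))
    }
}
```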
### How was this patch tested?
- `ResolverGuardSuite`: routing test confirming `BIN BY` is accepted into
single-pass and resolves through the full path including `ResolutionValidator`.
- `BinBySuite` (sql/core): dual-run parity tests
(`spark.sql.analyzer.singlePassResolver.dualRunWithLegacy=true`) for LTZ and
NTZ inputs, which compare the single-pass and fixed-point resolved plans and
fail on any mismatch; dual-run negative tests that reject a missing column
(`UNRESOLVED_COLUMN`) and a nested column (`BIN_BY_REQUIRES_TOP_LEVEL_COLUMN`);
and a disabled-operator test that pins the flag off explicitly.
- `ResolveBinBySuite` (catalyst): existing fixed-point resolution and error
tests, updated for the `UNRESOLVED_COLUMN` change and explicit flag pinning.
- `SparkThrowableSuite`: validates the error-condition catalog after
removing `BIN_BY_COLUMN_NOT_FOUND`.
- scalastyle passes on the changed modules.
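
The compare-and-fail idea behind the dual-run parity tests above can be
sketched in plain Scala. This is only an illustration under stated
assumptions: `dualRun` and its parameters are hypothetical names, and the
actual tests rely on the `dualRunWithLegacy` setting quoted above rather than
on a helper like this.

```scala
// Hypothetical helper; it mirrors only the "run both, fail on mismatch" idea.
object DualRunSketch {
  def dualRun[P, R](plan: P)(singlePass: P => R, fixedPoint: P => R): R = {
    val fromSinglePass = singlePass(plan)
    val fromFixedPoint = fixedPoint(plan)
    // Any difference between the two resolved results is treated as a failure.
    if (fromSinglePass != fromFixedPoint) {
      throw new IllegalStateException(
        s"Resolved plans differ: single-pass=$fromSinglePass, fixed-point=$fromFixedPoint")
    }
    fromSinglePass
  }
}
```

A parity check of this shape catches drift between the two analyzers even when
each one individually produces a plausible plan.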
### Was this patch authored or co-authored using generative AI tooling?
Yes. Generated-by: Claude Code (Claude Opus 4.8)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]