cloud-fan opened a new pull request, #56036:
URL: https://github.com/apache/spark/pull/56036
## What changes were proposed in this pull request?
Two semantic-preserving fixes to the SPARK-56627 widened-Cast peel in
`DecimalAggregates`:
1. **SUM arm**: Replace `Cast(MakeDecimal(_, p + 10, s), bounded(pPrime +
10, s))`
with `MakeDecimal(_, min(pPrime + 10, 38), s)`. The merged form's inner
`MakeDecimal` narrowed the overflow check to `10^(p+10)`, where the
un-rewritten `SUM(Cast(x, dec(pPrime, s)))` accepted up to
`10^min(pPrime+10, 38)`. For `pPrime + 10 > 18`, `Decimal.setOrNull` falls
into the BigDecimal branch and never rejects a Long, so the cleaner form
preserves the original overflow boundary for all Long-fit sums.
2. **AVG arm**: Change the guard from `p <= 7`
(`AVG_PEEL_MAX_INNER_PRECISION`) to `pPrime + 4 <= 15`
(`MAX_DOUBLE_DIGITS`). The merged guard switched `AVG(CAST(x AS
DECIMAL(pPrime, s)))`
from full Decimal arithmetic to Double-regime whenever `p <= 7`, including
the `pPrime > 11` band where un-rewritten was Decimal-exact. The new guard
ensures the rewrite only fires inside the existing AVG fast-path's Double
envelope.
## Why are the changes needed?
A query rewrite should preserve the query's observable semantics. The
SPARK-56627 rule changed semantics in two ways:
- **SUM**: Inner `MakeDecimal(_, p + 10, s)` could return null (non-ANSI) or
throw (ANSI) for long-fit sums in \`(10^(p+10), 10^min(pPrime+10, 38))\`.
Example: \`SUM(CAST(x AS DECIMAL(15, 2)))\` where \`x: DECIMAL(5, 2)\`
rejected at \`10^15\` instead of \`10^25\` -- reachable around \`~10^10\`
rows of small-precision input.
- **AVG**: For \`pPrime > 11, p <= 7\`, the rule switched an exact-Decimal
computation into Double-regime, visible as last-digit rounding differences
at any input size. TPC-DS q18 (\`p=7, pPrime=12\`) was affected.
## Does this PR introduce _any_ user-facing change?
Yes -- restores the un-rewritten semantics for affected queries:
- \`SUM(CAST(x AS DECIMAL(pPrime, s)))\` no longer rejects long-fit sums
that the un-rewritten expression would have accepted.
- \`AVG(CAST(x AS DECIMAL(pPrime, s)))\` with \`pPrime + 4 >
MAX_DOUBLE_DIGITS\`
is no longer peeled -- the un-rewritten Decimal-exact path is preserved.
TPC-DS q18 AVG aggregations revert to the un-peeled form (visible in the
regenerated \`q18\` plan-stability files).
## How was this patch tested?
- \`DecimalAggregatesSuite\`: 37/37 pass, including updated SUM arm shape
assertions, the property-test invariants for the new MakeDecimal-only
form, and the new AVG bound boundary tests.
- \`DataFrameAggregateSuite\`: SPARK-56627 numerical-equivalence PBTs pass
under the new AVG generator domain (\`pPrime in [p+1, 11]\`).
- TPC-DS q18 plan-stability files regenerated locally.
- \`DecimalAggregatesBenchmark\` AVG cases (B2, B4) updated to \`pPrime =
11\`
(the new bound). Result files left stale relative to the code change --
the \`benchmark.yml\` GHA workflow can regenerate them.
## Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]