gengliangwang opened a new pull request, #56256:
URL: https://github.com/apache/spark/pull/56256
### What changes were proposed in this pull request?
The aggregate out-of-memory error (`AGGREGATE_OUT_OF_MEMORY`) is constructed
inline in two places:
- `HashAggregateExec`, whose whole-stage codegen emits `throw new
org.apache.spark.memory.SparkOutOfMemoryError("AGGREGATE_OUT_OF_MEMORY", new java.util.HashMap());`
into every generated aggregate class.
- `TungstenAggregationIterator` (the interpreted fallback), which throws the
same `new SparkOutOfMemoryError(...)` and needs a `// scalastyle:off
throwerror` suppression.
This PR adds a `QueryExecutionErrors.aggregateOutOfMemoryError()` factory
(next to the existing `cannotAcquireMemory*` OOM factories) and routes both
call sites through it. In the codegen path the emitted Java becomes `throw
QueryExecutionErrors.aggregateOutOfMemoryError();`.
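The refactor is an instance of the static-factory pattern. The Java sketch below shows its general shape under stated assumptions. `OomError` and `ErrorFactory` are hypothetical stand-ins for `SparkOutOfMemoryError` and `QueryExecutionErrors`; the real factory is written in Scala, and its exact body is not shown in this description. Both call sites throw an error with the same error class and the same empty parameter map. Only the place where the error is constructed differs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.spark.memory.SparkOutOfMemoryError:
// an OutOfMemoryError that carries an error class and message parameters.
class OomError extends OutOfMemoryError {
  final String errorClass;
  final Map<String, String> messageParameters;

  OomError(String errorClass, Map<String, String> messageParameters) {
    super(errorClass);
    this.errorClass = errorClass;
    this.messageParameters = messageParameters;
  }
}

// Hypothetical stand-in for QueryExecutionErrors: one shared factory method.
final class ErrorFactory {
  private ErrorFactory() {}

  static OomError aggregateOutOfMemoryError() {
    // The error-class literal and the empty parameter map are written once here,
    // instead of being repeated at every call site.
    return new OomError("AGGREGATE_OUT_OF_MEMORY", new HashMap<>());
  }
}

class AggregateSketch {
  // Before: the call site constructs the error inline.
  static void failInline() {
    throw new OomError("AGGREGATE_OUT_OF_MEMORY", new HashMap<>());
  }

  // After: the call site delegates construction to the factory.
  static void failViaFactory() {
    throw ErrorFactory.aggregateOutOfMemoryError();
  }
}
```

The factory returns the error instead of throwing it, so each call site keeps its own `throw`. This lets the Java compiler still see that the statement never completes normally.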
### Why are the changes needed?
Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen).
Dumping the whole-stage codegen of the TPC-DS queries shows the inline `throw
new org.apache.spark.memory.SparkOutOfMemoryError("AGGREGATE_OUT_OF_MEMORY",
new java.util.HashMap());` line **445 times** across 142 of 150 generated
classes, making it the single most-repeated `throw` in the corpus. Replacing it
with a factory call shrinks each generated aggregate class. It also moves the
error-class string and the construction of the empty message-parameter map out
of every generated class (and its constant pool) into one compiled method. It also consolidates the error
construction shared with the interpreted path and removes the `throwerror`
scalastyle suppression there.
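As a rough illustration of the source-size effect, the sketch below compares the length of the inline `throw` line with the factory call. It then multiplies the per-site difference by the 445 occurrences reported above. The sketch counts Java source characters only and says nothing about bytecode or constant-pool size. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope estimate of generated-source savings.
final class ThrowSizeEstimate {
  private ThrowSizeEstimate() {}

  // The line emitted before this change, as quoted in the PR description.
  static final String INLINE_THROW =
      "throw new org.apache.spark.memory.SparkOutOfMemoryError("
          + "\"AGGREGATE_OUT_OF_MEMORY\", new java.util.HashMap());";

  // The line emitted after this change.
  static final String FACTORY_THROW =
      "throw QueryExecutionErrors.aggregateOutOfMemoryError();";

  // Occurrence count from the TPC-DS codegen dump described above.
  static final int OCCURRENCES = 445;

  // Source characters saved at one call site.
  static int savedPerSite() {
    return INLINE_THROW.length() - FACTORY_THROW.length();
  }

  // Source characters saved across all call sites in the dump.
  static long savedTotal() {
    return (long) savedPerSite() * OCCURRENCES;
  }

  public static void main(String[] args) {
    System.out.println("saved per site: " + savedPerSite());
    System.out.println("saved in total: " + savedTotal());
  }
}
```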
### Does this PR introduce _any_ user-facing change?
No. The same `AGGREGATE_OUT_OF_MEMORY` error with the same (empty) message
parameters is thrown; only where it is constructed changes.
### How was this patch tested?
This is a behavior-preserving refactor, covered by the existing aggregate
suites (e.g. all 163 tests in `DataFrameAggregateSuite` pass). The change was
additionally verified by re-dumping the TPC-DS whole-stage codegen: all 445
inline throws are now `QueryExecutionErrors.aggregateOutOfMemoryError()` calls,
and every generated subtree still compiles (the Janino default imports already
make `QueryExecutionErrors` available unqualified, as used by other generated
error calls such as `divideByZeroError`). This mirrors the sibling
`DateTimeExpressionUtils` codegen extractions, which likewise relied on
existing expression-suite coverage.
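A re-dump check like this one can be scripted as a plain substring count. The sketch below assumes that each generated class's source has already been captured as a string. It does not show how the dump is produced, and its names are hypothetical. The helper reports the total number of occurrences of a pattern and how many classes contain that pattern. The same helper can therefore count inline throws before the change and factory calls after it.

```java
import java.util.List;

// Hypothetical helper for auditing dumped whole-stage codegen sources.
final class ThrowCensus {
  private ThrowCensus() {}

  // Identifies the inline construction emitted before the change.
  static final String INLINE_PATTERN =
      "new org.apache.spark.memory.SparkOutOfMemoryError(\"AGGREGATE_OUT_OF_MEMORY\"";

  // Identifies the factory call emitted after the change.
  static final String FACTORY_PATTERN =
      "QueryExecutionErrors.aggregateOutOfMemoryError()";

  // Counts non-overlapping occurrences of needle in haystack.
  static int countOccurrences(String haystack, String needle) {
    int count = 0;
    int from = 0;
    while ((from = haystack.indexOf(needle, from)) != -1) {
      count++;
      from += needle.length();
    }
    return count;
  }

  // Returns {total occurrences, number of sources containing at least one}.
  static int[] census(List<String> generatedSources, String needle) {
    int total = 0;
    int sourcesWithHit = 0;
    for (String source : generatedSources) {
      int n = countOccurrences(source, needle);
      total += n;
      if (n > 0) {
        sourcesWithHit++;
      }
    }
    return new int[] {total, sourcesWithHit};
  }
}
```

On the post-change dump, the result described above corresponds to an inline count of zero and a factory-call count of 445.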
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)