cloud-fan opened a new pull request, #56604: URL: https://github.com/apache/spark/pull/56604
### What changes were proposed in this pull request?

A follow-up to the whole-stage-codegen subexpression elimination (CSE) work for `FilterExec`.

`FilterExec` takes the CSE codegen path whenever `otherPreds` contain a common subexpression, where a "common subexpression" is anything `EquivalentExpressions` counts more than once. That includes bare leaf columns: `c BETWEEN lo AND hi` lowers to `c >= lo AND c <= hi`, so any `BETWEEN` (or any column referenced in several conjuncts) makes that column a common subexpression.

Taking the CSE path emits the eager `inputVarsEvalCode` prologue, which evaluates **every** column referenced by `otherPreds` at the top of the per-row loop. Caching a bare column load gains nothing, because the non-CSE path already loads each column lazily into a variable on demand. So when the only common subexpressions are leaves, the prologue is pure overhead, and it defeats short-circuiting.

This PR requires a **non-leaf** common subexpression before taking the CSE path. Filters with a genuinely repeated computation (e.g. `a + b`) are unaffected and still benefit from CSE.

### Why are the changes needed?

TPC-DS q28 filters with `ss_quantity BETWEEN ... AND (ss_list_price BETWEEN ... OR ss_coupon_amt BETWEEN ... OR ss_wholesale_cost BETWEEN ...)`. Its only repeated expressions are the bare columns, so the gate wrongly took the CSE path and eagerly decoded the high-precision decimal columns on every row, including rows that the cheap integer predicate on `ss_quantity` would have rejected. Each decoded decimal allocates a `BigInteger`/`BigDecimal`. On a 3TB run this showed up as a ~40% slowdown on q28, which this change removes: the filter falls back to the lazy, short-circuiting path.

### Does this PR introduce _any_ user-facing change?

No. This is a codegen-only change; query results are unchanged.

### How was this patch tested?
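Independently of the Spark suites, the gating rule can be illustrated with a small toy model. The Python sketch below is hypothetical and not part of this patch: `Col`, `Lit`, `Op`, `between`, `common_subexprs`, and the `takes_cse_path_*` helpers are illustrative names, and treating every node seen more than once as "common" is a simplification assumed here only to mirror the description of `EquivalentExpressions` above, not its actual rules.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical toy expression nodes (not Catalyst classes).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Op:
    op: str
    children: tuple


def between(c, lo, hi):
    # `c BETWEEN lo AND hi` lowers to `c >= lo AND c <= hi`, so `c` occurs twice.
    return Op("and", (Op(">=", (c, lo)), Op("<=", (c, hi))))


def is_leaf(e):
    return not isinstance(e, Op)


def count_nodes(e, counts):
    # Count every node; frozen dataclasses compare structurally, so equal
    # subtrees share one Counter key.
    counts[e] += 1
    if isinstance(e, Op):
        for child in e.children:
            count_nodes(child, counts)


def common_subexprs(preds):
    # Toy assumption: any node seen more than once counts as "common".
    counts = Counter()
    for p in preds:
        count_nodes(p, counts)
    return {e for e, n in counts.items() if n > 1}


def takes_cse_path_old(preds):
    # Old gate: any repeated expression, including a bare column.
    return bool(common_subexprs(preds))


def takes_cse_path_new(preds):
    # Gate described in this PR: require at least one non-leaf repeated expression.
    return any(not is_leaf(e) for e in common_subexprs(preds))


# q28-like shape; the literal bounds are placeholders, not the real query's.
q28 = [
    between(Col("ss_quantity"), Lit(1), Lit(2)),
    Op("or", (
        between(Col("ss_list_price"), Lit(3), Lit(4)),
        between(Col("ss_coupon_amt"), Lit(5), Lit(6)),
        between(Col("ss_wholesale_cost"), Lit(7), Lit(8)),
    )),
]

# A genuinely repeated computation: `(a + b) > 10 AND (a + b) < 100`.
a_plus_b = Op("+", (Col("a"), Col("b")))
shared = [Op(">", (a_plus_b, Lit(10))), Op("<", (a_plus_b, Lit(100)))]

print(takes_cse_path_old(q28), takes_cse_path_new(q28))  # True False
print(takes_cse_path_new(shared))                        # True
```

In this model the q28 shape repeats only its four columns, so the new gate keeps the lazy path, while the shared `a + b` still qualifies for CSE.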
New unit test in `WholeStageCodegenSuite` asserting that, for the q28 `BETWEEN` shape (whose only common subexpressions are leaf columns), the generated code with CSE enabled is identical to the code with CSE disabled, i.e. the filter falls back to the lazy, short-circuiting non-CSE path. The existing `FilterExec` CSE tests, which use genuine non-leaf common subexpressions, still exercise the CSE path and pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Claude Code)
