rdtr opened a new pull request, #56180:
URL: https://github.com/apache/spark/pull/56180
### What changes were proposed in this pull request?
`LimitPushDown` in `Optimizer.scala` rewrites `LocalLimit(le, Offset(oe,
child))` to
`Offset(oe, LocalLimit(Add(le, oe), child))`, leaving the `Add` unfolded.
The rule
relies on a subsequent `ConstantFolding` pass to turn `Add(Literal(N),
Literal(M))`
into `Literal(N + M)` so that `BasicOperators` (which only matches
`LocalLimit(IntegerLiteral, _)`) can produce a physical plan.
This patch folds the sum eagerly when both operands are integer literals —
the
realistic case from `LIMIT N OFFSET M` clauses — so the rule produces a
planable
logical plan in a single application, without depending on a downstream rule.
```scala
case LocalLimit(le, Offset(oe, grandChild)) =>
val mergedLimit = (le, oe) match {
case (IntegerLiteral(l), IntegerLiteral(o)) => Literal(l + o,
IntegerType)
case _ => Add(le, oe)
}
Offset(oe, LocalLimit(mergedLimit, grandChild))
```
### Why are the changes needed?
If `ConstantFolding` is excluded via `spark.sql.optimizer.excludedRules`, the
unfolded `LocalLimit(Add(Literal(N), Literal(M)), ...)` reaches physical
planning. `BasicOperators` only matches `LocalLimit(IntegerLiteral, _)`, so
planning fails with:
```
java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
at scala.Predef$.assert(Predef.scala:279)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
...
```
wrapped as `[INTERNAL_ERROR] The Spark SQL phase planning failed with an
internal error`.
Repro (Scala):
```scala
val spark = SparkSession.builder().master("local[2]")
.config("spark.sql.optimizer.excludedRules",
"org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
.getOrCreate()
spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES
(10,'d1'),(20,'d2'),(30,'d3') AS t(id, name)")
spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS t(id,
dept_id)")
spark.sql("""
SELECT * FROM emp
WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
""").show()
```
Self-sufficient rules also reduce fragility for downstream consumers —
custom optimizer pipelines and plugins that legitimately exclude folding rules
(for example to force certain code paths during testing) shouldn't have queries
that
crash at physical planning when a logically-equivalent plan exists.
### Scope and follow-up
This patch fixes the literal-only case `LIMIT N OFFSET M`, which is what the
SQL parser produces for the common usage.
It does **not** fix the case where a user writes literal arithmetic in the
SQL,
e.g. `LIMIT 3 - 1 OFFSET 5 - 3`, because:
- The optimizer rule's input expressions are themselves unfolded
(`Subtract(Literal(3), Literal(1))`, not `Literal(2)`).
- `BasicOperators` patterns for `LocalLimit` / `GlobalLimit` / `Offset` (and
the
composite extractors `OffsetAndLimit` / `LimitAndOffset`) all match
`IntegerLiteral` only, so the unfolded `Subtract` and the resulting
`Offset(Subtract, ...)` still wouldn't plan.
Fixing that broader case cleanly requires teaching `BasicOperators` to
evaluate any foldable `IntegerType` expression at planning time (5 patterns: 3
simple + 2 composite extractors). I'd be happy to follow up with that change in
a separate PR if reviewers feel the broader fix is appropriate. For now this PR
targets the narrow, realistic case.
### Does this PR introduce *any* user-facing change?
No. Default config behavior is unchanged because `ConstantFolding` would
fold the `Add` anyway. Only queries that previously crashed when
`OPTIMIZER_EXCLUDED_RULES` included `ConstantFolding` now succeed.
### How was this patch tested?
Added a new test in `LimitPushdownSuite` that runs only `LimitPushDown` (no
subsequent `ConstantFolding`) and verifies the output limit is already folded
to a literal. Verified the test fails on master and passes with this fix.
Existing `LimitPushdownSuite` tests continue to pass (the existing "Push
down limit 1 through Offset" test exercises the full pipeline including
`ConstantFolding` and remains unchanged in output).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]