rdtr opened a new pull request, #56180:
URL: https://github.com/apache/spark/pull/56180

   ### What changes were proposed in this pull request?
   
   `LimitPushDown` in `Optimizer.scala` rewrites `LocalLimit(le, Offset(oe, 
child))` to
   `Offset(oe, LocalLimit(Add(le, oe), child))`, leaving the `Add` unfolded. 
The rule
   relies on a subsequent `ConstantFolding` pass to turn `Add(Literal(N), 
Literal(M))`
   into `Literal(N + M)` so that `BasicOperators` (which only matches
   `LocalLimit(IntegerLiteral, _)`) can produce a physical plan.
   
   This patch folds the sum eagerly when both operands are integer literals — 
the
   realistic case from `LIMIT N OFFSET M` clauses — so the rule produces a 
planable
   logical plan in a single application, without depending on a downstream rule.
   
   ```scala
   case LocalLimit(le, Offset(oe, grandChild)) =>
     val mergedLimit = (le, oe) match {
       case (IntegerLiteral(l), IntegerLiteral(o)) => Literal(l + o, 
IntegerType)
       case _ => Add(le, oe)
     }
     Offset(oe, LocalLimit(mergedLimit, grandChild))
   ```
   
   ### Why are the changes needed?
   
   If `ConstantFolding` is excluded via `spark.sql.optimizer.excludedRules`, the
   unfolded `LocalLimit(Add(Literal(N), Literal(M)), ...)` reaches physical
   planning. `BasicOperators` only matches `LocalLimit(IntegerLiteral, _)`, so
   planning fails with:
   
   ```
   java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
     at scala.Predef$.assert(Predef.scala:279)
     at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
     at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
     ...
   ```
   
   wrapped as `[INTERNAL_ERROR] The Spark SQL phase planning failed with an 
internal error`.
   
   Repro (Scala):
   
   ```scala
   val spark = SparkSession.builder().master("local[2]")
     .config("spark.sql.optimizer.excludedRules",
       "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
     .getOrCreate()
   spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES 
(10,'d1'),(20,'d2'),(30,'d3') AS t(id, name)")
   spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS t(id, 
dept_id)")
   spark.sql("""
     SELECT * FROM emp
     WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
   """).show()
   ```
   
   Self-sufficient rules also reduce fragility for downstream consumers — 
custom optimizer pipelines and plugins that legitimately exclude folding rules 
(for example to force certain code paths during testing) shouldn't have queries 
that
   crash at physical planning when a logically-equivalent plan exists.
   
   ### Scope and follow-up
   
   This patch fixes the literal-only case `LIMIT N OFFSET M`, which is what the 
SQL parser produces for the common usage.
   
   It does **not** fix the case where a user writes literal arithmetic in the 
SQL,
   e.g. `LIMIT 3 - 1 OFFSET 5 - 3`, because:
   - The optimizer rule's input expressions are themselves unfolded
     (`Subtract(Literal(3), Literal(1))`, not `Literal(2)`).
   - `BasicOperators` patterns for `LocalLimit` / `GlobalLimit` / `Offset` (and 
the
     composite extractors `OffsetAndLimit` / `LimitAndOffset`) all match
     `IntegerLiteral` only, so the unfolded `Subtract` and the resulting
     `Offset(Subtract, ...)` still wouldn't plan.
   
   Fixing that broader case cleanly requires teaching `BasicOperators` to 
evaluate any foldable `IntegerType` expression at planning time (5 patterns: 3 
simple + 2 composite extractors). I'd be happy to follow up with that change in 
a separate PR if reviewers feel the broader fix is appropriate. For now this PR 
targets the narrow, realistic case.
   
   ### Does this PR introduce *any* user-facing change?
   
   No. Default config behavior is unchanged because `ConstantFolding` would 
fold the `Add` anyway. Only queries that previously crashed when 
`OPTIMIZER_EXCLUDED_RULES` included `ConstantFolding` now succeed.
   
   ### How was this patch tested?
   
   Added a new test in `LimitPushdownSuite` that runs only `LimitPushDown` (no 
subsequent `ConstantFolding`) and verifies the output limit is already folded 
to a literal. Verified the test fails on master and passes with this fix.
   
   Existing `LimitPushdownSuite` tests continue to pass (the existing "Push 
down limit 1 through Offset" test exercises the full pipeline including 
`ConstantFolding` and remains unchanged in output).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to