cloud-fan commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3454653453


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala:
##########
@@ -155,6 +155,12 @@ case class AdaptiveSparkPlanExec(
   private def postStageCreationRules(outputsColumnar: Boolean) = Seq(
     ApplyColumnarRulesAndInsertTransitions(
       context.session.sessionState.columnarRules, outputsColumnar),
+    // Mirror `QueryExecution.preparations`: materialize any surviving 
`RuntimeReplaceable` after
+    // columnar/native conversion and before codegen. AQE re-optimizes each 
stage (which can
+    // re-insert a surviving `RuntimeReplaceable`, e.g. via 
`OptimizeCsvJsonExprs`), and
+    // `AdaptiveSparkPlanExec` is a `LeafExecNode` that the outer 
`preparations` rule can't reach,
+    // so the materialization must also run here.
+    MaterializeRuntimeReplaceable,

Review Comment:
   Good catch — fixed in 50cc6a1, though at a slightly different layer than 
suggested.
   
   I materialized at the **pushdown consumer** (`InMemoryTableScanExec`, 
unfolding `RuntimeReplaceable` in the `predicates` handed to `buildFilter`) 
rather than applying `MaterializeRuntimeReplaceable` to the optimized 
`InMemoryTableScanLike` before wrapping it. Reasoning:
   
   - The scan is a `LeafExecNode` that never reaches whole-stage codegen, so 
this isn't a codegen-prep concern. `MaterializeRuntimeReplaceable` sits between 
columnar conversion and `CollapseCodegenStages` specifically for codegen, which 
is exactly why the `InMemoryTableScanLike` branch skips 
`postStageCreationRules` — extending that rule into a non-codegen leaf would be 
misplaced.
   - `predicates` only feed `buildFilter` (a runtime closure that 
pattern-matches expression shape), so unfolding them at the consumer is the 
right layer. It also covers AQE and non-AQE uniformly (non-AQE was relying on 
`QueryExecution.preparations` reaching the `predicates` field; now both paths 
go through the same unfold) while keeping the readable node in the plan/EXPLAIN 
output.
   
   Added an AQE regression test in `MaterializeRuntimeReplaceableSuite` using 
an adaptive cached plan, so the scan is wrapped in a `TableCacheQueryStageExec` 
leaf (the gap you flagged). A surviving predicate `RuntimeReplaceable` is 
pushed into the scan; the test asserts the wrapper survives into 
`scan.predicates` and that cached-batch pruning still kicks in — it fails 
without the fix (`readBatches == 10`, no pruning). I also added comments at all 
three sites recording the codegen-vs-consumer split.
   
   One scoping note: this is forward-looking. The current opt-in survivor 
(`multi_get_json_object`) isn't a `buildFilter`-recognizable predicate shape, 
so no current query regresses; the fix matters for a future survivor whose 
`replacement` is a prunable comparison.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to