cloud-fan opened a new pull request, #56636: URL: https://github.com/apache/spark/pull/56636
### What changes were proposed in this pull request?

Follow-up to #56535 (SPARK-54593). That PR narrowed materialized-input DPP eligibility from "the filtering side contains a materialized input" to a structural allowlist (`isRepeatableMaterializedPlan`: a materialized leaf composed only through deterministic `Project`/`Filter`/`Union`/`SubqueryAlias`). This reverts that narrowing: eligibility again only checks that the side contains an already-materialized input (a `LocalRelation`, or a checkpoint-derived `LogicalRDD`).

The materialization guard from #56535 -- `isCheckpointedInput` requiring `rdd.isCheckpointed`, so a lazy checkpoint isn't treated as materialized -- is **kept**.

### Why are the changes needed?

The allowlist tried to ensure the operators *above* the materialized leaf are repeatable. But that is the **general DPP re-evaluation concern**, not specific to materialized inputs: DPP duplicates the filtering side on every eligibility path, so a non-deterministic operator (a `mapPartitions` closure, a UDF over a non-deterministic source) is non-repeatable on the selective-predicate path too -- and Spark cannot decide a plan's repeatability in general (opaque RDD/closure non-determinism is invisible to Catalyst).

So the allowlist (a) does not solve the general problem and (b) over-rejects legitimate deterministic materialized sides (e.g. an aggregate, or any non-allowlisted operator, over a materialized input) that re-evaluate fine.

The one genuinely materialized-input-specific hazard -- a lazy checkpoint that has not been materialized yet -- is handled by `isCheckpointedInput` requiring `rdd.isCheckpointed`, which is retained.

A non-repeatable plan above a materialized input (e.g. `checkpoint.mapPartitions(counter)`) can again be DPP-eligible.
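
As a rough illustration of the two eligibility rules discussed above, here is a toy Python model. It is not Spark code: the `Node` class, its `deterministic` and `checkpointed` fields, and the function names are hypothetical stand-ins for Catalyst plan nodes, `rdd.isCheckpointed`, and the checks named in this description. The real logic lives in Spark's Scala optimizer and may differ in detail.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy logical-plan node (hypothetical; not Spark's LogicalPlan)."""
    op: str                           # e.g. "LocalRelation", "LogicalRDD", "Project"
    children: Tuple["Node", ...] = ()
    deterministic: bool = True        # assumed flag: operator is deterministic
    checkpointed: bool = False        # stands in for rdd.isCheckpointed


# Operators the reverted allowlist accepted above a materialized leaf.
ALLOWLIST = {"Project", "Filter", "Union", "SubqueryAlias"}


def is_materialized_leaf(n: Node) -> bool:
    """Kept guard: a LogicalRDD counts only if its checkpoint is materialized."""
    if n.op == "LocalRelation":
        return True
    return n.op == "LogicalRDD" and n.checkpointed


def is_repeatable_materialized_plan(n: Node) -> bool:
    """Reverted rule: materialized leaves under allowlisted deterministic operators only."""
    if not n.children:
        return is_materialized_leaf(n)
    return (
        n.op in ALLOWLIST
        and n.deterministic
        and all(is_repeatable_materialized_plan(c) for c in n.children)
    )


def contains_materialized_input(n: Node) -> bool:
    """Restored rule: the side contains a materialized input somewhere."""
    return is_materialized_leaf(n) or any(
        contains_materialized_input(c) for c in n.children
    )


checkpoint = Node("LogicalRDD", checkpointed=True)
lazy_checkpoint = Node("LogicalRDD", checkpointed=False)

project_side = Node("Project", (Node("LocalRelation"),))
aggregate_side = Node("Aggregate", (checkpoint,))
counter_side = Node("MapPartitions", (checkpoint,), deterministic=False)

for name, side in [
    ("project", project_side),
    ("aggregate", aggregate_side),
    ("counter", counter_side),
    ("lazy", lazy_checkpoint),
]:
    print(name, is_repeatable_materialized_plan(side), contains_materialized_input(side))
```

In this model the aggregate over a checkpoint is rejected by the allowlist but accepted by the restored rule, the `mapPartitions`-style side becomes eligible again, and the lazy checkpoint stays ineligible under both rules.
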
Making such a plan eligible is the same pre-existing, universal DPP re-evaluation limitation that the selective-predicate path already has; if we want to address it, it should be a uniform DPP-wide change, not a materialized-input-only narrowing.

### Does this PR introduce _any_ user-facing change?

No. DPP is an optimization; results are unchanged for repeatable filtering sides, which is the supported case.

### How was this patch tested?

Existing `DynamicPartitionPruning*Suite`s. Removes the two tests added in #56535 that asserted the reverted operators-above narrowing.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Anthropic Claude Opus)
