sunchao opened a new pull request, #56071:
URL: https://github.com/apache/spark/pull/56071
### What changes were proposed in this pull request?
This PR extends dynamic partition pruning (DPP) eligibility to small, already
materialized filtering sides:
- `LocalRelation`, which represents locally available rows.
- `LogicalRDD` produced by `checkpoint()` or `localCheckpoint()`.
Checkpoint-created `LogicalRDD`s carry an explicit marker so that DPP is not
enabled for arbitrary `LogicalRDD` inputs that may require recomputing an
upstream query. This also keeps recursive CTE and `foreachBatch`-constructed
inputs outside the new eligibility rule.
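The eligibility change can be pictured with a deliberately simplified, hypothetical Scala model, written in script style as for `spark-shell` or a Scala script. The types `BuildSide`, `FilteredPlan`, `LocalRows`, `RddBacked`, and `OtherPlan` and the `DppEligibility` object are illustrative stand-ins. They are not Catalyst classes or Spark's actual DPP rule, and size thresholds for "small" are not modeled.

```scala
// Hypothetical, simplified model of DPP build-side eligibility.
// None of these types are Spark Catalyst classes; size limits are not modeled.
sealed trait BuildSide
// A build side whose logical plan still contains a filtering predicate.
final case class FilteredPlan(predicate: String) extends BuildSide
// Rows already available locally (stands in for LocalRelation).
case object LocalRows extends BuildSide
// An RDD-backed plan (stands in for LogicalRDD); `fromCheckpoint` plays the
// role of the explicit marker set by checkpoint()/localCheckpoint().
final case class RddBacked(fromCheckpoint: Boolean) extends BuildSide
// Any other plan without a usable filtering predicate.
case object OtherPlan extends BuildSide

object DppEligibility {
  // Previous behavior: only a build side with a filtering predicate qualifies.
  def eligibleBefore(side: BuildSide): Boolean = side match {
    case FilteredPlan(_) => true
    case _               => false
  }

  // Extended behavior: also accept already materialized filtering sides,
  // but only RDD-backed plans that carry the checkpoint marker.
  def eligibleAfter(side: BuildSide): Boolean = side match {
    case FilteredPlan(_)           => true
    case LocalRows                 => true
    case RddBacked(fromCheckpoint) => fromCheckpoint
    case OtherPlan                 => false
  }
}

println(DppEligibility.eligibleAfter(RddBacked(fromCheckpoint = true)))  // true
println(DppEligibility.eligibleAfter(RddBacked(fromCheckpoint = false))) // false
```

In this model, an unmarked `RddBacked(false)` stays ineligible, which corresponds to the negative case described in the testing section.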
This supersedes the unmerged approach in #53324 with narrower `LogicalRDD`
handling while addressing SPARK-54593.
### Why are the changes needed?
DPP currently requires a filtering predicate in the build-side logical plan.
When a small filtering side is already materialized as a `LocalRelation` or a
checkpointed `LogicalRDD`, that predicate is no longer present, so Spark misses
partition pruning opportunities.
This affects joins that match an expression over partition columns against a
small set of keys, for example `concat_ws("||", hour, category) = hc_key`.
Although the expression is composed only of partition columns, the partitioned
scan is not dynamically pruned when the filtering side is materialized.
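To illustrate why such a join is prunable in principle, the following toy Scala script (not Spark code) evaluates the join key from partition values alone and keeps only the partitions whose key appears in a small, materialized set of `hc_key` values. `Partition`, `concatWs`, and `prunePartitions` are hypothetical helpers; `concatWs` mimics `concat_ws` only for non-null strings, whereas Spark's `concat_ws` skips null inputs.

```scala
// Hypothetical partition of a table partitioned by (hour, category).
final case class Partition(hour: String, category: String)

// Mirrors concat_ws for non-null string inputs only (null handling not modeled).
def concatWs(sep: String, parts: String*): String = parts.mkString(sep)

// Keep only partitions whose derived key appears in the small filtering side.
// This works because the key depends on partition columns alone.
def prunePartitions(partitions: Seq[Partition], hcKeys: Set[String]): Seq[Partition] =
  partitions.filter(p => hcKeys.contains(concatWs("||", p.hour, p.category)))

// Six partitions: every combination of three hours and two categories.
val partitions = for {
  hour     <- Seq("00", "01", "02")
  category <- Seq("a", "b")
} yield Partition(hour, category)

// A small, already materialized set of hc_key values (e.g. from a LocalRelation).
val hcKeys = Set("01||a", "02||b")

val kept = prunePartitions(partitions, hcKeys)
println(s"scanning ${kept.size} of ${partitions.size} partitions: $kept")
```

Only two of the six partitions survive, which is the kind of reduction DPP can deliver once the materialized side is allowed to act as the filtering side.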
### Does this PR introduce _any_ user-facing change?
Yes. Queries joining a partitioned file-source table with a small
`LocalRelation` or checkpointed filtering side may now perform dynamic
partition pruning and scan fewer partitions. There is no API change.
### How was this patch tested?
- Added positive coverage for DPP using a `LocalRelation` build side with an
expression over partition columns.
- Added positive coverage for DPP using a `localCheckpoint()` build side with
the same expression form.
- Added negative coverage confirming that a non-checkpointed `LogicalRDD` does
  not become DPP-eligible.
- Ran `build/sbt 'sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOn org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff'`.
- Ran `build/sbt sql/scalastyle sql/Test/scalastyle`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: OpenAI Codex
--
This is an automated message from the Apache Git Service.