sunchao commented on code in PR #56603:
URL: https://github.com/apache/spark/pull/56603#discussion_r3440311657


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala:
##########
@@ -110,8 +110,14 @@ object PartitionPruning extends Rule[LogicalPlan] with 
PredicateHelper with Join
     require(filteringKeys.size == 1, "DPP Filters should only have a single 
broadcasting key " +
       "since there are no usage for multiple broadcasting keys at the moment.")
     val indices = Seq(joinKeys.indexOf(filteringKeys.head))
-    lazy val hasBenefit = pruningHasBenefit(
-      pruningKey, partScan, filteringKeys.head, filteringPlan)
+    // A filtering side that is eligible only because it is already 
materialized (a LocalRelation
+    // or a checkpoint-derived LogicalRDD) carries no selective predicate. 
pruningHasBenefit's
+    // fallback filtering ratio is justified only "when CBO stats are missing, 
but there is a
+    // predicate that is likely to be selective", which does not hold here, so 
it would assume a
+    // benefit that may not exist. Without an estimated benefit such a side is 
only injected when it
+    // can reuse a broadcast (onlyInBroadcast), never as a standalone 
always-applied subquery.
+    lazy val hasBenefit = hasSelectivePredicate(filteringPlan) &&

Review Comment:
   [P2] Preserve stats-backed standalone pruning
   
   This `hasSelectivePredicate(filteringPlan) && pruningHasBenefit(...)` guard 
short-circuits before `pruningHasBenefit` can consult column statistics. A 
checkpoint-derived `LogicalRDD` can retain valid `attributeStats` even though 
it no longer contains a `Filter`. I reproduced this with a 100-partition table, 
key-side NDV = 1, fact-side NDV > 1, `fallbackFilterRatio=0`, 
`reuseBroadcastOnly=false`, and no reusable broadcast: the parent injects 
standalone DPP and prunes the scan, while this head removes DPP and scans all 
100 partitions with identical results. Please gate only the fallback-ratio path 
on `hasSelectivePredicate`, while preserving the NDV-backed benefit path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to