Re: [PR] [SPARK-54593][SQL][FOLLOWUP] Require a statistics-based benefit, not the fallback ratio, for materialized-input DPP [spark]

via GitHub Mon, 22 Jun 2026 14:20:27 -0700


sunchao commented on code in PR #56603:
URL: https://github.com/apache/spark/pull/56603#discussion_r3455506518



##########
sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala:
##########
@@ -1880,33 +1992,20 @@ abstract class DynamicPartitionPruningV1Suite extends 
DynamicPartitionPruningDat
           DppMaterializedInputTestState.reset(counterId)
           assert(df.collect().toSeq === Seq(Row(1)))
           assert(activeDppSubqueries(df).isEmpty,
-            s"Shouldn't trigger DPP for an opaque materialized plan:\n" +
+            s"Shouldn't execute standalone DPP for this filtering side:\n" +
               df.queryExecution)
         }
 
+        // A LocalRelation is cheaply recomputable and exposes an exact 
maxRows, a conservative
+        // bound on its join-key NDV, so its pruning benefit is estimable and 
it gets a standalone
+        // DPP subquery.
         checkStandaloneDpp(Seq(1).toDF("p"))

Review Comment:
   [P2] Preserve the recompute-cost guard coverage from #56636
   
   The merge resolution removes all three assertions added by #56636 for the 
remaining branches of `isCheaplyRecomputableMaterializedPlan`: a cheap 
non-selective `Filter`/`Project` stays eligible, while an `Aggregate` and a UDF 
projection remain excluded. Those production branches still exist, but I could 
not find replacement coverage for them.
   
   I understand that the old fixtures forced benefit through the fallback 
ratio, which this PR intentionally gates. Could we adapt those cases to 
establish a statistics/`maxRows` benefit or exercise broadcast reuse instead of 
deleting them? Otherwise regressions in the just-merged recompute-cost guard 
will no longer fail CI.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-54593][SQL][FOLLOWUP] Require a statistics-based benefit, not the fallback ratio, for materialized-input DPP [spark]

Reply via email to