sunchao commented on code in PR #56603:
URL: https://github.com/apache/spark/pull/56603#discussion_r3455506518
##########
sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala:
##########
@@ -1880,33 +1992,20 @@ abstract class DynamicPartitionPruningV1Suite extends
DynamicPartitionPruningDat
DppMaterializedInputTestState.reset(counterId)
assert(df.collect().toSeq === Seq(Row(1)))
assert(activeDppSubqueries(df).isEmpty,
- s"Shouldn't trigger DPP for an opaque materialized plan:\n" +
+ s"Shouldn't execute standalone DPP for this filtering side:\n" +
df.queryExecution)
}
+ // A LocalRelation is cheaply recomputable and exposes an exact
maxRows, a conservative
+ // bound on its join-key NDV, so its pruning benefit is estimable and
it gets a standalone
+ // DPP subquery.
checkStandaloneDpp(Seq(1).toDF("p"))
Review Comment:
[P2] Preserve the recompute-cost guard coverage from #56636
The merge resolution removes all three assertions added by #56636 for the
remaining branches of `isCheaplyRecomputableMaterializedPlan`: a cheap
non-selective `Filter`/`Project` stays eligible, while an `Aggregate` and a UDF
projection remain excluded. Those production branches still exist, but I could
not find replacement coverage for them.
I understand that the old fixtures forced benefit through the fallback
ratio, which this PR intentionally gates. Could we adapt those cases to
establish a statistics/`maxRows` benefit or exercise broadcast reuse instead of
deleting them? Otherwise regressions in the just-merged recompute-cost guard
will no longer fail CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]