sunchao commented on code in PR #56636:
URL: https://github.com/apache/spark/pull/56636#discussion_r3454115940


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala:
##########
@@ -112,7 +112,13 @@ class SparkOptimizer(
       V2ScanRelationPushDown.ruleName,
       V2ScanPartitioningAndOrdering.ruleName,
       V2Writes.ruleName,
-      ReplaceCTERefWithRepartition.ruleName)
+      ReplaceCTERefWithRepartition.ruleName,
+      // CleanupDynamicPruningFilters finalizes the DPP predicates inserted by 
PartitionPruning --
+      // notably rewriting non-deterministic ones to `true` so they are not 
re-evaluated. That is
+      // correctness behavior, not an optional optimization, so the rule must 
not be excludable.
+      // Disabling DPP is done by excluding PartitionPruning (the inserter), 
after which this rule
+      // is a no-op.
+      CleanupDynamicPruningFilters.ruleName)

Review Comment:
   [P2] Add a regression test for this correctness boundary
   
   This entry now protects against `spark.sql.optimizer.excludedRules` 
reopening the silent-row-loss path from the prior thread, but the latest commit 
adds no test that attempts to exclude `CleanupDynamicPruningFilters`. 
`OptimizerRuleExclusionSuite` only exercises `SimpleTestOptimizer`, while 
`DynamicPartitionPruningSuite` covers cleanup under the default configuration, 
so removing or renaming this line would leave the current tests green.
   
   Could we add a focused regression that sets `OPTIMIZER_EXCLUDED_RULES` to 
`CleanupDynamicPruningFilters.ruleName` and verifies the rule remains installed 
and removes a Catalyst-visible nondeterministic DPP predicate? The 
counter-backed `reflect`/checkpoint repro from the prior thread would exercise 
the full wrong-result path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to