cloud-fan commented on code in PR #56499:
URL: https://github.com/apache/spark/pull/56499#discussion_r3449190107


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala:
##########
@@ -290,8 +296,9 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
       testConstraintsAfterJoin(
         originalLeft,
         originalRight,
-        testRelation1.where(IsNotNull($"a")).subquery("left"),
-        right,
+        testRelation1.where(IsNotNull($"a") && ($"a" === 1)).subquery("left"),
+        testRelation2.where(IsNotNull($"b") && ($"b" === 1L) &&
+          ($"b".attr.cast(IntegerType) === 1)).subquery("right"),

Review Comment:
   This expected plan now carries both `($"b" === 1L)` and 
`($"b".attr.cast(IntegerType) === 1)` — two encodings of `b = 1` — and the 
suite's optimizer batch gained `ConstantFolding` + 
`UnwrapCastInBinaryComparison` (lines 41/43) whose comment says they "collapse 
and deduplicate" these, yet both still appear, so the dedup is incomplete. This 
is the same over-eager-substitution smell as the design comment above; once the 
inference is reshaped (or at least narrowed), these batch additions should no 
longer be needed and the golden expected plan shouldn't bake in both predicate 
forms.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -71,12 +72,17 @@ trait ConstraintHelper {
         val candidateConstraints = predicates - eq - EqualNullSafe(l, r)
         inferredConstraints ++= replaceConstraints(candidateConstraints, l, r)
         inferredConstraints ++= replaceConstraints(candidateConstraints, r, l)
+      case eq @ EqualTo(l: Attribute, r: Literal) if 
isBinaryStable(l.dataType) =>

Review Comment:
   The two new cases fold the bound literal into *every* remaining predicate 
via `replaceConstraints(predicates - eq, l, r)`. That generality — not the 
optimization itself — is the common source of all three problems this PR keeps 
running into:
   
   1. **collation** — substituting a literal into a re-collating comparison 
(now patched with `isBinaryStable`);
   2. **`EqualNullSafe` / q78** — substituting `ss_sold_year = 2000` into the 
`EqualNullSafe(ws_sold_year, ss_sold_year)` that a LEFT JOIN emits for null 
semantics yields `EqualNullSafe(ws_sold_year, 2000)`, a predicate the pre-PR 
code never produced. Pushed into the `ws` date_dim dynamic-pruning subquery it 
becomes `EqualNullSafe(d_year, 2000)`, which subquery-reuse treats as distinct 
from store_sales' `EqualTo(d_year, 2000)` — hence the duplicate 
`SubqueryBroadcast` and the q78 plan-stability failure;
   3. **redundant duplicates** — the `cast(b)=1` clone alongside `b = 1L` (see 
the test nit).
   
   **Recommended direction:** rather than generic substitution, make this a 
focused *inequality-bound* inference — scan range predicates (`<`, `<=`, `>`, 
`>=`) that bound one attribute in terms of a constant-pinned attribute 
(optionally through a numeric `+`/`-` offset) and emit the corresponding 
constant bound (`b.v >= a.k` with `a.k = 5` => `b.v >= 5`; `b.v <= a.k + 10` => 
`b.v <= 15`). Leave `EqualTo` / `EqualNullSafe` to the existing `EqualTo(attr, 
attr)` transitivity case above. Scoped this way the rule emits only the bounds 
that are the goal and structurally cannot produce the collation, null-safe, or 
duplicate problems — it collapses the collation guard, the `EqualNullSafe` 
exclusion, and the cast-dedup batch rules into one coherent design.
   
   If you prefer to keep the current general shape this round, the **minimum** 
to unblock q78 is to exclude `EqualNullSafe` from the targets in both literal 
cases:
   
   ```scala
   case eq @ EqualTo(l: Attribute, r: Literal) if isBinaryStable(l.dataType) =>
     inferredConstraints ++=
       replaceConstraints((predicates - 
eq).filterNot(_.isInstanceOf[EqualNullSafe]), l, r)
   ```
   
   That restores the original q78 plan with no golden-file churn (the old code 
inferred nothing from the LEFT JOIN's `EqualNullSafe`, since the attr-attr case 
only matched `EqualTo`), but leaves the duplicate-equality churn in place. Note 
the sibling `EqualTo(attr, attr)` case at lines 70-72 already excludes 
`EqualNullSafe` for the analogous idempotence reason.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to