[PR] [SPARK-57442][SQL] Add opt-in nullability check to logical plan integrity validation [spark]

via GitHub Sun, 14 Jun 2026 10:53:40 -0700


cloud-fan opened a new pull request, #56502:
URL: https://github.com/apache/spark/pull/56502


   ### What changes were proposed in this pull request?
   
   This PR adds an opt-in nullability check to logical plan integrity 
validation.
   
   - A new internal config `spark.sql.planChangeValidation.nullability` 
(default `false`) gates the check. It is a sub-flag of the existing 
`spark.sql.planChangeValidation`.
   - A new `LogicalPlanIntegrity.validateNullability` walks the plan and flags 
any operator that references one of its child's output attributes with a 
**narrower** nullability than the child produces — i.e. declaring the attribute 
`nullable = false` while the child produces it as `nullable = true`. It is 
wired into `validateOptimizedPlan`'s existing check chain.
   - References declared **wider** (nullable over a non-nullable child) are 
safe and not flagged; references whose `ExprId` is not produced by any child 
(e.g. outer references) are ignored.
   
   ### Why are the changes needed?
   
   When a rule synthesizes operators, every reference to a child output must 
carry the nullability the child actually produces. A classic miss is an 
outer-join rewrite that references the null-widened side with its original 
non-nullable attributes — for example SPARK-56395 in `RewriteNearestByJoin`, 
where the synthesized `Aggregate` referenced the right side (widened to 
nullable by the synthetic `LEFT OUTER` join) with its original non-nullable 
metadata.
   
   Such a mismatch is a latent plan-integrity defect: the rows are unchanged, 
so it is invisible to result-equivalence reasoning, to `validateSchemaOutput` 
(which compares types `asNullable`), and to green CI. Yet a later optimization 
that trusts the non-nullable declaration can drop a null check and produce 
wrong results. There is currently no integrity check that catches this class of 
defect; this adds one.
   
   The check is disabled by default — even though 
`spark.sql.planChangeValidation` defaults to on in test runs — because not all 
existing rules maintain precise nullability today, and enabling it globally 
would surface pre-existing benign mismatches. Suites that have been audited (or 
rules that have been fixed) can opt in to guard against regressions.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The config is internal and disabled by default; the check only runs 
under plan-change validation when explicitly enabled.
   
   ### How was this patch tested?
   
   New unit test in `LogicalPlanIntegritySuite` that, with the config enabled, 
asserts:
   - a reference declared non-nullable over a nullable child is flagged (and 
the message names the offending attribute);
   - a faithful reference (same nullability as the child) passes;
   - a wider reference (nullable over a non-nullable child) passes;
   
   and that the check is a no-op when the config is disabled.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.8)
   
   This pull request and its description were written by Isaac.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] [SPARK-57442][SQL] Add opt-in nullability check to logical plan integrity validation [spark]

Reply via email to