gengliangwang opened a new pull request, #56711:
URL: https://github.com/apache/spark/pull/56711
### What changes were proposed in this pull request?
This PR makes `predicateSql` a mandatory field of the DSv2 `Check`
constraint (`org.apache.spark.sql.connector.catalog.constraints.Check`).
Previously, `Check.Builder.build()` only rejected the case where **both**
`predicateSql` and `predicate` were `null`, which allowed a `Check` to be
constructed with only a structured `predicate`. This PR tightens the validation
so that `predicateSql` must always be provided. `predicate` remains optional
and is the structured form used when the condition can be expressed with
supported expressions.
Specifically:
- `Check.Builder.build()` now throws when `predicateSql` is `null`,
regardless of `predicate`.
- `Check.definition()` is simplified to always render `predicateSql` (the
previous fallback to `predicate` is dead code now that `predicateSql` is
guaranteed to be present).
- Javadoc on the class, `predicateSql()`, and `predicate()` is updated to
document that `predicateSql` is the canonical representation and is always
present, while `predicate` is optional and may be `null`.
### Why are the changes needed?
`predicateSql` is the canonical representation of a CHECK condition. Spark
always populates it from the original SQL text in
`CheckConstraint.toV2Constraint`, while `predicate` is only set when the
condition can be translated to a supported `Predicate` (it is `null` otherwise,
e.g. for `from_json(j, 'a INT').a > 1`).
Several read paths already assume `predicateSql` is present. For example,
`ResolveTableConstraints.buildCatalystExpression` prefers the structured
`predicate` but falls back to parsing `predicateSql`:
```scala
Option(c.predicate())
.flatMap(V2ExpressionUtils.toCatalyst)
.getOrElse(catalogManager.v1SessionCatalog.parser.parseExpression(c.predicateSql()))
```
If a connector were to build a `Check` with a `null` `predicateSql` and a
`predicate` that cannot be converted back by `V2ExpressionUtils.toCatalyst`,
this would fall through to `parseExpression(null)` and fail with an NPE.
`predicateSql` is also used as the human-readable condition in CHECK violation
error messages. Requiring `predicateSql` makes the invariant explicit and keeps
these paths safe.
### Does this PR introduce _any_ user-facing change?
No. `Check` is an `@Evolving` DSv2 API, and Spark itself always sets
`predicateSql`, so no existing Spark behavior changes. The only effect is
tighter validation for connector authors who construct `Check` directly:
building a `Check` without `predicateSql` now fails fast with a clear error
instead of producing a constraint that downstream code already assumes is
invalid.
### How was this patch tested?
Updated `ConstraintSuite`:
- The existing "CHECK constraint toDDL" `con2` case now also supplies
`predicateSql` (it previously relied on the predicate-only path).
- Added "CHECK constraint requires predicateSql", asserting that `build()`
fails with `INTERNAL_ERROR` when `predicateSql` is absent, both when no
condition is supplied at all and when only a `predicate` is supplied.
```
build/sbt 'catalyst/testOnly *ConstraintSuite'
```
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]