gengliangwang opened a new pull request, #55663:

URL: https://github.com/apache/spark/pull/55663
### What changes were proposed in this pull request?

Tighten the CDC `Changelog` connector contract so that `_commit_version` must be either `LongType` or `StringType`. Previously any `AtomicType` was accepted, which silently allowed several edge-case types (`IntegerType`, `TimestampType`, `BinaryType`, `Decimal`, `Float`, `Double`, `Boolean`, ...).

- `ChangelogTable.validateSchema` now rejects every type other than `LongType` / `StringType` and reports `BIGINT or STRING` as the expected type.
- The `Changelog` Javadoc now states the narrower contract and explains the ordering requirement: the netChanges post-processing path sorts rows by this column, so the column's natural ordering must match commit order.
- The ordering comment in `CdcNetChangesStatefulProcessor` is updated; the existing Catalyst-routed comparator is left in place for symmetry with the batch `SortOrder`.
- `ChangelogResolutionSuite`: the accept-list is narrowed to `Long` / `String`, and the reject-list is expanded to cover the previously allowed atomic types (`Integer`, `Timestamp`) in addition to the existing complex-type cases.

### Why are the changes needed?

`Long` (a numeric, monotonic version) and `String` (a lexicographically ordered commit identifier) cover every realistic CDC source. The other atomic types are either strict subsets (`IntegerType` -> `LongType`) or duplicate the role of `_commit_timestamp` (`TimestampType`), while types like `BinaryType` / `Float` / `Double` introduce NaN, boxing, and ordering foot-guns without adding any expressive power. The narrower contract also lets the Javadoc state the ordering requirement precisely, matching what the netChanges code actually relies on.

Locking the contract down now is non-breaking (there are no external connectors yet) and keeps the documented surface area small. Relaxing it later would be non-breaking; restricting it later would not.

### Does this PR introduce _any_ user-facing change?
The `Changelog` connector API is `@Evolving` and has no external implementations yet; the restriction only narrows what implementers may return. There is no user-facing behavior change.

### How was this patch tested?

- `ChangelogResolutionSuite` (27 tests) covers the new accept / reject matrix.
- `ResolveChangelogTablePostProcessingSuite`, `ResolveChangelogTableStreamingPostProcessingSuite`, `ResolveChangelogTableNetChangesSuite`, and `ChangelogEndToEndSuite`: 130 existing tests still pass under the new contract.
- `UnsupportedOperationsSuite` (216 tests) still passes.
- `Xdoclint:html,syntax,accessibility` is clean on `Changelog.java`, with no new warnings under `Xdoclint:all`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude opus-4-7
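To illustrate the narrowed `_commit_version` accept-list described in this PR, here is a minimal standalone Java sketch. It deliberately avoids Spark: `CommitVersionTypeCheck` and its `ColumnType` enum are hypothetical stand-ins. The real `ChangelogTable.validateSchema` inspects Spark `DataType`s and reports failures through Spark's own error framework, so treat this only as a model of the accept / reject rule.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, Spark-free model of the narrowed `_commit_version` contract.
// The real check lives in ChangelogTable.validateSchema and works on Spark DataTypes.
public class CommitVersionTypeCheck {

  // Hypothetical stand-in for a subset of Spark SQL data types.
  enum ColumnType {
    LONG, STRING, INTEGER, TIMESTAMP, BINARY, DECIMAL, FLOAT, DOUBLE, BOOLEAN, ARRAY, MAP, STRUCT
  }

  // The only types the new contract accepts (BIGINT and STRING in SQL terms).
  static final Set<ColumnType> ALLOWED = EnumSet.of(ColumnType.LONG, ColumnType.STRING);

  static boolean isAllowed(ColumnType t) {
    return ALLOWED.contains(t);
  }

  // Throws if a connector reports a `_commit_version` type outside the accept-list.
  static void validate(ColumnType t) {
    if (!isAllowed(t)) {
      throw new IllegalArgumentException(
          "_commit_version has type " + t + ", expected BIGINT or STRING");
    }
  }

  public static void main(String[] args) {
    // Print the accept / reject decision for every modeled type.
    for (ColumnType t : ColumnType.values()) {
      System.out.println(t + " -> " + (isAllowed(t) ? "accepted" : "rejected"));
    }
  }
}
```

Using an allow-list rather than a deny-list mirrors the PR's reasoning: any type not explicitly listed is rejected, so adding a type later is a deliberate, non-breaking relaxation.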

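The ordering requirement for `StringType` versions (the column's natural ordering must match commit order, because the netChanges path sorts by it) can be illustrated with plain Java sorting. This sketch assumes ASCII commit identifiers, for which Java's `String.compareTo` agrees with byte-wise ordering; `CommitOrderDemo` and its `padded` helper are hypothetical and only show why numeric-looking string identifiers would need a fixed width to sort correctly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of why `_commit_version` ordering must match commit order.
public class CommitOrderDemo {

  // Sorts commit identifiers by natural String ordering (lexicographic).
  static List<String> sortedStrings(List<String> versions) {
    List<String> copy = new ArrayList<>(versions);
    Collections.sort(copy);
    return copy;
  }

  // Sorts numeric versions by natural Long ordering.
  static List<Long> sortedLongs(List<Long> versions) {
    List<Long> copy = new ArrayList<>(versions);
    Collections.sort(copy);
    return copy;
  }

  // Hypothetical helper: zero-pads a non-negative id to a fixed width.
  // Only order-preserving while every id fits in `width` digits.
  static String padded(long version, int width) {
    return String.format("%0" + width + "d", version);
  }

  public static void main(String[] args) {
    // Unpadded numeric strings: "10" sorts before "9", breaking commit order.
    System.out.println(sortedStrings(List.of("9", "10", "2")));
    // Zero-padded to a fixed width, lexicographic order matches commit order.
    System.out.println(sortedStrings(List.of(padded(9, 8), padded(10, 8), padded(2, 8))));
    // LongType versions sort numerically with no extra care.
    System.out.println(sortedLongs(List.of(9L, 10L, 2L)));
  }
}
```

A source whose versions are plain integers avoids the padding question entirely by returning them as `LongType`; `StringType` fits identifiers that are already designed to sort lexicographically in commit order.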