gengliangwang opened a new pull request, #55664:
URL: https://github.com/apache/spark/pull/55664

   ### What changes were proposed in this pull request?
   
   Document the predicate pushdown contract for CDC `Changelog` connectors in 
the `Changelog` Javadoc:
   
   - When any post-processing pass applies (carry-over removal, update 
detection, or netChanges), the connector's `SupportsPushDownFilters` / 
`SupportsPushDownV2Filters` implementation will only receive predicates that 
reference `_commit_version`, `_commit_timestamp`, or columns named by `rowId()`.
   - Predicates on `_change_type`, the `rowVersion()` column, or any data 
column are kept above the scan and never reach `pushFilters` / 
`pushPredicates`, because pushing them could drop one half of a 
delete/insert pair within a row-identity group and silently break 
post-processing.
   - The restriction is enforced by the rewrite shape itself: a `Window` / 
`Aggregate` / `TransformWithState` keyed on the safe columns sits between the 
relation and the user's filter, so Catalyst's predicate-pushdown rules 
naturally block unsafe pushes. Connectors do not need to code this restriction 
themselves, but they must not bypass it (e.g. by self-applying filters from 
connector-specific options).
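
To illustrate why a `_change_type` predicate is unsafe below post-processing, here is a minimal, self-contained sketch. It is a toy model, not Spark code: `CarryOverDemo`, `Change`, and `removeCarryOvers` are hypothetical names, and the carry-over rule is simplified to "a delete and an insert with identical data in the same commit and row-identity group cancel out".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: none of these types are Spark classes.
final class CarryOverDemo {
    // One changelog row: change type, commit version, row id, one data column.
    static final class Change {
        final String changeType;
        final long commitVersion;
        final int id;
        final String value;

        Change(String changeType, long commitVersion, int id, String value) {
            this.changeType = changeType;
            this.commitVersion = commitVersion;
            this.id = id;
            this.value = value;
        }

        @Override public String toString() {
            return changeType + "(v" + commitVersion + ", id=" + id + ", " + value + ")";
        }
    }

    // Unsafe predicate (references the change type) and safe predicate (references the row id).
    static final Predicate<Change> INSERTS_ONLY = c -> c.changeType.equals("insert");
    static final Predicate<Change> ID_IS_ONE = c -> c.id == 1;

    // Simplified carry-over removal: drop a row if the same (commitVersion, id) group
    // holds a row of the opposite change type with identical data.
    static List<Change> removeCarryOvers(List<Change> rows) {
        List<Change> out = new ArrayList<>();
        for (Change r : rows) {
            boolean hasTwin = rows.stream().anyMatch(o ->
                o.commitVersion == r.commitVersion
                    && o.id == r.id
                    && !o.changeType.equals(r.changeType)
                    && o.value.equals(r.value));
            if (!hasTwin) {
                out.add(r);
            }
        }
        return out;
    }

    static List<Change> filter(List<Change> rows, Predicate<Change> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }

    static List<Change> sample() {
        return List.of(
            new Change("delete", 2, 1, "a"),   // carry-over pair for id=1
            new Change("insert", 2, 1, "a"),
            new Change("insert", 2, 2, "b"));  // genuine insert for id=2
    }

    public static void main(String[] args) {
        // Filter kept above post-processing: only the genuine insert for id=2 survives.
        System.out.println(filter(removeCarryOvers(sample()), INSERTS_ONLY));
        // Filter pushed below post-processing: the delete half for id=1 is gone,
        // so the carried-over insert is wrongly reported as a change.
        System.out.println(removeCarryOvers(filter(sample(), INSERTS_ONLY)));
    }
}
```

In this model a predicate on the row-identity column (`ID_IS_ONE`) yields the same result in either order, because it keeps or drops both halves of a pair together.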
   
   This is a sub-task of SPARK-55668.
   
   ### Why are the changes needed?
   
   The contract was implicit. A connector author reading the Javadoc could 
reasonably implement `SupportsPushDownFilters` and accept all predicates, 
including unsafe ones, expecting Spark to handle the rest. Spelling out which 
predicates the connector actually needs to handle (and why others are 
intentionally never delivered) prevents accidental misuse and explains the 
asymmetry to anyone debugging an unexpected post-scan filter.
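
As a rough mental model of that asymmetry, the sketch below splits predicates by the columns they reference. This is an assumption-laden illustration only: Spark does not run an explicit classifier like this (the rewrite shape produces the outcome), and `PushdownSplit`, `refs`, `safeColumns`, and `split` are hypothetical names. Each predicate is reduced to the set of column names it references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a predicate is represented by the set of columns it references.
final class PushdownSplit {
    static final class Split {
        final List<Set<String>> pushed = new ArrayList<>();        // may reach the connector
        final List<Set<String>> keptAboveScan = new ArrayList<>(); // evaluated after the scan
    }

    // Convenience helper to build a reference set.
    static Set<String> refs(String... columns) {
        return new HashSet<>(Arrays.asList(columns));
    }

    // Safe columns per the contract: commit metadata plus the columns named by rowId().
    static Set<String> safeColumns(List<String> rowIdColumns) {
        Set<String> safe = refs("_commit_version", "_commit_timestamp");
        safe.addAll(rowIdColumns);
        return safe;
    }

    // A predicate is pushable only if every column it references is safe.
    static Split split(List<Set<String>> predicates, Set<String> safe) {
        Split result = new Split();
        for (Set<String> referenced : predicates) {
            if (safe.containsAll(referenced)) {
                result.pushed.add(referenced);
            } else {
                result.keptAboveScan.add(referenced);
            }
        }
        return result;
    }
}
```

For example, with `id` as the only row-identity column, a predicate on `_commit_version` lands in `pushed`, while predicates on `_change_type` or on a data column such as `price` land in `keptAboveScan`, even when they also reference `id`.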
   
   ### Does this PR introduce _any_ user-facing change?
   
   Documentation only. No behavior change.
   
   ### How was this patch tested?
   
   `-Xdoclint:html,syntax,accessibility` is clean on `Changelog.java`. No code 
changed; existing CDC test suites are unaffected.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude opus-4-7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

