gengliangwang commented on code in PR #55636:
URL: https://github.com/apache/spark/pull/55636#discussion_r3174335852
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Changelog.java:
##########
@@ -35,8 +35,34 @@
* {@code update_preimage}, or {@code update_postimage}</li>
* <li>{@code _commit_version} (connector-defined type, e.g. LONG) — the version containing
* this change</li>
- * <li>{@code _commit_timestamp} (TIMESTAMP) — the timestamp of the commit</li>
+ * <li>{@code _commit_timestamp} (TIMESTAMP) -- the timestamp of the commit. All rows
+ * belonging to a single {@code _commit_version} must share the same
+ * {@code _commit_timestamp}. For streaming reads with post-processing enabled,
+ * two additional requirements apply:
+ * <ol>
+ * <li>All rows of a single commit must appear in the same micro-batch (i.e.
Review Comment:
You're right on both counts. Updated in e8db78a:
Replaced requirement 2 ("distinct commit versions must have distinct
timestamps") with the actual invariant: each micro-batch's rows must carry
`_commit_timestamp` strictly greater than the maximum `_commit_timestamp` of
any prior micro-batch. The new wording explicitly mentions out-of-order commits
as a covered case (the `v2@ts=20`, `v3@ts=10` example you gave would now be a
contract violation).
Also clarified that multiple distinct commits with equal `_commit_timestamp`
are allowed within a single micro-batch; only *across* batches must
`_commit_timestamp` strictly increase. On the equal-timestamp question, that's
strictly weaker than the previous "distinct versions must have distinct
timestamps" requirement and avoids the unrealistic ms-resolution edge case you
flagged.
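The per-commit rules from the Javadoc hunk (all rows of one `_commit_version` share a `_commit_timestamp`, and all rows of a commit land in the same micro-batch) can be sketched as a separate check. `PerCommitConsistencyCheck`, `ChangeRow` and `isConsistent` are hypothetical illustration names, not Spark APIs. Distinct versions sharing a timestamp are deliberately accepted, and cross-batch timestamp ordering is not checked here (Java 16+ for records):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerCommitConsistencyCheck {
    // Hypothetical row shape for illustration only; not a Spark connector type.
    record ChangeRow(long commitVersion, long commitTimestamp) {}

    // Returns true if (a) all rows of a commit version carry one timestamp and
    // (b) no commit version is split across micro-batches. Distinct versions
    // that share a timestamp are intentionally accepted.
    static boolean isConsistent(List<List<ChangeRow>> batches) {
        Map<Long, Long> timestampByVersion = new HashMap<>();
        Map<Long, Integer> batchByVersion = new HashMap<>();
        for (int b = 0; b < batches.size(); b++) {
            for (ChangeRow row : batches.get(b)) {
                // putIfAbsent returns the previously stored value, or null.
                Long seenTs = timestampByVersion.putIfAbsent(row.commitVersion(), row.commitTimestamp());
                if (seenTs != null && seenTs.longValue() != row.commitTimestamp()) {
                    return false; // one commit carries two different timestamps
                }
                Integer seenBatch = batchByVersion.putIfAbsent(row.commitVersion(), b);
                if (seenBatch != null && seenBatch.intValue() != b) {
                    return false; // commit split across micro-batches
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // v1 and v2 share ts=10 in the same batch: allowed.
        System.out.println(isConsistent(List.of(
            List.of(new ChangeRow(1, 10), new ChangeRow(2, 10)))));   // true
        // v1's rows land in two different batches: violation.
        System.out.println(isConsistent(List.of(
            List.of(new ChangeRow(1, 10)),
            List.of(new ChangeRow(1, 10)))));                          // false
    }
}
```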
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]