viirya commented on code in PR #55636:
URL: https://github.com/apache/spark/pull/55636#discussion_r3172307259
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Changelog.java:
##########
@@ -35,8 +35,34 @@
* {@code update_preimage}, or {@code update_postimage}</li>
* <li>{@code _commit_version} (connector-defined type, e.g. LONG) — the version containing
* this change</li>
- * <li>{@code _commit_timestamp} (TIMESTAMP) — the timestamp of the commit</li>
+ * <li>{@code _commit_timestamp} (TIMESTAMP) -- the timestamp of the commit. All rows
+ * belonging to a single {@code _commit_version} must share the same
+ * {@code _commit_timestamp}. For streaming reads with post-processing enabled,
+ * two additional requirements apply:
+ * <ol>
+ * <li>All rows of a single commit must appear in the same micro-batch (i.e.
Review Comment:
The new requirements cover the “same commit split across batches” case and the
“same timestamp in a later batch” case, but only if timestamps also arrive in
increasing event-time order. The doc no longer explicitly requires that every
later micro-batch has a _commit_timestamp greater than the previous
watermark/max.
Example:
batch 1: commit v2, ts = 20
batch 2: commit v3, ts = 10
The timestamps are distinct and each commit is atomic, yet batch 2 is late:
its timestamp (10) is behind the watermark of 20 reached after batch 1. So the
real required invariant is closer to: no later micro-batch may contain rows
with _commit_timestamp <= previous max event time.

Also, “distinct commit versions must have distinct timestamps” is stronger than
necessary and may be unrealistic for ms-resolution commit timestamps. Equal
timestamps are safe as long as all such commits are emitted before the
watermark advances.
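
To make the suggested invariant concrete, here is a minimal sketch of a check for it. It is only an illustration, not code from this PR: `LateBatchCheck` and `firstLateBatch` are hypothetical names, each micro-batch is modelled as a plain list of `_commit_timestamp` values, and the watermark delay is assumed to be zero so that the watermark equals the max timestamp seen so far.

```java
import java.util.List;

// Hypothetical helper, not part of Spark: models each micro-batch as the list
// of _commit_timestamp values it contains.
final class LateBatchCheck {

    // Returns the 0-based index of the first micro-batch containing a
    // timestamp <= the max timestamp seen in all earlier batches, or -1 if
    // no batch is late. Assumes a zero watermark delay.
    static int firstLateBatch(List<List<Long>> batches) {
        Long previousMax = null; // nothing observed before the first non-empty batch
        for (int i = 0; i < batches.size(); i++) {
            Long batchMax = previousMax;
            for (long ts : batches.get(i)) {
                if (previousMax != null && ts <= previousMax) {
                    return i; // this row is at or behind the previous max event time
                }
                if (batchMax == null || ts > batchMax) {
                    batchMax = ts;
                }
            }
            // The max only advances between batches, so equal timestamps
            // inside one batch are accepted.
            previousMax = batchMax;
        }
        return -1;
    }

    public static void main(String[] args) {
        // The example above: v2 (ts = 20) in batch 1, v3 (ts = 10) in batch 2.
        System.out.println(firstLateBatch(List.of(List.of(20L), List.of(10L))));
        // Two commits sharing ts = 10, both emitted in the same batch.
        System.out.println(firstLateBatch(List.of(List.of(10L, 10L), List.of(20L))));
        // The same ts = 10 again, but split across two batches.
        System.out.println(firstLateBatch(List.of(List.of(10L), List.of(10L))));
    }
}
```

Under these assumptions the first and third inputs are flagged at the second batch (index 1), while the second input passes (-1), which matches the point that equal timestamps are only a problem once the watermark has advanced past them.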
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]