Hi Paimon community,
I’d like to confirm the intended semantics of *scan.snapshot-id* in Apache
Paimon *1.2* when reading with Flink.
What I see
- Table’s latest snapshot ID is *53*.
- I start a *streaming* query with:
SET 'execution.runtime-mode' = 'streaming';
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
- The job *emits records immediately*, even though there is *no snapshot
54* yet. After that initial output, it waits for new snapshots and
continues normally when new data arrives.
My understanding
- In streaming mode, when scan.snapshot-id = S is provided (and scan.mode
is left at its default, which resolves to from-snapshot), the source *reads
changes starting from that snapshot* (i.e., it includes the changes produced
by snapshot S itself, then S+1, S+2, …), and it *does not* first produce a
full snapshot at startup.
- In batch mode, using scan.snapshot-id = S should return the *full view
of snapshot S* only (no subsequent changes, no waiting).
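For comparison, this is the batch query I have in mind. It is the same
statement as in my streaming test with only the runtime mode changed; my
expectation (not yet confirmed) is that it returns the full view of snapshot
53 and then finishes.

```sql
-- Batch counterpart of the streaming query above (same table, same snapshot).
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
```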
Questions
1. *Streaming semantics:* Is it *by design* that from-snapshot is
*inclusive* of the starting snapshot’s changes (ΔS), i.e., it will
output ΔS even if there is no S+1 yet?
2. *Batch semantics:* Is it correct that batch + scan.snapshot-id = S
always returns *Full(S)* (and never ΔS)? Are there any exceptions
depending on table type?
3. *PK vs non-PK tables:* Should we expect any observable difference
here depending on whether the table is a primary-key table and/or the
changelog-producer is enabled (lookup, full-compaction, input)?
4. *Exclusive start recommendation:* If a user wants to *strictly start
from S+1* (i.e., exclude ΔS), is the recommended approach to:
- wait until S+1 exists and set scan.snapshot-id = S+1, or
- use incremental-between='S,S+1' for a bounded read?
5. *Docs wording:* If the above is the intended behavior, would it make
sense to emphasize the *inclusive* nature of from-snapshot in streaming
(vs. the *(start, end]* semantics of incremental-between) to help users
avoid confusion?
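To make question 4 concrete, here is a sketch of the two options, assuming
snapshot 54 has already been committed. The snapshot IDs are placeholders
taken from my example, and I have not confirmed that either form is the
recommended one.

```sql
-- Check which snapshots currently exist (Paimon snapshots system table).
SELECT snapshot_id, commit_kind FROM my_table$snapshots;

-- Option A: streaming read that starts at S+1 = 54 (assumes snapshot 54 exists).
SET 'execution.runtime-mode' = 'streaming';
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='54') */;

-- Option B: bounded read of the changes in (53, 54].
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM my_table /*+ OPTIONS('incremental-between'='53,54') */;
```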
Environment:
- Paimon: 1.2
- Engine: Flink 1.19
- scan.mode: default (from-snapshot)
Thanks a lot for confirming the expected behavior.
Best regards