Re: Clarify semantics of scan.snapshot-id in streaming vs batch (Paimon 1.2)

Yunfeng Zhou Fri, 17 Oct 2025 17:31:16 -0700

Hi lec ssmi,

Most of your questions can likely be answered by the documentation for the
`scan.mode` configuration option. You can find it here:
https://paimon.apache.org/docs/master/maintenance/configurations/
It explains the differences between modes such as "from-snapshot" and
"from-snapshot-full" (which correspond to your exclusive/inclusive concerns),
as well as the different behaviors in batch and streaming mode.
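
As a rough illustration of the `from-snapshot` and `from-snapshot-full` values of `scan.mode`, here is a minimal sketch. It reuses the table name `my_table` and snapshot ID 53 from the question and assumes that snapshot has not expired. The comments paraphrase the configuration page, so please check the linked documentation for your version.

```sql
SET 'execution.runtime-mode' = 'streaming';

-- from-snapshot: continuously reads changes starting from snapshot 53,
-- without producing a full snapshot at startup.
SELECT * FROM my_table
  /*+ OPTIONS('scan.mode'='from-snapshot', 'scan.snapshot-id'='53') */;

-- from-snapshot-full: produces the full content of snapshot 53 on first
-- startup, then continuously reads the changes that follow.
SELECT * FROM my_table
  /*+ OPTIONS('scan.mode'='from-snapshot-full', 'scan.snapshot-id'='53') */;

-- In batch mode, either setting produces snapshot 53 and does not read
-- new changes afterwards.
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
```

Running the two streaming queries against the same table and comparing their first output should make the difference visible.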


The streaming read result does differ depending on the table's changelog
producer. The corresponding behaviors are described here:
https://paimon.apache.org/docs/master/primary-key-table/changelog-producer/
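
For reference, the changelog producer is set as a table option. The following is a minimal sketch, assuming a Paimon catalog is already in use. The table name `my_pk_table` and its columns are hypothetical and only serve the illustration.

```sql
-- Hypothetical primary-key table used only for illustration.
CREATE TABLE my_pk_table (
    id BIGINT,
    v STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    -- Other documented values: 'none', 'input', 'full-compaction'.
    'changelog-producer' = 'lookup'
);
```

What a streaming read of this table emits depends on which producer is chosen, as the page above describes.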

Best,
Yunfeng

> On Oct 11, 2025, at 09:38, lec ssmi <[email protected]> wrote:
> 
> Hi Paimon community,
> 
> I’d like to confirm the intended semantics of scan.snapshot-id in Apache 
> Paimon 1.2 when reading with Flink.
> 
> What I see
> 
> - Table’s latest snapshot ID is 53.
> - I start a streaming query with:
>     SET 'execution.runtime-mode' = 'streaming';
>     SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
> - The job emits records immediately, even though there is no snapshot 54 yet.
>   After that initial output, it waits for new snapshots and continues normally
>   when new data arrives.
> 
> My understanding
> 
> - In streaming mode, when scan.snapshot-id is provided (and scan.mode defaults
>   to from-snapshot), the source reads changes starting from that snapshot
>   (i.e., includes the changes produced by snapshot S itself, then S+1, S+2, …),
>   and it does not first produce a full snapshot at startup.
> - In batch mode, using scan.snapshot-id = S should return the full view of
>   snapshot S only (no subsequent changes, no waiting).
> 
> Questions
> 
> 1. Streaming semantics: Is it by design that from-snapshot is inclusive of the
>    starting snapshot’s changes (ΔS), i.e., it will output ΔS even if there is no
>    S+1 yet?
> 2. Batch semantics: Is it correct that batch + scan.snapshot-id = S always
>    returns Full(S) (and never ΔS)? Are there any exceptions depending on table
>    type?
> 3. PK vs non-PK tables: Should we expect any observable difference here
>    depending on whether the table is a primary-key table and/or the
>    changelog-producer is enabled (lookup, full-compaction, input)?
> 4. Exclusive start recommendation: If a user wants to strictly start from S+1
>    (i.e., exclude ΔS), is the recommended approach to:
>    a. wait until S+1 exists and set scan.snapshot-id = S+1, or
>    b. use incremental-between='S,S+1' for a bounded read?
> 5. Docs wording: If the above is the intended behavior, would it make sense to
>    emphasize the inclusive nature of from-snapshot in streaming (vs. the (start,
>    end] semantics of incremental-between) to help users avoid confusion?
> 
> Environment:
> 
> - Paimon: 1.2
> - Engine: Flink 1.19
> - scan.mode: default (from-snapshot)
> 
> Thanks a lot for confirming the expected behavior.
> 
> Best regards
>
