I agree with your understanding of the semantics of the from-snapshot scan mode. Could it be that your Flink job is not actually executing in streaming mode?
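One way to rule this out is to pin both settings explicitly and rerun the query. This is only a sketch built from the statements in the original mail: `my_table` and snapshot ID 53 are taken from that example, and passing `scan.mode` in the hint is an assumption added here so that a table-level option cannot silently override the default.

```sql
-- Make sure the session really runs in streaming mode.
SET 'execution.runtime-mode' = 'streaming';

-- State the scan mode explicitly instead of relying on the default
-- (assumption: the table may carry its own scan.mode option).
SELECT * FROM my_table
/*+ OPTIONS('scan.mode'='from-snapshot', 'scan.snapshot-id'='53') */;
```

If the job still emits records right after startup with these settings, that output is what a bug report should include.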
If you have checked all of these conditions and the actual behavior still differs from what the configuration documentation describes, you can open an issue in the Paimon repo to record this potential bug, including detailed code that reproduces the unexpected behavior.

> On Oct 11, 2025, at 11:59, lec ssmi <[email protected]> wrote:
>
> What confuses me most is why, when I set the snapshot ID to the current
> latest, the job still produces records. From the documentation, my
> understanding is that in streaming mode this usage should not consume that
> snapshot itself, but rather use it as the starting point to consume
> incremental changes. This should also be one of the main differences between
> from-snapshot and from-snapshot-full, right?
>
>
> Yunfeng Zhou <[email protected]> wrote on Sat, Oct 11, 2025 at 11:15:
>> Hi lec ssmi,
>>
>> Most of your questions and understandings might be addressed by the document
>> of the configuration `scan.mode`. You can find it here
>> https://paimon.apache.org/docs/master/maintenance/configurations/
>> It explains differences between modes like "from-snapshot" and
>> "from-timestamp-full" (corresponds to your exclusive/inclusive concerns) and
>> the different behaviors between batch and streaming mode.
>>
>> The streaming read result does differ according to the changelog producer of
>> the table. You can find the corresponding behaviors here
>> https://paimon.apache.org/docs/master/primary-key-table/changelog-producer/
>>
>> Best,
>> Yunfeng
>>
>>> On Oct 11, 2025, at 09:38, lec ssmi <[email protected]> wrote:
>>>
>>> Hi Paimon community,
>>>
>>> I’d like to confirm the intended semantics of scan.snapshot-id in Apache
>>> Paimon 1.2 when reading with Flink.
>>>
>>> What I see
>>>
>>> Table’s latest snapshot ID is 53.
>>> I start a streaming query with:
>>> SET 'execution.runtime-mode' = 'streaming';
>>> SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
>>> The job emits records immediately, even though there is no snapshot 54 yet.
>>> After that initial output, it waits for new snapshots and continues
>>> normally when new data arrives.
>>> My understanding
>>>
>>> In streaming mode, when scan.snapshot-id is provided (and scan.mode
>>> defaults to from-snapshot), the source reads changes starting from that
>>> snapshot (i.e., includes the changes produced by snapshot S itself, then
>>> S+1, S+2, …), and it does not first produce a full snapshot at startup.
>>> In batch mode, using scan.snapshot-id = S should return the full view of
>>> snapshot S only (no subsequent changes, no waiting).
>>> Questions
>>>
>>> Streaming semantics: Is it by design that from-snapshot is inclusive of the
>>> starting snapshot’s changes (ΔS), i.e., it will output ΔS even if there is
>>> no S+1 yet?
>>> Batch semantics: Is it correct that batch + scan.snapshot-id = S always
>>> returns Full(S) (and never ΔS)? Are there any exceptions depending on table
>>> type?
>>> PK vs non-PK tables: Should we expect any observable difference here
>>> depending on whether the table is a primary-key table and/or the
>>> changelog-producer is enabled (lookup, full-compaction, input)?
>>> Exclusive start recommendation: If a user wants to strictly start from S+1
>>> (i.e., exclude ΔS), is the recommended approach to:
>>> wait until S+1 exists and set scan.snapshot-id = S+1, or
>>> use incremental-between='S,S+1' for a bounded read?
>>> Docs wording: If the above is the intended behavior, would it make sense to
>>> emphasize the inclusive nature of from-snapshot in streaming (vs. the
>>> (start, end] semantics of incremental-between) to help users avoid
>>> confusion?
>>> Environment:
>>>
>>> Paimon: 1.2
>>> Engine: Flink 1.19
>>> scan.mode: default (from-snapshot)
>>> Thanks a lot for confirming the expected behavior.
>>>
>>> Best regards
>>>
>>
