Hi Paimon community,

I’d like to confirm the intended semantics of *scan.snapshot-id* in Apache
Paimon *1.2* when reading with Flink.

What I see

   - Table’s latest snapshot ID is *53*.
   - I start a *streaming* query with:

   SET 'execution.runtime-mode' = 'streaming';
   SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;

   - The job *emits records immediately*, even though there is *no snapshot
   54* yet. After that initial output, it waits for new snapshots and
   continues normally when new data arrives.

My understanding

   - In streaming mode, when scan.snapshot-id = S is provided (and scan.mode
   is left at its default, which then behaves like from-snapshot), the
   source *reads changes starting from that snapshot* (i.e., includes the
   changes produced by snapshot S itself, then S+1, S+2, …), and it *does
   not* first produce a full snapshot at startup.
   - In batch mode, using scan.snapshot-id = S should return the *full view
   of snapshot S* only (no subsequent changes, no waiting).
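
For comparison, the batch counterpart of the streaming query above would be
something like the sketch below. It reuses the same my_table and snapshot 53;
the expected result in the comment is only my assumption and is exactly what
I'd like to have confirmed.

```sql
-- Batch read pinned to snapshot 53.
-- Assumption (to be confirmed): returns Full(53) and then the job finishes,
-- without waiting for snapshot 54.
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id'='53') */;
```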

Questions

   1. *Streaming semantics:* Is it *by design* that from-snapshot is
   *inclusive* of the starting snapshot’s changes (ΔS), i.e., it will
   output ΔS even if there is no S+1 yet?
   2. *Batch semantics:* Is it correct that batch + scan.snapshot-id = S
   always returns *Full(S)* (and never ΔS)? Are there any exceptions
   depending on table type?
   3. *PK vs non-PK tables:* Should we expect any observable difference
   here depending on whether the table is a primary-key table and/or which
   changelog-producer is configured (lookup, full-compaction, input)?
   4. *Exclusive start recommendation:* If a user wants to *strictly start
   from S+1* (i.e., exclude ΔS), is the recommended approach to:
      - wait until S+1 exists and set scan.snapshot-id = S+1, or
      - use incremental-between='S,S+1' for a bounded read?
   5. *Docs wording:* If the above is the intended behavior, would it make
   sense to emphasize the *inclusive* nature of from-snapshot in streaming
   (vs. the *(start, end]* semantics of incremental-between) to help users
   avoid confusion?
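
To make the second option in question 4 concrete, here is a rough sketch of
the bounded read I have in mind. It assumes snapshot 54 already exists (it
does not yet in my table, so the number is hypothetical) and that
incremental-between really follows *(start, end]* semantics, in which case I
would expect only the changes of snapshot 54.

```sql
-- Bounded incremental read between snapshots 53 and 54.
-- Assumption: with (start, end] semantics this excludes the changes of
-- snapshot 53 and returns only those of snapshot 54 (hypothetical here).
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM my_table /*+ OPTIONS('incremental-between'='53,54') */;
```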

Environment:

   - Paimon: 1.2
   - Engine: Flink 1.19
   - scan.mode: default (from-snapshot)

Thanks a lot for confirming the expected behavior.

Best regards
