As far as I know, we support ROW_NUMBER in SQL that could give you sequence number.

Regarding window semantics, the processing time only determines when to trigger the evaluation (also mentioned here [1]). A timer is registered for the next evaluation. The window content and next timer is part of every checkpoint and savepoint. If you restore from a checkpoint/savepoint, the stored next timestamp will be checked with the current wall clock and an evaluation might be triggered immediately. Thus, usually event-time is more useful than processing time. If you have a lot of processing time timers set, they might all fire immediately during a restore.

So the window will not start over from scratch. But inflight data that was about to reach the window operator will be reread from the source operator.

Timo


[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#triggers


On 01.02.21 20:06, Rex Fenley wrote:
We need to aggregate in precisely row order. Is there a safe way to do this? Maybe with some sort of row time sequence number?

As written in another email, we're currently doing the following set of operations
valcompactedUserDocsStream = userDocsStream
.window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
.aggregate(newCompactionAggregate())

I guess my concern is if we restore from a checkpoint or savepoint I don't understand how the window get's checkpointed and how window alignment works between runs of a job. Will the window just start over from scratch, and re-process any rows that may have been inflight but not finished processing in the previous run's last window?

If so then I guess everything will arrive in row order like we want it to. But if a window get's checkpointed with its previous proctime, then it may be misaligned in the next run and drop rows that were in that window.

On Mon, Feb 1, 2021 at 6:37 AM Timo Walther <twal...@apache.org <mailto:twal...@apache.org>> wrote:

    Hi Rex,

    processing-time gives you no alignment of operators across nodes. Each
    operation works with its local machine clock that might be interrupted
    by the OS, Java garbage collector, etc. It is always a best effort
    timing.

    Regards,
    Timo


    On 27.01.21 18:16, Rex Fenley wrote:
     > Hello,
     >
     > I'm looking at ways to deduplicate data and found [1], but does
    proctime
     > get committed with operators? How does this work against clock
    skew on
     > different machines?
     >
     > Thanks
     >
     > [1]
     >
    
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>

     >
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>>
     >
     > --
     >
     > Rex Fenley|Software Engineer - Mobile and Backend
     >
     >
     > Remind.com <https://www.remind.com/ <https://www.remind.com/>>|
    BLOG <http://blog.remind.com/ <http://blog.remind.com/>> |
     > FOLLOW US <https://twitter.com/remindhq
    <https://twitter.com/remindhq>> | LIKE US
     > <https://www.facebook.com/remindhq
    <https://www.facebook.com/remindhq>>
     >



--

Rex Fenley|Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>




Reply via email to