Re: Proctime consistency

Timo Walther Tue, 02 Feb 2021 03:17:28 -0800

As far as I know, we support ROW_NUMBER in SQL that could give yousequence number.

Regarding window semantics, the processing time only determines when totrigger the evaluation (also mentioned here [1]). A timer is registeredfor the next evaluation. The window content and next timer is part ofevery checkpoint and savepoint. If you restore from acheckpoint/savepoint, the stored next timestamp will be checked with thecurrent wall clock and an evaluation might be triggered immediately.Thus, usually event-time is more useful than processing time. If youhave a lot of processing time timers set, they might all fireimmediately during a restore.

So the window will not start over from scratch. But inflight data thatwas about to reach the window operator will be reread from the sourceoperator.


Timo

[1]https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#triggers



On 01.02.21 20:06, Rex Fenley wrote:

We need to aggregate in precisely row order. Is there a safe way to dothis? Maybe with some sort of row time sequence number?
As written in another email, we're currently doing the following set ofoperations
valcompactedUserDocsStream = userDocsStream
.window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
.aggregate(newCompactionAggregate())
I guess my concern is if we restore from a checkpoint or savepoint Idon't understand how the window get's checkpointed and how windowalignment works between runs of a job. Will the window just start overfrom scratch, and re-process any rows that may have been inflight butnot finished processing in the previous run's last window?
If so then I guess everything will arrive in row order like we want itto. But if a window get's checkpointed with its previous proctime, thenit may be misaligned in the next run and drop rows that were in that window.
On Mon, Feb 1, 2021 at 6:37 AM Timo Walther <twal...@apache.org<mailto:twal...@apache.org>> wrote:
    Hi Rex,

    processing-time gives you no alignment of operators across nodes. Each
    operation works with its local machine clock that might be interrupted
    by the OS, Java garbage collector, etc. It is always a best effort
    timing.

    Regards,
    Timo


    On 27.01.21 18:16, Rex Fenley wrote:
     > Hello,
     >
     > I'm looking at ways to deduplicate data and found [1], but does
    proctime
     > get committed with operators? How does this work against clock
    skew on
     > different machines?
     >
     > Thanks
     >
     > [1]
     >
    
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>

     >
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
    
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>>
     >
     > --
     >
     > Rex Fenley|Software Engineer - Mobile and Backend
     >
     >
     > Remind.com <https://www.remind.com/ <https://www.remind.com/>>|
    BLOG <http://blog.remind.com/ <http://blog.remind.com/>> |
     > FOLLOW US <https://twitter.com/remindhq
    <https://twitter.com/remindhq>> | LIKE US
     > <https://www.facebook.com/remindhq
    <https://www.facebook.com/remindhq>>
     >



--

Rex Fenley|Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> |FOLLOW US <https://twitter.com/remindhq> | LIKE US<https://www.facebook.com/remindhq>

Re: Proctime consistency

Reply via email to