As far as I know, we support ROW_NUMBER in SQL that could give you
sequence number.
Regarding window semantics, the processing time only determines when to
trigger the evaluation (also mentioned here [1]). A timer is registered
for the next evaluation. The window content and next timer is part of
every checkpoint and savepoint. If you restore from a
checkpoint/savepoint, the stored next timestamp will be checked with the
current wall clock and an evaluation might be triggered immediately.
Thus, usually event-time is more useful than processing time. If you
have a lot of processing time timers set, they might all fire
immediately during a restore.
So the window will not start over from scratch. But inflight data that
was about to reach the window operator will be reread from the source
operator.
Timo
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#triggers
On 01.02.21 20:06, Rex Fenley wrote:
We need to aggregate in precisely row order. Is there a safe way to do
this? Maybe with some sort of row time sequence number?
As written in another email, we're currently doing the following set of
operations
valcompactedUserDocsStream = userDocsStream
.window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
.aggregate(newCompactionAggregate())
I guess my concern is if we restore from a checkpoint or savepoint I
don't understand how the window get's checkpointed and how window
alignment works between runs of a job. Will the window just start over
from scratch, and re-process any rows that may have been inflight but
not finished processing in the previous run's last window?
If so then I guess everything will arrive in row order like we want it
to. But if a window get's checkpointed with its previous proctime, then
it may be misaligned in the next run and drop rows that were in that window.
On Mon, Feb 1, 2021 at 6:37 AM Timo Walther <twal...@apache.org
<mailto:twal...@apache.org>> wrote:
Hi Rex,
processing-time gives you no alignment of operators across nodes. Each
operation works with its local machine clock that might be interrupted
by the OS, Java garbage collector, etc. It is always a best effort
timing.
Regards,
Timo
On 27.01.21 18:16, Rex Fenley wrote:
> Hello,
>
> I'm looking at ways to deduplicate data and found [1], but does
proctime
> get committed with operators? How does this work against clock
skew on
> different machines?
>
> Thanks
>
> [1]
>
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>
>
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication
<https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#deduplication>>
>
> --
>
> Rex Fenley|Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/ <https://www.remind.com/>>|
BLOG <http://blog.remind.com/ <http://blog.remind.com/>> |
> FOLLOW US <https://twitter.com/remindhq
<https://twitter.com/remindhq>> | LIKE US
> <https://www.facebook.com/remindhq
<https://www.facebook.com/remindhq>>
>
--
Rex Fenley|Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> |
FOLLOW US <https://twitter.com/remindhq> | LIKE US
<https://www.facebook.com/remindhq>