Re: Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-12 Thread M Singh
 Thanks David for your insights.  Mans
On Saturday, February 12, 2022, 05:26:29 AM EST, David Anderson 
 wrote:  
 
 Flink uses watermarks to indicate when a stream has become complete up through 
some point in time. Various operations on streams wait for watermarks in order 
to know when they can safely stop waiting for further input, and so go ahead 
and produce their results. These operations include event-time windowing, 
interval and temporal joins, pattern matching, and sorting (by timestamp).
Events that are late have timestamps less than equal to the current watermark. 
They have missed their chance to influence the results of those operations that 
rely on watermarks for triggering. But otherwise, Flink doesn't care if events 
are late or not. It's not that late events are automatically dropped in all 
circumstances -- it's just that these temporal operations won't wait long 
enough to accommodate their extreme out-of-order-ness (lateness). 
So yes, your ALL_EVENTS view will contain all of the events, including late 
ones.
When your job starts running, it takes some time for an initial watermark to be 
produced. During that period of time, the current watermark is NULL, and no 
events will be considered late.
Hope this helps clarify things.
Regards,David
On Sat, Feb 12, 2022 at 12:01 AM M Singh  wrote:

 I thought a little more about your references Martijn and wanted to confirm 
one thing - the table is specifying the watermark and the downstream view needs 
to check if it wants all events or only the non-late events.  Please let my 
understanding is correct.  
Thanks again for your references.
Mans
On Friday, February 11, 2022, 05:02:49 PM EST, M Singh 
 wrote:  
 
  
Hi Martijn:
Thanks for the reference.   
My understanding was that if we use watermark then any event with event time 
(in the above example) < event_time - 30 seconds will be dropped automatically. 
 
My question [1] is will the downstream (ALL_EVENTS) view which is selecting the 
events from the table receive events which are late ?  If late events are 
dropped at the table level then do we still need the second predicate check (ts 
> CURRENT_WATERMARK(ts)) to filter out late events at the view level.  
If the table does not drop late events, then will all downstream views/etc need 
to add this check (ts > CURRENT_WATERMARK(ts)) ?
I am still not clear on this concept of whether downstream view need to check 
for late events with this predicate or will they never receive late events.
Thanks again for your time.

On Friday, February 11, 2022, 01:55:09 PM EST, Martijn Visser 
 wrote:  
 
 Hi,
There's a Flink SQL Cookbook recipe on CURRENT_WATERMARK, I think this would 
cover your questions [1].
Best regards,
Martijn
[1] 
https://github.com/ververica/flink-sql-cookbook/blob/main/other-builtin-functions/03_current_watermark/03_current_watermark.md
On Fri, 11 Feb 2022 at 16:45, M Singh  wrote:

Hi:
The flink docs 
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
 indicates that the CURRENT_WATERMARK(rowtime) can return null:

Note that this function can return NULL, and you may have to consider this 
case. For example, if you want to filter out late data you can use:
WHERE
  CURRENT_WATERMARK(ts) IS NULL
  OR ts > CURRENT_WATERMARK(ts)
I have the following questions that if the table is defined with a watermark eg:
CREATE TABLE `MYEVENTS` (`name` STRING,    `event_time` TIMESTAMP_LTZ(3), 
...WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)WITH (...)

1. If we define the water mark as above, will the late events still be 
propagated to a view or table which is selecting from MYEVENTS table:
CREATE TEMPORARY VIEW `ALL_EVENTS` AS    SELECT * FROM MYEVENTS; 
2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are the 
conditions for returning null ?


Thanks
  



  

Re: Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-12 Thread David Anderson
Flink uses watermarks to indicate when a stream has become complete up
through some point in time. Various operations on streams wait for
watermarks in order to know when they can safely stop waiting for
further input, and so go ahead and produce their results. These
operations include event-time windowing, interval and temporal joins,
pattern matching, and sorting (by timestamp).

Events that are late have timestamps less than equal to the current
watermark. They have missed their chance to influence the results of those
operations that rely on watermarks for triggering. But otherwise, Flink
doesn't care if events are late or not. It's not that late events are
automatically dropped in all circumstances -- it's just that these temporal
operations won't wait long enough to accommodate their extreme
out-of-order-ness (lateness).

So yes, your ALL_EVENTS view will contain all of the events, including late
ones.

When your job starts running, it takes some time for an initial watermark
to be produced. During that period of time, the current watermark is NULL,
and no events will be considered late.

Hope this helps clarify things.

Regards,
David

On Sat, Feb 12, 2022 at 12:01 AM M Singh  wrote:

> I thought a little more about your references Martijn and wanted to
> confirm one thing - the table is specifying the watermark and the
> downstream view needs to check if it wants all events or only the non-late
> events.  Please let my understanding is correct.
>
> Thanks again for your references.
>
> Mans
>
> On Friday, February 11, 2022, 05:02:49 PM EST, M Singh <
> mans2si...@yahoo.com> wrote:
>
>
>
> Hi Martijn:
>
> Thanks for the reference.
>
> My understanding was that if we use watermark then any event with event
> time (in the above example) < event_time - 30 seconds will be dropped
> automatically.
>
> My question [1] is will the downstream (ALL_EVENTS) view which is
> selecting the events from the table receive events which are late ?  If
> late events are dropped at the table level then do we still need the second
> predicate check (ts > CURRENT_WATERMARK(ts)) to filter out late events at
> the view level.
>
> If the table does not drop late events, then will all downstream views/etc
> need to add this check (ts > CURRENT_WATERMARK(ts)) ?
>
> I am still not clear on this concept of whether downstream view need to
> check for late events with this predicate or will they never receive late
> events.
>
> Thanks again for your time.
>
>
> On Friday, February 11, 2022, 01:55:09 PM EST, Martijn Visser <
> mart...@ververica.com> wrote:
>
>
> Hi,
>
> There's a Flink SQL Cookbook recipe on CURRENT_WATERMARK, I think this
> would cover your questions [1].
>
> Best regards,
>
> Martijn
>
> [1]
> https://github.com/ververica/flink-sql-cookbook/blob/main/other-builtin-functions/03_current_watermark/03_current_watermark.md
>
> On Fri, 11 Feb 2022 at 16:45, M Singh  wrote:
>
> Hi:
>
> The flink docs (
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
> indicates that the CURRENT_WATERMARK(rowtime) can return null:
>
> Note that this function can return NULL, and you may have to consider
> this case. For example, if you want to filter out late data you can use:
>
> WHERE
>   CURRENT_WATERMARK(ts) IS NULL
>   OR ts > CURRENT_WATERMARK(ts)
>
>
> I have the following questions that if the table is defined with a
> watermark eg:
>
> CREATE TABLE `MYEVENTS` (
> `name` STRING,
> `event_time` TIMESTAMP_LTZ(3),
>  ...
> WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)
> WITH (...)
>
>
> 1. If we define the water mark as above, will the late events still be
> propagated to a view or table which is selecting from MYEVENTS table:
>
> CREATE TEMPORARY VIEW `ALL_EVENTS` AS
> SELECT * FROM MYEVENTS;
>
> 2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are
> the conditions for returning null ?
>
>
>
> Thanks
>
>
>
>


Re: Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-11 Thread M Singh
 I thought a little more about your references Martijn and wanted to confirm 
one thing - the table is specifying the watermark and the downstream view needs 
to check if it wants all events or only the non-late events.  Please let my 
understanding is correct.  
Thanks again for your references.
Mans
On Friday, February 11, 2022, 05:02:49 PM EST, M Singh 
 wrote:  
 
  
Hi Martijn:
Thanks for the reference.   
My understanding was that if we use watermark then any event with event time 
(in the above example) < event_time - 30 seconds will be dropped automatically. 
 
My question [1] is will the downstream (ALL_EVENTS) view which is selecting the 
events from the table receive events which are late ?  If late events are 
dropped at the table level then do we still need the second predicate check (ts 
> CURRENT_WATERMARK(ts)) to filter out late events at the view level.  
If the table does not drop late events, then will all downstream views/etc need 
to add this check (ts > CURRENT_WATERMARK(ts)) ?
I am still not clear on this concept of whether downstream view need to check 
for late events with this predicate or will they never receive late events.
Thanks again for your time.

On Friday, February 11, 2022, 01:55:09 PM EST, Martijn Visser 
 wrote:  
 
 Hi,
There's a Flink SQL Cookbook recipe on CURRENT_WATERMARK, I think this would 
cover your questions [1].
Best regards,
Martijn
[1] 
https://github.com/ververica/flink-sql-cookbook/blob/main/other-builtin-functions/03_current_watermark/03_current_watermark.md
On Fri, 11 Feb 2022 at 16:45, M Singh  wrote:

Hi:
The flink docs 
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
 indicates that the CURRENT_WATERMARK(rowtime) can return null:

Note that this function can return NULL, and you may have to consider this 
case. For example, if you want to filter out late data you can use:
WHERE
  CURRENT_WATERMARK(ts) IS NULL
  OR ts > CURRENT_WATERMARK(ts)
I have the following questions that if the table is defined with a watermark eg:
CREATE TABLE `MYEVENTS` (`name` STRING,    `event_time` TIMESTAMP_LTZ(3), 
...WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)WITH (...)

1. If we define the water mark as above, will the late events still be 
propagated to a view or table which is selecting from MYEVENTS table:
CREATE TEMPORARY VIEW `ALL_EVENTS` AS    SELECT * FROM MYEVENTS; 
2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are the 
conditions for returning null ?


Thanks
  




Re: Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-11 Thread M Singh
 
Hi Martijn:
Thanks for the reference.   
My understanding was that if we use watermark then any event with event time 
(in the above example) < event_time - 30 seconds will be dropped automatically. 
 
My question [1] is will the downstream (ALL_EVENTS) view which is selecting the 
events from the table receive events which are late ?  If late events are 
dropped at the table level then do we still need the second predicate check (ts 
> CURRENT_WATERMARK(ts)) to filter out late events at the view level.  
If the table does not drop late events, then will all downstream views/etc need 
to add this check (ts > CURRENT_WATERMARK(ts)) ?
I am still not clear on this concept of whether downstream view need to check 
for late events with this predicate or will they never receive late events.
Thanks again for your time.

On Friday, February 11, 2022, 01:55:09 PM EST, Martijn Visser 
 wrote:  
 
 Hi,
There's a Flink SQL Cookbook recipe on CURRENT_WATERMARK, I think this would 
cover your questions [1].
Best regards,
Martijn
[1] 
https://github.com/ververica/flink-sql-cookbook/blob/main/other-builtin-functions/03_current_watermark/03_current_watermark.md
On Fri, 11 Feb 2022 at 16:45, M Singh  wrote:

Hi:
The flink docs 
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
 indicates that the CURRENT_WATERMARK(rowtime) can return null:

Note that this function can return NULL, and you may have to consider this 
case. For example, if you want to filter out late data you can use:
WHERE
  CURRENT_WATERMARK(ts) IS NULL
  OR ts > CURRENT_WATERMARK(ts)
I have the following questions that if the table is defined with a watermark eg:
CREATE TABLE `MYEVENTS` (`name` STRING,    `event_time` TIMESTAMP_LTZ(3), 
...WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)WITH (...)

1. If we define the water mark as above, will the late events still be 
propagated to a view or table which is selecting from MYEVENTS table:
CREATE TEMPORARY VIEW `ALL_EVENTS` AS    SELECT * FROM MYEVENTS; 
2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are the 
conditions for returning null ?


Thanks
  


  

Re: Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-11 Thread Martijn Visser
Hi,

There's a Flink SQL Cookbook recipe on CURRENT_WATERMARK, I think this
would cover your questions [1].

Best regards,

Martijn

[1]
https://github.com/ververica/flink-sql-cookbook/blob/main/other-builtin-functions/03_current_watermark/03_current_watermark.md

On Fri, 11 Feb 2022 at 16:45, M Singh  wrote:

> Hi:
>
> The flink docs (
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
> indicates that the CURRENT_WATERMARK(rowtime) can return null:
>
> Note that this function can return NULL, and you may have to consider
> this case. For example, if you want to filter out late data you can use:
>
> WHERE
>   CURRENT_WATERMARK(ts) IS NULL
>   OR ts > CURRENT_WATERMARK(ts)
>
>
> I have the following questions that if the table is defined with a
> watermark eg:
>
> CREATE TABLE `MYEVENTS` (
> `name` STRING,
> `event_time` TIMESTAMP_LTZ(3),
>  ...
> WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)
> WITH (...)
>
>
> 1. If we define the water mark as above, will the late events still be
> propagated to a view or table which is selecting from MYEVENTS table:
>
> CREATE TEMPORARY VIEW `ALL_EVENTS` AS
> SELECT * FROM MYEVENTS;
>
> 2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are
> the conditions for returning null ?
>
>
>
> Thanks
>
>
>
>


Apache Flink - Will later events from a table with watermark be propagated and when can CURRENT_WATERMARK be null in that situation

2022-02-11 Thread M Singh
Hi:
The flink docs 
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/)
 indicates that the CURRENT_WATERMARK(rowtime) can return null:

Note that this function can return NULL, and you may have to consider this 
case. For example, if you want to filter out late data you can use:
WHERE
  CURRENT_WATERMARK(ts) IS NULL
  OR ts > CURRENT_WATERMARK(ts)
I have the following questions that if the table is defined with a watermark eg:
CREATE TABLE `MYEVENTS` (`name` STRING,    `event_time` TIMESTAMP_LTZ(3), 
...WATERMARK FOR event_time AS event_time - INTERVAL '30' SECONDS)WITH (...)

1. If we define the water mark as above, will the late events still be 
propagated to a view or table which is selecting from MYEVENTS table:
CREATE TEMPORARY VIEW `ALL_EVENTS` AS    SELECT * FROM MYEVENTS; 
2. Can CURRENT_WATERMARK(event_time) still return null ?  If so, what are the 
conditions for returning null ?


Thanks