Hi,

your explanation makes sense, but I'm wondering what the implementation would look like. This would mean bigger changes in a Flink fork, right?

Late data handling in SQL is a frequently asked question. Currently, we don't have a good way of supporting it. Usually, we recommend using the DataStream API before the Table API to branch late events into a separate processing pipeline.
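
For concreteness, a minimal sketch of that recommendation (the event POJO, names, and values are made up for illustration): branch the late records off with a side output in the DataStream API, then hand the main result over to the Table API:

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.util.OutputTag;

public class LateBranchSketch {

    // Made-up event POJO for the sketch.
    public static class Event {
        public String key;
        public long amount;
        public long ts; // event time in millis
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Event> events = env
            .fromElements(event("a", 10L, 0L), event("a", 5L, 1_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, prev) -> e.ts));

        // The anonymous subclass keeps the generic type information.
        final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

        SingleOutputStreamOperator<Event> summed = events
            .keyBy(e -> e.key)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sideOutputLateData(lateTag) // late records leave the pipeline here
            .reduce((a, b) -> event(a.key, a.amount + b.amount, Math.max(a.ts, b.ts)));

        // The main result continues in the Table API / SQL ...
        tableEnv.createTemporaryView("window_sums", summed);

        // ... while the late records get their own DataStream pipeline.
        DataStream<Event> lateEvents = summed.getSideOutput(lateTag);
        lateEvents.print();

        env.execute();
    }

    private static Event event(String key, long amount, long ts) {
        Event e = new Event();
        e.key = key;
        e.amount = amount;
        e.ts = ts;
        return e;
    }
}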

Another idea (not well thought through) could be to perform the "connect" via a temporal join at a later stage against a LookupTableSource that contains the late events.
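
To sketch that idea (purely illustrative; the table and column names are made up, and the DDL is omitted):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LateLookupJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Assumed setup: `main_results` is the output of the regular pipeline
        // and carries a processing-time attribute `proc_time` (e.g. declared
        // as "proc_time AS PROCTIME()"); `late_events` is backed by a
        // lookup-capable connector such as JDBC and holds the late records
        // that were branched off earlier.
        tableEnv.executeSql(
            "SELECT m.id, m.total, l.amount "
          + "FROM main_results AS m "
          + "JOIN late_events FOR SYSTEM_TIME AS OF m.proc_time AS l "
          + "ON m.id = l.id");
    }
}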

Regards,
Timo


On 15.03.21 09:58, Yi Tang wrote:
We can get a stream of late records from the DataStream API via a side
output, but it's hard to do the same thing with Flink SQL.

I have an idea about how to get the late records while using Flink SQL.

Assuming we have a source table for the late records, we can query the late
records from it. Obviously, it would not be a real dynamic source table; it
could be a virtual source.
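
To make the idea concrete, a small sketch (the `orders` schema and sink tables are made up; the last statement is purely hypothetical):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class VirtualLateSourceSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        tableEnv.executeSql(
            "CREATE TABLE orders (key STRING, amount BIGINT, ts TIMESTAMP(3), "
          + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND) "
          + "WITH ('connector' = 'datagen')");
        tableEnv.executeSql(
            "CREATE TABLE main_sink (w_end TIMESTAMP(3), total BIGINT) "
          + "WITH ('connector' = 'blackhole')");

        // The regular windowed aggregation that may drop late records:
        tableEnv.executeSql(
            "INSERT INTO main_sink "
          + "SELECT TUMBLE_END(ts, INTERVAL '1' MINUTE), SUM(amount) "
          + "FROM orders GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE)");

        // Purely conceptual -- this does NOT exist in Flink today. The
        // proposed virtual source would expose exactly the records the
        // aggregation above dropped as late:
        //
        //   INSERT INTO late_sink SELECT * FROM late_records_of_orders
    }
}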

After optimization, we get a graph with some window aggregate nodes, which
can produce late records, and another graph for handling the late records,
starting from the virtual source node:

[scan] --> [group] --> [sink]

[virtual scan] --> [sink]

Then we can just "connect" these window nodes to the virtual source node.

The "connect" can be done by the following:

1. A side output node from each window node;
2. A mapper node, which may be needed to encode the records from the window
node to match the row type of the virtual source (see the sketch after the
diagram below).

[scan] --> [group] --> [sink]
                  \
                    --> [side output] --> [mapper] --> [sink]
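
For concreteness, a rough sketch of what the "connect" would have to generate, expressed in DataStream terms (the names and the row layout are illustrative, not actual Flink internals):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.types.Row;
import org.apache.flink.util.OutputTag;

public class ConnectSketch {

    // Made-up record type standing in for the window operator's input.
    public static class Event {
        public String key;
        public long amount;
        public long ts;
    }

    // Step 1: pull the late records out of the window operator via a side
    // output. Step 2: map them to rows matching the virtual source's type.
    public static DataStream<Row> lateRowsOf(
            SingleOutputStreamOperator<?> windowResult, OutputTag<Event> lateTag) {
        return windowResult
            .getSideOutput(lateTag)                  // side output node
            .map(e -> Row.of(e.key, e.amount, e.ts)) // mapper node
            .returns(Types.ROW(Types.STRING, Types.LONG, Types.LONG));
    }
}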


Does this make sense? Or is there other work in progress for a similar
purpose?


