Hi,

your explanation makes sense, but I'm wondering what the implementation would look like. This would mean bigger changes in a Flink fork, right?

Late data handling in SQL is a frequently asked question. Currently, we don't have a good way of supporting it. Usually, we recommend using the DataStream API before the Table API to branch late events into a separate processing pipeline.
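
For concreteness, a minimal sketch of that recommendation (the event POJO, names, and values are made up for illustration): branch the late records off with a side output in the DataStream API, then hand the main result over to the Table API:

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.util.OutputTag;

public class LateBranchSketch {

    // Made-up event POJO for the sketch.
    public static class Event {
        public String key;
        public long amount;
        public long ts; // event time in millis
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Event> events = env
            .fromElements(event("a", 10L, 0L), event("a", 5L, 1_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, prev) -> e.ts));

        // The anonymous subclass keeps the generic type information.
        final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

        SingleOutputStreamOperator<Event> summed = events
            .keyBy(e -> e.key)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sideOutputLateData(lateTag) // late records leave the pipeline here
            .reduce((a, b) -> event(a.key, a.amount + b.amount, Math.max(a.ts, b.ts)));

        // The main result continues in the Table API / SQL ...
        tableEnv.createTemporaryView("window_sums", summed);

        // ... while the late records get their own DataStream pipeline.
        DataStream<Event> lateEvents = summed.getSideOutput(lateTag);
        lateEvents.print();

        env.execute();
    }

    private static Event event(String key, long amount, long ts) {
        Event e = new Event();
        e.key = key;
        e.amount = amount;
        e.ts = ts;
        return e;
    }
}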

Another idea (not well thought through) could be to perform the "connect" via a temporal join at a later stage against a LookupTableSource that contains the late events.
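
To sketch that idea (purely illustrative; the table and column names are made up, and the DDL is omitted):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LateLookupJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Assumed setup: `main_results` is the output of the regular pipeline
        // and carries a processing-time attribute `proc_time` (e.g. declared
        // as "proc_time AS PROCTIME()"); `late_events` is backed by a
        // lookup-capable connector such as JDBC and holds the late records
        // that were branched off earlier.
        tableEnv.executeSql(
            "SELECT m.id, m.total, l.amount "
          + "FROM main_results AS m "
          + "JOIN late_events FOR SYSTEM_TIME AS OF m.proc_time AS l "
          + "ON m.id = l.id");
    }
}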

Regards,
Timo


On 15.03.21 09:58, Yi Tang wrote:
We can get a stream of late records from the DataStream API via a side
output, but it's hard to do the same thing with Flink SQL.

I have an idea about how to get the late records while using Flink SQL.

Assuming we have a source table for the late records, we can query the late
records from it. Obviously, it would not be a real dynamic source table; it
could be a virtual source.
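
To make the idea concrete, a small sketch (the `orders` schema and sink tables are made up; the last statement is purely hypothetical):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class VirtualLateSourceSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        tableEnv.executeSql(
            "CREATE TABLE orders (key STRING, amount BIGINT, ts TIMESTAMP(3), "
          + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND) "
          + "WITH ('connector' = 'datagen')");
        tableEnv.executeSql(
            "CREATE TABLE main_sink (w_end TIMESTAMP(3), total BIGINT) "
          + "WITH ('connector' = 'blackhole')");

        // The regular windowed aggregation that may drop late records:
        tableEnv.executeSql(
            "INSERT INTO main_sink "
          + "SELECT TUMBLE_END(ts, INTERVAL '1' MINUTE), SUM(amount) "
          + "FROM orders GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE)");

        // Purely conceptual -- this does NOT exist in Flink today. The
        // proposed virtual source would expose exactly the records the
        // aggregation above dropped as late:
        //
        //   INSERT INTO late_sink SELECT * FROM late_records_of_orders
    }
}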

After optimization, we get a graph with some window aggregate nodes, which
can produce late records, and another graph for handling the late records,
starting from the virtual source node:

[scan] --> [group] --> [sink]

[virtual scan] --> [sink]

Then we can just "connect" these window nodes to the virtual source node.

The "connect" can be done by the following:

1. A side output node from each window node;
2. A mapper node, which may be needed to encode the records from the window
node to match the row type of the virtual source (see the sketch after the
diagram below).

[scan] --> [group] --> [sink]
                  \
                    --> [side output] --> [mapper] --> [sink]
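
For concreteness, a rough sketch of what the "connect" would have to generate, expressed in DataStream terms (the names and the row layout are illustrative, not actual Flink internals):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.types.Row;
import org.apache.flink.util.OutputTag;

public class ConnectSketch {

    // Made-up record type standing in for the window operator's input.
    public static class Event {
        public String key;
        public long amount;
        public long ts;
    }

    // Step 1: pull the late records out of the window operator via a side
    // output. Step 2: map them to rows matching the virtual source's type.
    public static DataStream<Row> lateRowsOf(
            SingleOutputStreamOperator<?> windowResult, OutputTag<Event> lateTag) {
        return windowResult
            .getSideOutput(lateTag)                  // side output node
            .map(e -> Row.of(e.key, e.amount, e.ts)) // mapper node
            .returns(Types.ROW(Types.STRING, Types.LONG, Types.LONG));
    }
}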


Does this make sense? Or is there other work in progress for a similar
purpose?


