Francesco Guardiani created FLINK-24466:
-------------------------------------------

             Summary: Interval Join late events handling behaviour is not 
consistent
                 Key: FLINK-24466
                 URL: https://issues.apache.org/jira/browse/FLINK-24466
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Runtime
            Reporter: Francesco Guardiani
         Attachments: Fix_late_events_filtering_for_interval_join.patch

Interval Join handles late events emitting them in the output, as a padded row. 
This behavior is also tested extensively in {{RowTimeIntervalJoinTest}}.

The problem with this behavior is the way an event is considered "late" or not: 
in order to distinguish between the two, {{RowTimeIntervalJoin}} uses the 
{{ctx.timerService().currentWatermark()}} to find out if an event is later than 
the last received watermark or not. But that method returns the "combined" 
watermark across all the keys, partitions and *input streams*, that is if one 
of the two streams goes "slower" than the other one, the returned watermark is 
going to be the minimum among the two.

This means that our late events handling effectively works only if the two 
streams run "at the same pace", otherwise we'll just see what we consider _late 
events_ for one of the two streams as joined.

To observe this behavior, just run the test 
{{IntervalJoinITCase#testRowTimeInnerJoinWithEquiTimeAttrs}} in this revision 
https://github.com/apache/flink/commit/7033cbfe404bea1519d3342a611e2f92768d70f9 
several times and you'll see that after a couple of runs it fails, joining one 
of the {{"should-be-discarded"}} records. Those records are way behind the 
watermark - 1 second, as defined.

You'll find attached in the issue a small patch to show how this could be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to