[ https://issues.apache.org/jira/browse/FLINK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Weise resolved FLINK-10886. ---------------------------------- Resolution: Done > Event time synchronization across sources > ----------------------------------------- > > Key: FLINK-10886 > URL: https://issues.apache.org/jira/browse/FLINK-10886 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common > Reporter: Jamie Grier > Priority: Not a Priority > Labels: auto-deprioritized-major, auto-deprioritized-minor, > auto-unassigned > Original Estimate: 336h > Remaining Estimate: 336h > > When reading from a source with many parallel partitions, especially when > reading lots of historical data (or recovering from downtime and there is a > backlog to read), it's quite common for there to develop an event-time skew > across those partitions. > > When doing event-time windowing -- or in fact any event-time driven > processing -- the event time skew across partitions results directly in > increased buffering in Flink and of course the corresponding state/checkpoint > size growth. > > As the event-time skew and state size grows larger this can have a major > effect on application performance and in some cases result in a "death > spiral" where the application performance get's worse and worse as the state > size grows and grows. > > So, one solution to this problem, outside of core changes in Flink itself, > seems to be to try to coordinate sources across partitions so that they make > progress through event time at roughly the same rate. In fact if there is > large skew the idea would be to slow or even stop reading from some > partitions with newer data while first reading the partitions with older > data. Anyway, to do this we need to share state somehow amongst sub-tasks. > -- This message was sent by Atlassian Jira (v8.20.10#820010)