[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359841#comment-16359841 ]

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405

Hi @aljoscha, you mentioned two points:
1. Events may arrive out of order in event-time processing.
2. We can use a WindowFunction or ProcessWindowFunction to filter out several windows by specifying the start time and end time of the window.

I have some different ideas:
1. When we deal with an out-of-order event-time stream, we may specify maxOutOfOrder so that not too many late elements are skipped. So when the job starts or restarts, the maxNumOfWindow to be skipped can be set to maxOutOfOrder / (the length of the tumbling window), so that late elements do not produce incorrect results. The number of windows that need to be skipped depends on the degree of out-of-orderness.
2. We need to skip the several broken windows, but we don't know which windows are broken. We can only detect which window fired first; the several windows after it are broken too. The number should vary by production job (according to maxOutOfOrder and the length of the window).

> Add api to support for user to skip the first incomplete window data
>
> Key: FLINK-8477
> URL: https://issues.apache.org/jira/browse/FLINK-8477
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Affects Versions: 1.4.0, 1.3.2
> Reporter: aitozi
> Assignee: aitozi
> Priority: Major
> Fix For: 1.4.2

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
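The rule of thumb in the comment above (windows to skip = maxOutOfOrder / length of the tumbling window) can be written out as a small helper. This is a minimal sketch in plain Java, not code from the pull request: the class and method names are hypothetical, and rounding up to a whole window is an assumption that the comment does not state.

```java
final class SkipCount {

    /**
     * Number of tumbling windows to skip after a (re)start, following the
     * maxOutOfOrder / windowSize ratio from the comment.
     * Assumption: a partial window counts as a whole one (ceiling division).
     */
    static long windowsToSkip(long maxOutOfOrderMs, long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        if (maxOutOfOrderMs < 0) {
            throw new IllegalArgumentException("maxOutOfOrderMs must not be negative");
        }
        // Ceiling division without floating point.
        return (maxOutOfOrderMs + windowSizeMs - 1) / windowSizeMs;
    }

    public static void main(String[] args) {
        // Hypothetical values: 150 s of allowed out-of-orderness, 1-minute windows.
        System.out.println(windowsToSkip(150_000L, 60_000L));
    }
}
```

With these hypothetical values the ratio is 2.5, so the helper rounds up to three windows. Whether the window that fires first should be counted in addition to this ratio is not settled in the thread.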
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359835#comment-16359835 ]

ASF GitHub Bot commented on FLINK-8477:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5405

I commented on the issue: https://issues.apache.org/jira/browse/FLINK-8477?focusedCommentId=16359834=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16359834
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359834#comment-16359834 ]

Aljoscha Krettek commented on FLINK-8477:
---

I think skipping windows by counting them is not a good choice for general cases, where you can have windows that are out of order if you have event-time processing. Specifying a number of windows to skip is also not very accurate when trying to hit a certain point in time, but if it really must be done, I think it could be done in a custom {{ProcessWindowFunction}} (or {{WindowFunction}}).

A better idea, though, would probably be to filter windows by a certain start timestamp in the window function.
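The suggestion above, filtering windows by start timestamp rather than counting them, might look roughly like the following. This is a plain-Java sketch without Flink dependencies; `WindowResult` and `keepFrom` are hypothetical stand-ins, not Flink classes. In a real job the same comparison would presumably sit inside a `ProcessWindowFunction`, using the window start obtained from `context.window().getStart()`.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowStartFilter {

    /** Hypothetical stand-in for one fired window: its start timestamp and an aggregate. */
    static final class WindowResult {
        final long startMs;
        final long count;

        WindowResult(long startMs, long count) {
            this.startMs = startMs;
            this.count = count;
        }
    }

    /**
     * Keeps only windows that start at or after minStartMs.
     * The decision depends on each window's own start time, not on the
     * order in which windows fire, so out-of-order firing does not matter.
     */
    static List<WindowResult> keepFrom(List<WindowResult> fired, long minStartMs) {
        List<WindowResult> kept = new ArrayList<>();
        for (WindowResult w : fired) {
            if (w.startMs >= minStartMs) {
                kept.add(w);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WindowResult> fired = new ArrayList<>();
        fired.add(new WindowResult(120_000L, 7));
        fired.add(new WindowResult(0L, 2));      // incomplete window that fires late
        fired.add(new WindowResult(60_000L, 5));
        for (WindowResult w : keepFrom(fired, 60_000L)) {
            System.out.println(w.startMs + " -> " + w.count);
        }
    }
}
```

A counter-based skip would have dropped whichever window happened to fire first; the timestamp check drops the incomplete window regardless of when it fires.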
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358297#comment-16358297 ]

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405

ping @aljoscha

> Add api to support for user to skip the first incomplete window data
>
> Key: FLINK-8477
> URL: https://issues.apache.org/jira/browse/FLINK-8477
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Affects Versions: 1.4.0, 1.3.2
> Reporter: aitozi
> Assignee: aitozi
> Priority: Major
> Fix For: 1.4.1
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351172#comment-16351172 ]

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405

cc @aljoscha please help review this patch.

![image](https://user-images.githubusercontent.com/9486140/35761522-6e00f4b8-08c4-11e8-8063-7ec015802428.png)

See the picture above. Suppose a user runs without a checkpoint to avoid catching up on data after a crash, and uses kafka#setStartFromLatest to consume only the latest data. Without the skip API, the job can produce broken data, which may trigger alerts in a monitoring scenario. If the user wants to skip the broken windows, they can choose to skip several windows after the first fire.
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351163#comment-16351163 ]

ASF GitHub Bot commented on FLINK-8477:
---

GitHub user Aitozi opened a pull request:

https://github.com/apache/flink/pull/5405

[FLINK-8477][Window]Add api to support user to skip serval broken window

In production, some applications, such as monitoring, need accurate data. Consider this scenario: if we start a job at 10:45:20 with a 1-minute window aggregate, we may produce broken data for the 10:45 window, which may lead to mistakes. We can provide a user API that lets users choose to skip several windows and so avoid the broken data themselves.

## Brief change log
- add a streaming api

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Aitozi/flink FLINK-8477

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5405.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #5405

commit 9c6b77077bac2e0dfa4ea3bddf11bd27831ba3e4
Author: minwenjun
Date: 2018-02-02T15:46:11Z

Add api to support user to skip serval broken window
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350403#comment-16350403 ]

aitozi commented on FLINK-8477:
---

[~aljoscha] In production, some applications, such as monitoring, need accurate data. Consider this scenario: if we start a job at 10:45:20 with a 1-minute window aggregate, we may produce broken data for the 10:45 window, which may lead to mistakes. We can provide a user API that lets users choose to skip several windows and so avoid the broken data themselves.
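The 10:45:20 scenario above can be worked through with a little arithmetic. The sketch below is plain Java, not Flink code. It assumes epoch-aligned tumbling windows with no offset and non-negative timestamps; the class name, method names, and the calendar date are hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

final class WindowAlignment {

    /** Start of the tumbling window that contains timestampMs (no offset, non-negative input). */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Start of the first window that lies entirely at or after jobStartMs. */
    static long firstCompleteWindowStart(long jobStartMs, long windowSizeMs) {
        long start = windowStart(jobStartMs, windowSizeMs);
        // If the job starts exactly on a window boundary, that window is already complete.
        return start == jobStartMs ? start : start + windowSizeMs;
    }

    public static void main(String[] args) {
        long size = Duration.ofMinutes(1).toMillis();
        // Hypothetical date; only the time of day (10:45:20) comes from the scenario.
        long jobStart = Instant.parse("2018-02-02T10:45:20Z").toEpochMilli();

        System.out.println("window containing job start begins at "
                + Instant.ofEpochMilli(windowStart(jobStart, size)));
        System.out.println("first complete window begins at "
                + Instant.ofEpochMilli(firstCompleteWindowStart(jobStart, size)));
    }
}
```

The window that begins at 10:45:00 only sees data from 10:45:20 onward, so its aggregate covers 40 of 60 seconds; the first window that covers a full minute begins at 10:46:00.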
[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data
[ https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337579#comment-16337579 ]

Aljoscha Krettek commented on FLINK-8477:
---

Could you please provide some details about what this is about?