[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359841#comment-16359841
 ] 

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405
  
Hi @aljoscha, you mentioned two points:
1. Events may arrive out of order in event-time processing.
2. We can use a WindowFunction or ProcessWindowFunction to filter out several
windows by specifying the start time and end time of the window.

I have some different ideas:
1. When we deal with an out-of-order event-time stream, we may specify
maxOutOfOrder to avoid skipping too many late elements. So when the job
starts or restarts, the maximum number of windows to skip can be set to
maxOutOfOrder / (the length of the tumbling window), so that late elements
will not produce incorrect results. The number of windows that need to be
skipped depends on the degree of out-of-orderness.
2. We need to skip several broken windows, but we don't know in advance which
windows are broken. We can only detect which window fires first; the several
windows after it are broken too. The number should vary with the production
job (according to maxOutOfOrder and the length of the window).
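
To make the arithmetic in point 1 concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Flink or of this pull request; rounding the quotient up is also an assumption, since the comment only gives the ratio.

```java
import java.time.Duration;

/** Hypothetical helper; not part of Flink or of the pull request. */
final class SkipCountEstimate {

    /** Returns ceil(maxOutOfOrder / windowSize), computed on milliseconds. */
    static long windowsToSkip(Duration maxOutOfOrder, Duration windowSize) {
        long lateness = maxOutOfOrder.toMillis();
        long size = windowSize.toMillis();
        if (size <= 0 || lateness < 0) {
            throw new IllegalArgumentException(
                "windowSize must be positive and maxOutOfOrder non-negative");
        }
        // Ceiling division (assumption): a fraction of a window's worth of
        // lateness still counts as one more window to skip.
        return (lateness + size - 1) / size;
    }

    public static void main(String[] args) {
        // 150 s of allowed out-of-orderness over 1-minute tumbling windows.
        System.out.println(windowsToSkip(Duration.ofSeconds(150), Duration.ofMinutes(1))); // prints 3
    }
}
```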


> Add api to support for user to skip the first incomplete window data
>
> Key: FLINK-8477
> URL: https://issues.apache.org/jira/browse/FLINK-8477
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Affects Versions: 1.4.0, 1.3.2
> Reporter: aitozi
> Assignee: aitozi
> Priority: Major
> Fix For: 1.4.2




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359835#comment-16359835
 ] 

ASF GitHub Bot commented on FLINK-8477:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5405
  
I commented on the issue: 
https://issues.apache.org/jira/browse/FLINK-8477?focusedCommentId=16359834=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16359834




[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-10 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359834#comment-16359834
 ] 

Aljoscha Krettek commented on FLINK-8477:
-

I think skipping windows by counting them is not a good choice for general
cases, where you can have windows that are out of order if you have event-time
processing. Specifying a number of windows to skip is also not very accurate
when trying to hit a certain point in time, but if it really must be done, I
think it could be done in a custom {{ProcessWindowFunction}} (or
{{WindowFunction}}). A better idea, though, would probably be to filter
windows by a certain start timestamp in the window function.
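
A minimal sketch of the timestamp-based filter suggested above, written in plain Java so it runs without Flink on the classpath. The class and method names are hypothetical. In an actual job the same comparison would presumably sit inside `ProcessWindowFunction#process`, using `context.window().getStart()` for a `TimeWindow`, and simply not emit anything for rejected windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not Flink code. */
final class WindowStartFilter {

    /** A window is emitted only if it starts at or after the configured cut-off. */
    static boolean shouldEmit(long windowStartMillis, long minWindowStartMillis) {
        return windowStartMillis >= minWindowStartMillis;
    }

    /** Applies the filter to window start timestamps in the order the windows fired. */
    static List<Long> emittedWindowStarts(List<Long> firedWindowStarts, long minWindowStartMillis) {
        List<Long> kept = new ArrayList<>();
        for (long start : firedWindowStarts) {
            if (shouldEmit(start, minWindowStartMillis)) {
                kept.add(start);
            }
        }
        return kept;
    }
}
```

Because the decision depends only on each window's own start timestamp, the order in which windows fire does not matter, which is the advantage over counting windows.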



> Add api to support for user to skip the first incomplete window data
> 
>
> Key: FLINK-8477
> URL: https://issues.apache.org/jira/browse/FLINK-8477
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 1.4.0, 1.3.2
>Reporter: aitozi
>Assignee: aitozi
>Priority: Major
> Fix For: 1.4.2
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358297#comment-16358297
 ] 

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405
  
ping @aljoscha 


> Add api to support for user to skip the first incomplete window data
>
> Key: FLINK-8477
> URL: https://issues.apache.org/jira/browse/FLINK-8477
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Affects Versions: 1.4.0, 1.3.2
> Reporter: aitozi
> Assignee: aitozi
> Priority: Major
> Fix For: 1.4.1






[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351172#comment-16351172
 ] 

ASF GitHub Bot commented on FLINK-8477:
---

Github user Aitozi commented on the issue:

https://github.com/apache/flink/pull/5405
  
cc @aljoscha please help review this patch.

![image](https://user-images.githubusercontent.com/9486140/35761522-6e00f4b8-08c4-11e8-8063-7ec015802428.png)
See the picture above. When a user chooses to run without a checkpoint to
avoid catching up on data after a crash, and uses kafka#setStartFromLatest to
consume only the latest data, the job without the skip API can produce broken
data, which may trigger alerts in a monitoring scenario. If the user wants to
skip the broken windows, they can choose to skip several windows after the
first fire.
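
The "skip several windows after the first fire" idea can be sketched as follows. This is a hypothetical illustration in plain Java, not the code from the patch: the class name, the constructor parameters, and the use of an ordinary field for state are all assumptions.

```java
/** Hypothetical illustration of skipping N windows after the first fire; not the patch itself. */
final class FirstWindowsSkipper {

    private final int windowsToSkip;
    private final long windowSizeMillis;
    private boolean sawFirstFire = false;
    private long firstFiredStart;

    FirstWindowsSkipper(int windowsToSkip, long windowSizeMillis) {
        this.windowsToSkip = windowsToSkip;
        this.windowSizeMillis = windowSizeMillis;
    }

    /** Call once per fired window; returns false for windows that should be dropped. */
    boolean shouldEmit(long windowStartMillis) {
        if (!sawFirstFire) {
            // Remember the start of the first window that fires after (re)start.
            sawFirstFire = true;
            firstFiredStart = windowStartMillis;
        }
        // Drop the first fired window and the ones directly after it,
        // windowsToSkip windows in total, plus anything that starts earlier.
        long firstGoodStart = firstFiredStart + windowsToSkip * windowSizeMillis;
        return windowStartMillis >= firstGoodStart;
    }
}
```

The state here is an ordinary field and is lost on restart, which matches the no-checkpoint scenario described above: after a restart the skipper again treats the next fired window as the first one.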





[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351163#comment-16351163
 ] 

ASF GitHub Bot commented on FLINK-8477:
---

GitHub user Aitozi opened a pull request:

https://github.com/apache/flink/pull/5405

[FLINK-8477][Window]Add api to support user to skip serval broken window

In production, some applications, such as monitoring jobs, need accurate
data. Consider this scenario: if we start a job at 10:45:20 with a 1-minute
window aggregate, we may produce broken data for the 10:45 window, which may
lead to mistakes. We can provide a user API that lets users choose to skip
several windows to avoid the broken data.
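
The scenario above can be checked with a small sketch in plain Java. The names are hypothetical, and it assumes tumbling windows aligned to multiples of the window size with zero offset; timestamps are expressed as seconds of the day purely for readability.

```java
import java.time.LocalTime;

/** Hypothetical illustration of the 10:45:20 scenario; not Flink code. */
final class PartialWindowCheck {

    /** Start of the tumbling window containing the timestamp (zero offset assumed). */
    static long windowStart(long timestamp, long windowSize) {
        return timestamp - Math.floorMod(timestamp, windowSize);
    }

    /** A window that began before the job started can only hold part of its data. */
    static boolean isPartial(long jobStart, long windowStartTs) {
        return windowStartTs < jobStart;
    }

    public static void main(String[] args) {
        long jobStart = LocalTime.of(10, 45, 20).toSecondOfDay();
        long firstWindow = windowStart(jobStart, 60);
        System.out.println(LocalTime.ofSecondOfDay(firstWindow)); // prints 10:45
        System.out.println(isPartial(jobStart, firstWindow));      // prints true
        System.out.println(isPartial(jobStart, firstWindow + 60)); // prints false
    }
}
```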

## Brief change log

  - add a streaming api 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Aitozi/flink FLINK-8477

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5405


commit 9c6b77077bac2e0dfa4ea3bddf11bd27831ba3e4
Author: minwenjun 
Date:   2018-02-02T15:46:11Z

Add api to support user to skip serval broken window






[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-02-02 Thread aitozi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350403#comment-16350403
 ] 

aitozi commented on FLINK-8477:
---

[~aljoscha] In production, some applications, such as monitoring jobs, need
accurate data. Consider this scenario: if we start a job at 10:45:20 with a
1-minute window aggregate, we may produce broken data for the 10:45 window,
which may lead to mistakes. We can provide a user API that lets users choose
to skip several windows to avoid the broken data.



[jira] [Commented] (FLINK-8477) Add api to support for user to skip the first incomplete window data

2018-01-24 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337579#comment-16337579
 ] 

Aljoscha Krettek commented on FLINK-8477:
-

Could you please provide some details about what this is about?
