Re: Flink - Handling late events - main vs late window firing

2017-08-15 Thread Aljoscha Krettek
I would say so, yes. But I don't consider ProessWindowFunction to be low-level, 
it's just the function that should be used for processing windows if you need 
more information about context.

Best,
Aljoscha

> On 14. Aug 2017, at 22:53, M Singh  wrote:
> 
> Thanks Aljoscha for your response.
> 
> Just to clarify - the only way to handle the duplication scenario properly is 
> by using the ProcessWindowFunction - there is no high level function for this.
> 
> Thanks again.
> 
> 
> On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek  
> wrote:
> 
> 
> Hi,
> 
> 1. You could use a ProcessWindowFunction instead of a WindowFunction. In 
> there, you can query the current watermark and thus determine why the firing 
> is happening. Also, in a ProcessWindowFunction you can keep per-window state, 
> this would allow you to keep a bit of state that can tell you whether this is 
> the first firing for a given window or the number of firings so far.
> 
> 2. This depends on whether the Trigger is purging or not. The default 
> EventTimeTrigger is not purging, meaning that all elements in the window will 
> be preserved after firing (until the watermark reaches the end of the window 
> plus the allowed lateness). You can turn this into a purging trigger using 
> PurgingTrigger.of(EventTimeTrigger.create()). You would specify this using 
> .trigger() on WindowedStream when constructing your windowed operation.
> 
> 3. It doesn't, you have to manually keep state in a ProcessWindowFunction to 
> distinguish between different cases, as mentioned above.
> 
> 4. Currently, I think there are no examples because this depends to a large 
> degree on the specifics of the application. I'm afraid.
> 
> Best,
> Aljoscha
> 
>> On 6. Aug 2017, at 18:45, M Singh > > wrote:
>> 
>> Hi Folks:
>> 
>> I am going through flink documentation and it states the following:
>> 
>> "You should be aware that the elements emitted by a late firing should be 
>> treated as updated results of a previous computation, i.e., your data stream 
>> will contain multiple results for the same computation. Depending on your 
>> application, you need to take these duplicated results into account or 
>> deduplicate them."
>> 
>> I wanted to find out the following:
>> 
>> 1. How do we distinguish the late firing from the main firing ?
>> 2. Does the late firing including all events or only late events ?
>> 3. How does the late vs main firing affect the associated window function ?
>> 4. Are there any examples of how to handle these events and deduplication 
>> mentioned in the documentation ?
>> 
>> Thanks for your help.
>> 
>> Mans
> 
> 
> 



Re: Flink - Handling late events - main vs late window firing

2017-08-14 Thread M Singh
Thanks Aljoscha for your response.
Just to clarify - the only way to handle the duplication scenario properly is 
by using the ProcessWindowFunction - there is no high level function for this.

Thanks again. 

On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek 
 wrote:
 

 Hi,
1. You could use a ProcessWindowFunction instead of a WindowFunction. In there, 
you can query the current watermark and thus determine why the firing is 
happening. Also, in a ProcessWindowFunction you can keep per-window state, this 
would allow you to keep a bit of state that can tell you whether this is the 
first firing for a given window or the number of firings so far.
2. This depends on whether the Trigger is purging or not. The default 
EventTimeTrigger is not purging, meaning that all elements in the window will 
be preserved after firing (until the watermark reaches the end of the window 
plus the allowed lateness). You can turn this into a purging trigger using 
PurgingTrigger.of(EventTimeTrigger.create()). You would specify this using 
.trigger() on WindowedStream when constructing your windowed operation.
3. It doesn't, you have to manually keep state in a ProcessWindowFunction to 
distinguish between different cases, as mentioned above.
4. Currently, I think there are no examples because this depends to a large 
degree on the specifics of the application. I'm afraid.
Best,Aljoscha

On 6. Aug 2017, at 18:45, M Singh  wrote:
Hi Folks:
I am going through flink documentation and it states the following:
"You should be aware that the elements emitted by a late firing should be 
treated as updated results of a previous computation, i.e., your data stream 
will contain multiple results for the same computation. Depending on your 
application, you need to take these duplicated results into account or 
deduplicate them."

I wanted to find out the following:
1. How do we distinguish the late firing from the main firing ?2. Does the late 
firing including all events or only late events ?3. How does the late vs main 
firing affect the associated window function ?
4. Are there any examples of how to handle these events and deduplication 
mentioned in the documentation ?
Thanks for your help.
Mans



   

Re: Flink - Handling late events - main vs late window firing

2017-08-09 Thread Aljoscha Krettek
Hi,

1. You could use a ProcessWindowFunction instead of a WindowFunction. In there, 
you can query the current watermark and thus determine why the firing is 
happening. Also, in a ProcessWindowFunction you can keep per-window state, this 
would allow you to keep a bit of state that can tell you whether this is the 
first firing for a given window or the number of firings so far.

2. This depends on whether the Trigger is purging or not. The default 
EventTimeTrigger is not purging, meaning that all elements in the window will 
be preserved after firing (until the watermark reaches the end of the window 
plus the allowed lateness). You can turn this into a purging trigger using 
PurgingTrigger.of(EventTimeTrigger.create()). You would specify this using 
.trigger() on WindowedStream when constructing your windowed operation.

3. It doesn't, you have to manually keep state in a ProcessWindowFunction to 
distinguish between different cases, as mentioned above.

4. Currently, I think there are no examples because this depends to a large 
degree on the specifics of the application. I'm afraid.

Best,
Aljoscha

> On 6. Aug 2017, at 18:45, M Singh  wrote:
> 
> Hi Folks:
> 
> I am going through flink documentation and it states the following:
> 
> "You should be aware that the elements emitted by a late firing should be 
> treated as updated results of a previous computation, i.e., your data stream 
> will contain multiple results for the same computation. Depending on your 
> application, you need to take these duplicated results into account or 
> deduplicate them."
> 
> I wanted to find out the following:
> 
> 1. How do we distinguish the late firing from the main firing ?
> 2. Does the late firing including all events or only late events ?
> 3. How does the late vs main firing affect the associated window function ?
> 4. Are there any examples of how to handle these events and deduplication 
> mentioned in the documentation ?
> 
> Thanks for your help.
> 
> Mans