Hi Fabian,
thanks for your very goog explanation. However, I don't exactly know how to increase the watermark by myself. Do you have an example for me? Do I have to override the getCurrentWatermark method?
Thanks,
Markus
Gesendet: Dienstag, 28. Februar 2017 um 20:36 Uhr
Von: "Fabian Hueske" <fhue...@gmail.com>
An: user@flink.apache.org
Betreff: Re: Combine two independant streams
Von: "Fabian Hueske" <fhue...@gmail.com>
An: user@flink.apache.org
Betreff: Re: Combine two independant streams
In event-time mode, operators compute their internal time from watermarks.
Depending on how watermarks are generated, their time only increases if records with later timestamps are processed. If no records arrive, no new watermarks are generated and the event-time does not increase.Since you want to reprocess offline data, you cannot use processing time which uses the wall clock time of the processing machines.
Instead you could use a custom periodic watermark that slowly increases the time even if no new data arrives. However, you should be careful, because this could also lead to late arriving events being dropped. The allowedLateness parameter can help to mitigate the problem.
Hope that helps,
Fabian
2017-02-28 18:50 GMT+01:00 Markus <klein.marku...@gmx.de>:
Hi Fabian,
yeah, that's basically it. The events window gets closed only when a newer event arrives (after 10 seconds window).
Can I tell Flink to close the event window at timeWindow.getEnd() even if no newer event arrives?
Thanks,
Markus
Am 28.02.17 um 17:19 schrieb Fabian Hueske:Hi Markus,I'm not sure I understood the issue with the second approach.
Is it that the stream of application events might be empty for some time such that its event time is not increasing?
Best, Fabian2017-02-28 17:02 GMT+01:00 Markus Klein <klein.marku...@gmx.de>:Hello Flink Community,
I have a question regarding combining two independant streams.
The first stream is a stream of events with metrics information. It occurs every 10 seconds. What I want is to join a second stream with events from an application. The result should be an event with the metrics and the events that happened the last 10 seconds.
So my first approacch was to generate an ID which will be increased after every metric event. This ID will be added to the application events and of cours for the current metric event. This works somehow good for live events but for recalculating past events the two streams have to start at the same point in event time.
The second approach was to generate a time window of 10 seconds for the application events and for the metrics and set the window end time as a key because flink windows end at e.g. 11:01:10, then 11:01:20 and so on. But this approach works only for past events because flink needs another application event to know that 10 seconds have passed for the application events window.
I hope you guys understand the problem. Is there a way two combine them in a nice way? I don't want to generate empty "heartbeats" for the application event stream.
Thanks for your help.
Greetings
Markus