Coming back to this after a while. What is the place were you would have expected such a note? Unfortunately the documentation about watermarks and process function is a bit spread across the documentation.
If you could point me to where you would expect it that would be very helpful. Best, Aljoscha > On 29. Aug 2017, at 10:22, Gwenhael Pasquiers > <gwenhael.pasqui...@ericsson.com> wrote: > > Hi, > > I'm late but thanks for your answer, anyway we made a special case for the > first watermark (if(watermark == Long.MIN_VALUE)...) > > At least we now know that we did not made anything wrong. Maybe that special > case of the first watermark is worth mentioning in your documentation ? > > -----Original Message----- > From: Aljoscha Krettek [mailto:aljos...@apache.org] > Sent: mardi 8 août 2017 16:50 > To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> > Cc: Nico Kruber <n...@data-artisans.com>; user@flink.apache.org > Subject: Re: Event-time and first watermark > > I see. But yes, even in the case the watermark will always be "one behind". > The logic in the extraction operator is roughly this: > > 1. Extract timestamp T, assign to internal StreamRecord 2. Send StreamRecord > downstream 3. Extract Watermark W 4. Send Watermark downstream > > (In your case T == W) > > The reason is that a watermark T says that there will not be an element with > a timestamp <= T in the future. If the watermark were sent before the record > then this would violate the watermark contract, i.e. your element with > timestamp T would arrive after the watermark W. I think it's not easily > possible to have a properly defined watermark for the very first element in a > stream, unfortunately. > > Best, > Aljoscha >> On 4. Aug 2017, at 16:43, Gwenhael Pasquiers >> <gwenhael.pasqui...@ericsson.com> wrote: >> >> We're using a AssignerWithPunctuatedWatermarks that extracts a timestamp >> from the data. It keeps and returns the higher timestamp it has ever seen >> and returns a new Watermark everytime the value grows. >> >> I know it's bad for performances, but for the moment it's not the issue, i >> want the most possibly up-to-date watermark. >> >> -----Original Message----- >> From: Aljoscha Krettek [mailto:aljos...@apache.org] >> Sent: vendredi 4 août 2017 12:22 >> To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> >> Cc: Nico Kruber <n...@data-artisans.com>; user@flink.apache.org >> Subject: Re: Event-time and first watermark >> >> Hi, >> >> How are you defining the watermark, i.e. what kind of watermark extractor >> are you using? >> >> Best, >> Aljoscha >> >>> On 3. Aug 2017, at 17:45, Gwenhael Pasquiers >>> <gwenhael.pasqui...@ericsson.com> wrote: >>> >>> We're not using a Window but a more basic ProcessFunction to handle >>> sessions. We made this choice because we have to handle (millions of) >>> sessions that can last from 10 seconds to 24 hours so we wanted to handle >>> things manually using the State class. >>> >>> We're using the watermark as an event-time "clock" to: >>> * compute "lateness" of a message relatively to the watermark (most >>> recent message from the stream) >>> * fire timer events >>> >>> We're using event-time instead of processing time because our stream will >>> be late and data arrive by hourly bursts. >>> >>> Maybe we're misusing the watermark ? >>> >>> B.R. >>> >>> -----Original Message----- >>> From: Nico Kruber [mailto:n...@data-artisans.com] >>> Sent: jeudi 3 août 2017 16:30 >>> To: user@flink.apache.org >>> Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> >>> Subject: Re: Event-time and first watermark >>> >>> Hi Gwenhael, >>> "A Watermark(t) declares that event time has reached time t in that >>> stream, meaning that there should be no more elements from the stream >>> with a timestamp t’ <= t (i.e. events with timestamps older or equal >>> to the watermark)." [1] >>> >>> Therefore, they should be behind the actual event with timestamp t. >>> >>> What is it that you want to achieve in the end? What do you want to use the >>> watermark for? They are basically a means to defining when an event time >>> window ends. >>> >>> >>> Nico >>> >>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/ >>> event_time.html#event-time-and-watermarks >>> >>> On Thursday, 3 August 2017 10:24:35 CEST Gwenhael Pasquiers wrote: >>>> Hi, >>>> >>>> From my tests it seems that the initial watermark value is >>>> Long.MIN_VALUE even though my first data passed through the >>>> timestamp extractor before arriving into my ProcessFunction. It >>>> looks like the watermark "lags" behind the data by one message. >>>> >>>> Is there a way to have a watermark more "up to date" ? Or is the >>>> only way to compute it myself into my ProcessFunction ? >>>> >>>> Thanks. >>> >> >