Hi Aljoscha, Thanks for your response. My use case is to track user trajectory based on page view event when they visit a website. The input would be like a list of PageView(userId, url, eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying SessionWindows with some event time trigger. Note we can't wait for the end of session window due to latency. Instead, we want to emit the user trajectories whenever a buffered PageView's event time is passed by watermark. I tried ContinuousEventTimeTrigger and a custom trigger which sets timer on each element's timestamp. For both triggers I've witnessed a problem like the following (e.g. a session gap of 5)
PageView("user1", "http://foo", 1) PageView("user1", "http://foo/bar", 2) Watermark(1) => emit UserTrajectory("user1", "http://foo -> *http://foo/bar <http://foo/bar>*", [1,6]) PageView("user1", "http://foo/bar/foobar", 5) Watermark(4) => emit UserTrajectory("user1", "http://foo -> http://foo/bar -> *http://foo/bar/foobar <http://foo/bar/foobar>*", [1, 10]) The urls in bold should be included since there could be events before them not arrived yet. Thanks, Manu On Tue, Oct 25, 2016 at 1:36 AM Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > with some additional information we might be able to figure this out > together. What specific combination of WindowAssigner/Trigger are you using > for your example and what is the input stream (including watermarks)? > > Cheers, > Aljoscha > > On Mon, 24 Oct 2016 at 06:30 Manu Zhang <owenzhang1...@gmail.com> wrote: > > Hi, > > Say I have a window state of List(("a", 1:00), ("b", 1:03), ("c", 1:06)) > which is triggered to emit when watermark passes the timestamp of an > element. For example, > > on watermark(1:01), List(("a", 1:00)) is emitted > on watermark(1:04), List(("a", 1:00), ("b", 1:03)) is emitted > on watermark(1:07), List(("a", 1:00), ("b", 1:03), ("c", 1:06)) is emitted > > It seems that if *("c", 1:06) is processed before watermark(1:04)* > List(("a", 1:00), ("b", 1:03), ("c", 1:06)) will be emitted on > watermark(1:04). This is incorrect since there could be elements with > timestamp between 1:04 and 1:06 that have not arrived yet. > > I guess this is because watermark trigger doesn't check whether element's > timestamp has been passed. > > Please correct me if any of the above is not right. > > Thanks, > Manu Zhang > > > >