Re: GroupBy result delay

2019-07-24 Thread Hequn Cheng
Hi Fanbin, > 2. I have parallelism = 32 and only one task has the record. Can you please elaborate more on why this would affect the watermark advancement? Each parallel subtask of a source function usually generates its watermarks independently, say wk1, wk2... wkn. The downstream window

Re: GroupBy result delay

2019-07-24 Thread Fanbin Bu
Hequn, Thanks for the help. It is indeed a watermark problem. From Flink UI, I can see the low watermark value for each operator. And the groupBy operator has lagged value of watermark. I checked the link from SO and confirmed that: 1. I do see record coming in for this operator 2. I have

Re: GroupBy result delay

2019-07-23 Thread Hequn Cheng
Hi Fanbin, Fabian is right, it should be a watermark problem. Probably, some tasks of the source don't have enough data to advance the watermark. Furthermore, you could also monitor event time through Flink web interface. I have answered a similar question on stackoverflow, see more details

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
If I use proctime, the groupBy happens without any delay. On Tue, Jul 23, 2019 at 10:16 AM Fanbin Bu wrote: > not sure whether this is related: > > public SingleOutputStreamOperator assignTimestampsAndWatermarks( > AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) { > >//

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
not sure whether this is related: public SingleOutputStreamOperator assignTimestampsAndWatermarks( AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) { // match parallelism to input, otherwise dop=1 sources could lead to some strange // behaviour: the watermark will creep

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
Thanks Fabian for the prompt reply. I just started using Flink and this is a great community. The watermark setting is only accounting for 10 sec delay. Besides that, the local job on IntelliJ is running fine without issues. Here is the code: class EventTimestampExtractor(slack: Long = 0L)

Re: GroupBy result delay

2019-07-23 Thread Fabian Hueske
Hi Fanbin, The delay is most likely caused by the watermark delay. A window is computed when the watermark passes the end of the window. If you configured the watermark to be 10 minutes before the current max timestamp (probably to account for out of order data), then the window will be computed

GroupBy result delay

2019-07-22 Thread Fanbin Bu
Hi, I have a Flink sql streaming job defined by: SELECT user_id , hop_end(created_at, interval '30' second, interval '1' minute) as bucket_ts , count(name) as count FROM event WHERE name = 'signin' GROUP BY user_id , hop(created_at, interval '30' second, interval '1' minute) there is