Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 14:00, Dawid Wysakowicz wrote: > If an event arrived at WindowOperator before the Watermark, then it will > be accounted for window aggregation and put in state. Once that state gets > checkpointed this same event won't be processed again. In other words if a > checkpoint

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
If an event arrived at WindowOperator before the Watermark, then it will be accounted for window aggregation and put in state. Once that state gets checkpointed this same event won't be processed again. In other words if a checkpoint succeeds elements that produced corresponding state won't be

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 13:36, Dawid Wysakowicz wrote: > It is definitely a solution ;) > > You should be aware of the downsides though: > >- you might get different results in case of reprocessing >- you might drop some data as late, due to some delays in processing, >if the events

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
It is definitely a solution ;) You should be aware of the downsides though: * you might get different results in case of reprocessing * you might drop some data as late, due to some delays in processing, if the events arrive later then the "ProcessingTime" threshold Best, Dawid On

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
Yes, it was the "watermarks for event time when no events for that shard" problem. I am now investigating whether we can use a blended watermark of max(lastEventTimestamp - 1min, System.currentTimeMillis() - 5min) to ensure idle shards do not cause excessive data retention. Is that the best

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
Hi Stephen, Watermark for a single operator is the minimum of Watermarks received from all inputs, therefore if one of your shards/operators does not have incoming data it will not produce Watermarks thus the Watermark of WindowOperator will not progress. So this is sort of an expected behavior.

Re: How to debug difference between Kinesis and Kafka

2019-02-20 Thread Congxian Qiu
Hi Stephen If the window has not been triggered ever, maybe you could investigate the watermark, maybe the doc[1][2] can be helpful. [1]  https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/windows.html#interaction-of-watermarks-and-windows [2] 

Re: EXT :Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Dian Fu
> > > From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com > <mailto:stephen.alan.conno...@gmail.com>] > Sent: Tuesday, February 19, 2019 6:32 AM > To: user mailto:user@flink.apache.org>> > Subject: EXT :Re: How to debug difference between Kinesis

Re: EXT :Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
> function isn’t getting data you have to watch out for this. > > > > *From:* Stephen Connolly [mailto:stephen.alan.conno...@gmail.com] > *Sent:* Tuesday, February 19, 2019 6:32 AM > *To:* user > *Subject:* EXT :Re: How to debug difference between Kinesis and Kafka > >

RE: EXT :Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Martin, Nick
of a source function isn’t getting data you have to watch out for this. From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com] Sent: Tuesday, February 19, 2019 6:32 AM To: user Subject: EXT :Re: How to debug difference between Kinesis and Kafka Hmmm my suspicions are now quite high. I created

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hmmm my suspicions are now quite high. I created a file source that just replays the events straight then I get more results On Tue, 19 Feb 2019 at 11:50, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hmmm after expanding the dataset such that there was additional data that >

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hmmm after expanding the dataset such that there was additional data that ended up on shard-0 (everything in my original dataset was coincidentally landing on shard-1) I am now getting output... should I expect this kind of behaviour if no data arrives at shard-0 ever? On Tue, 19 Feb 2019 at

How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hi, I’m having a strange situation and I would like to know where I should start trying to debug. I have set up a configurable swap in source, with three implementations: 1. A mock implementation 2. A Kafka consumer implementation 3. A Kinesis consumer implementation >From injecting a log and