Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Fabian Hueske
Awesome! I've given you contributor permissions and assigned FLINK-9183 to you. With the permissions you can also do that yourself in the future. Here's a guide for contributions to the documentation [1]. Best, Fabian [1] http://flink.apache.org/contribute-documentation.html

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Juho Autio
Great. I'd be happy to contribute. I added 2 sub-tasks in https://issues.apache.org/jira/browse/FLINK-5479. Could someone with the privileges assign this sub-task to me: https://issues.apache.org/jira/browse/FLINK-9183?

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Fabian Hueske
Fully agree, Juho! Do you want to contribute the docs fix? If yes, we should update FLINK-5479 to make sure that the warning is removed once the bug is fixed. Thanks, Fabian

Re: Kafka consumer to sync topics by event time?

2018-04-12 Thread Juho Autio
Looks like the bug https://issues.apache.org/jira/browse/FLINK-5479 entirely prevents this feature from being used if there are any idle partitions. It would be nice to mention in the documentation that currently this requires all subscribed partitions to have a constant stream of data with growing
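To illustrate why an idle partition blocks progress, here is a minimal sketch in plain Python. It is a toy model, not Flink code: the function names `source_watermark` and `advance`, the partition labels, and the timestamps are all made up for illustration. The only assumption taken from the thread is that watermarks are tracked per partition and that the source can emit no more than the lowest of them.

```python
from typing import Dict

# Assumed sentinel: the partition has not produced any watermark yet.
NO_WATERMARK = float("-inf")


def source_watermark(partition_watermarks: Dict[str, float]) -> float:
    """Watermark the toy source can emit: the minimum over its partitions."""
    return min(partition_watermarks.values())


def advance(partition_watermarks: Dict[str, float], partition: str, event_time: float) -> None:
    """Record a new event time for one partition; a watermark never moves backwards."""
    partition_watermarks[partition] = max(partition_watermarks[partition], event_time)


# Hypothetical partitions: one busy, one idle.
partitions = {"big_topic-0": NO_WATERMARK, "small_topic-0": NO_WATERMARK}

advance(partitions, "big_topic-0", 1000)
advance(partitions, "big_topic-0", 2000)
stalled = source_watermark(partitions)  # the idle partition holds the minimum back

advance(partitions, "small_topic-0", 1500)
resumed = source_watermark(partitions)  # progresses once the idle partition has data
```

In this model the busy partition can advance as far as it likes, but the emitted watermark only moves once every partition has reported an event time.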

Re: Kafka consumer to sync topics by event time?

2017-12-04 Thread Fabian Hueske
You are right, offsets cannot be used for tracking processing progress. I think setting Kafka offsets with respect to some progress notion other than "has been consumed" would be highly application-specific and hard to generalize. As you said, there might be a window (such as a session window)

Re: Kafka consumer to sync topics by event time?

2017-12-04 Thread Juho Autio
Thank you, Fabian. Really clear explanation. That indeed matches my observation (data is not dropped from either the small or the big topic, but the offsets are already advancing on the Kafka side before those offsets have been triggered from a window operator). This means that it's a bit harder to

Re: Kafka consumer to sync topics by event time?

2017-12-04 Thread Fabian Hueske
Hi Juho, the partitions of both topics are independently consumed, i.e., at their own speed without coordination. With the configuration that Gordon linked, watermarks are generated per partition. Each source task maintains the latest (and highest) watermark per partition and propagates the

Re: Kafka consumer to sync topics by event time?

2017-12-01 Thread Juho Autio
Thanks for the answers. I still don't understand why I can see the offsets being quickly committed to Kafka for the "small topic". Are they committed to Kafka before their watermark has passed on Flink's side? That would be quite confusing. Indeed when Flink handles the state/offsets internally,

Re: Kafka consumer to sync topics by event time?

2017-11-22 Thread Tzu-Li (Gordon) Tai
Hi! The FlinkKafkaConsumer can handle watermark advancement with per-Kafka-partition awareness (across partitions of different topics). You can see an example of how to do that here [1]. Basically, this generates watermarks within the Kafka consumer individually for each

Re: Kafka consumer to sync topics by event time?

2017-11-22 Thread Kien Truong
Hi, when you join multiple streams with different watermarks, the resulting stream's watermark will be the smallest of the input watermarks, as long as you don't explicitly assign a new watermark generator. In your example, if small_topic has watermark at time t1, big_topic has watermark
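The "smallest of the input watermarks" rule can be sketched as a small model in plain Python. This is not the Flink API: the class `MultiInputOperator`, its method `on_watermark`, and the numeric timestamps are hypothetical and exist only to show the rule described above.

```python
from typing import Dict, Iterable

# Assumed sentinel: no watermark has been received on this input yet.
MIN_WATERMARK = float("-inf")


class MultiInputOperator:
    """Toy operator that combines the watermarks of several named inputs."""

    def __init__(self, inputs: Iterable[str]) -> None:
        self.input_watermarks: Dict[str, float] = {name: MIN_WATERMARK for name in inputs}
        self.output_watermark: float = MIN_WATERMARK

    def on_watermark(self, input_name: str, watermark: float) -> float:
        """Record a watermark from one input and return the operator's watermark."""
        # A watermark on a single input never moves backwards.
        current = self.input_watermarks[input_name]
        self.input_watermarks[input_name] = max(current, watermark)
        # The operator's event time is bounded by its slowest input.
        candidate = min(self.input_watermarks.values())
        if candidate > self.output_watermark:
            self.output_watermark = candidate
        return self.output_watermark


op = MultiInputOperator(["small_topic", "big_topic"])
first = op.on_watermark("small_topic", 100)   # big_topic has not reported yet
second = op.on_watermark("big_topic", 40)     # the slower input sets the pace
third = op.on_watermark("small_topic", 200)   # the faster input does not help
fourth = op.on_watermark("big_topic", 150)    # advances with the slower input
```

In this model the stream that is ahead in event time simply waits: its records are held until the slower input's watermark catches up, which is why nothing is dropped.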

Kafka consumer to sync topics by event time?

2017-11-22 Thread Juho Autio
I would like to understand how FlinkKafkaConsumer treats "unbalanced" topics. We're using FlinkKafkaConsumer010 with 2 topics, say "small_topic" & "big_topic". After restoring from an old savepoint (taken 4 hours earlier), I checked the consumer offsets on Kafka (Flink commits offsets to Kafka for