Re: Kafka consumer are too fast for some partitions in "flatMap" like jobs

2017-08-30 Thread Elias Levy
On Wed, Aug 30, 2017 at 11:50 AM, Oleksandr Baliev <aleksanderba...@gmail.com> wrote:
> So the main question is how to synchronize data reading between Kafka
> partitions when data is sequential per partition, but late for some of
> them and we care about that data is not thrown away and will

Re: Kafka consumer are too fast for some partitions in "flatMap" like jobs

2017-08-30 Thread Oleksandr Baliev
Hi Elias, thanks for the reply. TOPIC_OUT has fewer partitions, ~20, but there are actually 4 output topics with different numbers of partitions, so the job is a kind of router. In general, having 1:1 partitions for the IN and OUT topics is good, thanks for the tip. But since the main goal is to have windows in

Re: Kafka consumer are too fast for some partitions in "flatMap" like jobs

2017-08-29 Thread Elias Levy
How many partitions does the output topic have? If it has the same number of partitions as the input topic (30), have you considered simply using a custom partitioner for the Kafka sink that uses the input partition number as the output partition number? If the input messages are ordered per
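
A minimal sketch of the mapping such a custom partitioner could apply, written as plain Java without any Flink or Kafka dependency so that it runs on its own. The class name InputPartitionMapper and the method targetPartition are hypothetical, and the modulo fallback for an output topic with fewer partitions is an assumption of this sketch, not something proposed in the thread. Wiring the function into the Kafka sink, and carrying the source partition number along with each record, is not shown.

```java
import java.util.stream.IntStream;

// Hypothetical helper: chooses the output partition from the input partition number.
public class InputPartitionMapper {

    /**
     * @param inputPartition   partition of the input topic the record was read from
     * @param outputPartitions partition ids of the target topic
     * @return the partition id to write the record to
     */
    static int targetPartition(int inputPartition, int[] outputPartitions) {
        if (outputPartitions == null || outputPartitions.length == 0) {
            throw new IllegalArgumentException("target topic has no partitions");
        }
        if (inputPartition < 0) {
            throw new IllegalArgumentException("input partition must be non-negative");
        }
        // With equal partition counts this is the identity mapping (1:1).
        // Assumption: with fewer output partitions, wrap around with modulo,
        // so all records of one input partition still go to a single output partition.
        return outputPartitions[inputPartition % outputPartitions.length];
    }

    public static void main(String[] args) {
        int[] out30 = IntStream.range(0, 30).toArray();
        int[] out20 = IntStream.range(0, 20).toArray();
        System.out.println(targetPartition(7, out30));   // same count: 7 -> 7
        System.out.println(targetPartition(25, out20));  // fewer partitions: 25 -> 5
    }
}
```

With the wrap-around, several input partitions share one output partition, so records from different input partitions are interleaved there; only the relative order of records coming from the same input partition is kept.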

Kafka consumer are too fast for some partitions in "flatMap" like jobs

2017-08-29 Thread Oleksandr Baliev
Hello, there is a Flink job which consumes from a Kafka topic (TOPIC_IN), simply applies flatMap / map to the data, and pushes it to another Kafka topic (TOPIC_OUT). TOPIC_IN has around 30 partitions, the data is more or less sequential per partition, and the job has parallelism 30. So in theory there should be 1:1