On Wed, Aug 30, 2017 at 11:50 AM, Oleksandr Baliev <aleksanderba...@gmail.com> wrote:
>
> So the main question is how to synchronize data reading between Kafka
> partitions when the data is sequential per partition, but late for some of
> them, and we care that the data is not thrown away and will …
Hi Elias,

Thanks for the reply. TOPIC_OUT has fewer partitions, ~20, but there are
actually 4 output topics with different numbers of partitions, so the job is a
kind of router.

In general, having 1:1 partitions for the IN and OUT topics is good, thanks
for the tip. But since the main goal is to have windows in …
How many partitions does the output topic have? If it has the same number
of partitions as the input topic (30), have you considered simply using a
custom partitioner for the Kafka sink that uses the input partition number
as the output partition number? If the input messages are ordered per
input …
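Elias's suggestion above could be sketched roughly as follows. Since in this thread the partition counts do not actually match (30 input vs. ~20 output), the sketch below is a minimal, self-contained model of just the mapping logic; the class and method names are hypothetical, and in a real Flink job this logic would typically live inside a custom Kafka sink partitioner rather than a standalone class:

```java
// Hypothetical sketch of the partition-mapping idea from the thread.
// With equal partition counts this is an identity mapping; with fewer
// output partitions (e.g. 30 in -> 20 out) it wraps with modulo, so
// each input partition's records still land on exactly one output
// partition (though some output partitions then carry two inputs).
public class InputPartitionMapper {

    // Returns the output partition for a record read from inputPartition.
    static int targetPartition(int inputPartition, int numOutputPartitions) {
        return inputPartition % numOutputPartitions;
    }

    public static void main(String[] args) {
        // 1:1 case: 30 input partitions, 30 output partitions.
        System.out.println(targetPartition(7, 30));  // prints 7
        // Fewer output partitions: inputs 5 and 25 both map to output 5.
        System.out.println(targetPartition(25, 20)); // prints 5
    }
}
```

The trade-off to note: once two input partitions share an output partition, their records interleave, so ordering is only preserved per original input partition, not across them.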
Hello,

There is one Flink job which consumes from a Kafka topic (TOPIC_IN), simply
flatMaps / maps the data, and pushes it to another Kafka topic (TOPIC_OUT).
TOPIC_IN has around 30 partitions, the data is more or less sequential per
partition, and the job has parallelism 30. So in theory there should be a 1:1
mapping.
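For context, the job described above is conceptually just a per-record transform between topics. A minimal, Flink-free model of that flatMap/map step (all names and the sample transformation are hypothetical; a real job would use Flink's DataStream API and the Kafka connector):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of the job's flatMap/map step: each input record
// is optionally expanded (flatMap), then transformed (map) before
// being written to the output topic.
public class RouterModel {

    // flatMap: drop empty records, split comma-joined payloads.
    static Stream<String> flatMap(String record) {
        return record.isEmpty() ? Stream.empty()
                                : Stream.of(record.split(","));
    }

    // map: a stand-in for the per-record transformation.
    static String map(String record) {
        return "out:" + record;
    }

    public static void main(String[] args) {
        List<String> out = Stream.of("a,b", "", "c")
                .flatMap(RouterModel::flatMap)
                .map(RouterModel::map)
                .collect(Collectors.toList());
        System.out.println(out); // [out:a, out:b, out:c]
    }
}
```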