Re: How does flink read data from kafka number of TM's are more than topic partitions

Taher Koitawala Fri, 21 Sep 2018 06:10:50 -0700

Thanks a lot for the explanation. That was exactly what I thought should
happen. However, it is always good to a clear confirmation.



Regards,
Taher Koitawala
GS Lab Pune
+91 8407979163


On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski <[email protected]>
wrote:

> Hi,
>
> Yes, in your case half of the Kafka source tasks wouldn’t read/process any
> records (you can check that in web UI). This shouldn’t harm you, unless
> your records will be redistributed after the source. For example:
>
> source.keyBy(..).process(new MyVeryHeavyOperator()).print()
>
> Should be fine, because `keyBy(…)` will redistribute records. However
>
> source.map(new MyVeryHeavyOperator()).print()
>
> Will mean that half of `MyVeryHeavyOperator`s will be idling as well. To
> solve that, you might want to consider using
>
> dataStream.rebalance();
>
> Piotrek
>
> On 21 Sep 2018, at 13:25, Taher Koitawala <[email protected]>
> wrote:
>
> Hi All,
>          Let's say a topic in kafka has 5 partitions. If I spawn 10 Task
> Managers with 1 slot each and parallelism is 10 then how will records be
> read from the kafka topic if I use the FlinkKafkaConsumer to read.
>
> Will 5 TM's read and the rest be ideal in that case? Is over subscribing
> the number of TM's than the number of partitions in the Kafka topic
> guarantee high throughput?
>
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
>
>
>

Re: How does flink read data from kafka number of TM's are more than topic partitions

Reply via email to