Thanks a lot for the explanation. That was exactly what I thought should happen. However, it is always good to have a clear confirmation.

Regards,
Taher Koitawala
GS Lab Pune
+91 8407979163

On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski <[email protected]> wrote:

> Hi,
>
> Yes, in your case half of the Kafka source tasks wouldn’t read/process any
> records (you can check that in web UI). This shouldn’t harm you, as long as
> your records are redistributed after the source. For example:
>
> source.keyBy(..).process(new MyVeryHeavyOperator()).print()
>
> Should be fine, because `keyBy(…)` will redistribute records. However
>
> source.map(new MyVeryHeavyOperator()).print()
>
> Will mean that half of `MyVeryHeavyOperator`s will be idling as well. To
> solve that, you might want to consider using
>
> dataStream.rebalance();
>
> Piotrek
>
> On 21 Sep 2018, at 13:25, Taher Koitawala <[email protected]> wrote:
>
> Hi All,
> Let's say a topic in Kafka has 5 partitions. If I spawn 10 Task
> Managers with 1 slot each and parallelism is 10, then how will records be
> read from the Kafka topic if I use the FlinkKafkaConsumer to read?
>
> Will 5 TMs read and the rest be idle in that case? Does having more TMs
> than the number of partitions in the Kafka topic guarantee high
> throughput?
>
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
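
For anyone finding this thread later, the arithmetic behind the answer can be sketched in plain Java, without any Flink dependency. This is a simplified model and not Flink's actual assignment code: the class `PartitionAssignmentSketch`, its methods, and the "partition p goes to subtask p % parallelism" rule are assumptions made for illustration. It only shows that 5 partitions spread over 10 source subtasks leave 5 subtasks without input, and that a round-robin redistribution (the idea behind `rebalance()`) evens out the load for the downstream operator.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the scenario discussed in the thread.
final class PartitionAssignmentSketch {

    // Assumption: partition p is read by source subtask (p % parallelism).
    // Returns how many partitions each source subtask reads.
    static int[] partitionsPerSubtask(int numPartitions, int parallelism) {
        int[] counts = new int[parallelism];
        for (int p = 0; p < numPartitions; p++) {
            counts[p % parallelism]++;
        }
        return counts;
    }

    // Counts the source subtasks that were assigned no partition at all.
    static int idleSubtasks(int[] perSubtask) {
        int idle = 0;
        for (int c : perSubtask) {
            if (c == 0) {
                idle++;
            }
        }
        return idle;
    }

    // Models a round-robin redistribution: every emitted record goes to the
    // next downstream subtask in turn, regardless of which source emitted it.
    static long[] rebalance(long[] emittedPerSource, int downstreamParallelism) {
        long[] received = new long[downstreamParallelism];
        int next = 0;
        for (long emitted : emittedPerSource) {
            for (long r = 0; r < emitted; r++) {
                received[next]++;
                next = (next + 1) % downstreamParallelism;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        int[] assignment = partitionsPerSubtask(5, 10);
        System.out.println("partitions per source subtask: " + Arrays.toString(assignment));
        System.out.println("idle source subtasks: " + idleSubtasks(assignment));

        // Assumption: each partition holds 100 records.
        long[] emitted = new long[assignment.length];
        for (int i = 0; i < assignment.length; i++) {
            emitted[i] = assignment[i] * 100L;
        }
        // Without redistribution, downstream subtask i sees only emitted[i].
        System.out.println("without redistribution: " + Arrays.toString(emitted));
        System.out.println("after round-robin: " + Arrays.toString(rebalance(emitted, 10)));
    }
}
```

In this model, a `map` chained directly to the source inherits the skewed per-subtask counts, while the round-robin step gives every downstream subtask an equal share. Extra subtasks beyond the partition count add no read throughput on the source side.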
