Perfect - This explains it very clearly. Thank you very much!

Sameer

On Tue, Aug 23, 2016 at 9:31 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com> wrote:

> Slight misunderstanding here. The one thread per Kafka broker happens
> *after* the assignment of Kafka partitions to the source instances. So,
> with a total of 10 partitions and 10 source instances, each source instance
> will first be assigned 1 partition. Then, each source instance will create
> 1 thread for every individual broker that holds partitions that the source
> instance is assigned. The per-broker threading model of the Kafka consumer
> has nothing to do with the initial assignment of partitions to source
> instances.
>
> Another example to explain this more clearly:
> Say you have 2 Kafka brokers, each holding 5 partitions, and a source
> parallelism of 5. Each source instance will still have 2 partitions. If the
> 2 partitions belong to the same broker, the source instance will have only
> 1 consuming thread; otherwise, if the 2 partitions belong to different
> brokers, the source instance will have 2 consuming threads.
>
> Regards,
> Gordon
>
>
> On August 23, 2016 at 8:47:15 PM, Sameer W (sam...@axiomine.com) wrote:
>
> Gordon,
>
> I tried the following with Kafka: 1 broker, and a topic with 10 partitions.
> I have a parallelism of 10 defined for the job. I see all 10 of my
> source->Mapper->assignTimestamps instances receiving and sending data. If
> there is only one source instance per broker, how does that happen?
>
> Thanks,
> Sameer
>
> On Tue, Aug 23, 2016 at 7:17 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
> wrote:
>
>> Hi!
>>
>> Kinesis shards should ideally be evenly assigned to the source instances.
>> So, with your example of a source parallelism of 10 and 20 shards, each
>> source instance will have 2 shards and will have 2 threads consuming them
>> (therefore, not in round robin).
>>
>> For the Kafka consumer, in the source instances there will be one
>> consuming thread per broker, instead of per partition. So, if a source
>> instance is assigned partitions that happen to be on the same broker, the
>> source instance will only create 1 thread to consume all of them.
>>
>> You are correct that currently the Kafka consumer does not handle
>> repartitioning transparently like the Kinesis connector, but we’re working
>> on this :)
>>
>> Regards,
>> Gordon
>>
>> On August 23, 2016 at 6:50:31 PM, Sameer W (sam...@axiomine.com) wrote:
>>
>> Hi,
>>
>> The documentation says that there will be one thread per shard. If my
>> streaming job runs with a parallelism of 10 and there are 20 shards, are
>> more threads going to be launched within a task slot running a source
>> function to consume the additional shards, or will one source function
>> instance consume 2 shards in round robin?
>>
>> Is it any different for Kafka? Based on the documentation, my
>> understanding is that if there are 10 source function instances and 20
>> partitions, each one will read 2 partitions.
>>
>> Also, if partitions are added to Kafka, are they handled by the existing
>> streaming job, or does it need to be restarted? It appears as though
>> Kinesis handles it via the consumer constantly checking for more shards.
>>
>> Thanks,
>> Sameer
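
For anyone reading this thread later: the two steps Gordon describes (partitions are first assigned to source instances, and only then does each instance start one consuming thread per broker it needs to talk to) can be sketched roughly as below. This is an illustrative model only, not Flink's connector code. The function names are hypothetical, and the contiguous assignment strategy is an assumption made for the example; the thread does not say which partitions end up on which instance.

```python
def assign_partitions(num_partitions, parallelism):
    """Step 1: spread partitions over source instances.

    Assumption: contiguous chunks. The real connector may distribute
    partitions differently; only the per-instance counts matter here.
    """
    per_instance = -(-num_partitions // parallelism)  # ceiling division
    assignment = {i: [] for i in range(parallelism)}
    for p in range(num_partitions):
        assignment[p // per_instance].append(p)
    return assignment


def consuming_threads(assignment, partition_to_broker):
    """Step 2: one consuming thread per distinct broker among an
    instance's assigned partitions."""
    return {
        instance: len({partition_to_broker[p] for p in parts})
        for instance, parts in assignment.items()
    }


# Sameer's test: 1 broker, 10 partitions, parallelism 10.
one_broker = {p: "broker-0" for p in range(10)}
threads_one_broker = consuming_threads(assign_partitions(10, 10), one_broker)

# Gordon's example: 2 brokers with 5 partitions each, parallelism 5.
two_brokers = {p: ("broker-0" if p < 5 else "broker-1") for p in range(10)}
assignment_two_brokers = assign_partitions(10, 5)
threads_two_brokers = consuming_threads(assignment_two_brokers, two_brokers)

print(threads_one_broker)   # every instance: 1 partition, 1 thread
print(threads_two_brokers)  # 1 or 2 threads, depending on broker overlap
```

Under the assumed layout, the instance holding partitions 4 and 5 spans both brokers and gets 2 threads, while the other instances get 1 each. In the single-broker case all 10 instances are active with 1 thread each, which matches what Sameer observed.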