Perfect - This explains it very clearly. Thank you very much!

Sameer

On Tue, Aug 23, 2016 at 9:31 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
wrote:

> Slight misunderstanding here. The one thread per Kafka broker happens
> *after* the assignment of Kafka partitions to the source instances. So,
> with a total of 10 partitions and 10 source instances, each source instance
> will first be assigned 1 partition. Then, each source instance will create
> 1 thread for every individual broker that holds partitions that the source
> instance is assigned. The per-broker threading model of the Kafka consumer
> has nothing to do with the initial assignment of partitions to source
> instances.
>
> Another example to explain this more clearly:
> Say you have 2 Kafka brokers, each holding 5 partitions, and a source
> parallelism of 5. Each source instance will still have 2 partitions. If the
> 2 partitions belong to the same broker, the source instance will have only
> 1 consuming thread; otherwise, if the 2 partitions belong to different
> brokers, the source instance will have 2 consuming threads.
>
> Regards,
> Gordon
>
>
> On August 23, 2016 at 8:47:15 PM, Sameer W (sam...@axiomine.com) wrote:
>
> Gordon,
>
> I tried the following with Kafka: 1 broker, but the topic has 10 partitions.
> I have a parallelism of 10 defined for the job. I see all 10 of my
> source->Mapper->assignTimestamps instances receiving and sending data. If
> there is only one source instance per broker, how does that happen?
>
> Thanks,
> Sameer
>
> On Tue, Aug 23, 2016 at 7:17 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
> wrote:
>
>> Hi!
>>
>> Kinesis shards should ideally be evenly assigned to the source instances.
>> So, with your example of source parallelism of 10 and 20 shards, each
>> source instance will have 2 shards and will have 2 threads consuming them
>> (therefore, not in round robin).
>>
>> For the Kafka consumer, each source instance will have one consuming
>> thread per broker, rather than per partition. So, if a source instance
>> is assigned partitions that happen to be on the same broker, the source
>> instance will only create 1 thread to consume all of them.
>>
>> You are correct that currently the Kafka consumer does not handle
>> repartitioning transparently like the Kinesis connector, but we’re working
>> on this :)
>>
>> Regards,
>> Gordon
>>
>> On August 23, 2016 at 6:50:31 PM, Sameer W (sam...@axiomine.com) wrote:
>>
>> Hi,
>>
>> The documentation says that there will be one thread per shard. If my
>> streaming job runs with a parallelism of 10 and there are 20 shards, are
>> more threads going to be launched within a task slot running a source
>> function to consume the additional shards, or will one source function
>> instance consume 2 shards in round robin?
>>
>> Is it any different for Kafka? Based on the documentation my
>> understanding is that if there are 10 source function instances and 20
>> partitions, each one will read 2 partitions.
>>
>> Also, if partitions are added to Kafka, are they handled by the existing
>> streaming job, or does it need to be restarted? It appears as though Kinesis
>> handles it via the consumer constantly checking for more shards.
>>
>> Thanks,
>> Sameer
>>
>>
>
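
The two-step model described in this thread (partitions are first assigned to source instances, and only then does each instance open one consuming thread per broker it needs to reach) can be sketched in a few lines of Python. This is an illustrative sketch, not the connector's actual code: the modulo-based assignment, the function names, and the partition-to-broker layouts are all assumptions made up for the example.

```python
from collections import defaultdict


def assign_partitions(num_partitions, parallelism):
    """Step 1: spread partitions over source instances.

    Assumption for illustration: partition p goes to instance p % parallelism.
    The real connector may use a different (but still even) scheme.
    """
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[p % parallelism].append(p)
    return dict(assignment)


def consuming_threads(assignment, broker_of):
    """Step 2: one thread per distinct broker among an instance's partitions."""
    return {
        instance: len({broker_of[p] for p in partitions})
        for instance, partitions in assignment.items()
    }


# Case from the thread: 1 broker, 10 partitions, parallelism 10.
one_broker = {p: 0 for p in range(10)}
case_a = consuming_threads(assign_partitions(10, 10), one_broker)
# Every one of the 10 instances gets 1 partition and 1 thread.

# Case from the thread: 2 brokers with 5 partitions each, parallelism 5.
# Hypothetical layout: broker 0 holds partitions 0, 1, 2, 5, 6; broker 1 the rest.
two_brokers = {0: 0, 1: 0, 2: 0, 5: 0, 6: 0, 3: 1, 4: 1, 7: 1, 8: 1, 9: 1}
case_b = consuming_threads(assign_partitions(10, 5), two_brokers)
# Instance 2 holds partitions 2 and 7 (different brokers) -> 2 threads;
# the other instances hold two partitions on the same broker -> 1 thread.
```

Under these assumptions, the number of source instances that receive data depends only on step 1, which is why all 10 instances are active even with a single broker.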
