Hi Shaun

You might consider using a custom partition assignment strategy to push your
different "groups" to different partitions. This would allow you to walk the
middle ground between "all consumers consume everything" and "one topic per
consumer" as you vary the number of partitions in the topic, albeit at the cost
of a little extra complexity.
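
To make that concrete, on the producer side something like the sketch below
could do the routing. This is only a sketch under assumptions of my own (the
class name GroupPartitioner and the hashing scheme are mine, not anything
built in, and it assumes a Java producer version where partitioner.class is
pluggable): key each record with its group ID and hash that key onto the
partitions. A matching consumer-side sketch follows the FAQ link below.

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class GroupPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed for this sketch
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Key each record with its group ID; the mask keeps the hash
        // non-negative. The consumer must use the same hash to find
        // the partition for its group.
        return (key.toString().hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}
}

You'd register it on the producer with
props.put("partitioner.class", GroupPartitioner.class.getName());
and keying by group ID also keeps each group's messages ordered within its
partition.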

Also, not sure if you've seen it, but there is quite a good section in the FAQ
here
<https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave?>
on topic and partition sizing.
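
To complete the picture on the consuming side, each API-server handler could
skip subscribe() and consumer-group rebalancing entirely and assign itself
just the partition its group hashes to. Again only a sketch with placeholder
names ("events", "group-A", a localhost broker), assuming the Java consumer;
the hash must match whatever the producer used:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class GroupStreamReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        String group = "group-A";      // placeholder group ID
        int numPartitions = 20;        // must match the topic
        int partition = (group.hashCode() & 0x7fffffff) % numPartitions;

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Manual assignment: no consumer-group rebalancing; read exactly
        // the one partition that this group's messages hash to.
        consumer.assign(Collections.singletonList(
                new TopicPartition("events", partition)));  // placeholder topic

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // Several groups share a partition, so drop records
                // keyed for other groups.
                if (group.equals(record.key())) {
                    // push record.value() out over the websocket
                }
            }
        }
    }
}

Since ~400 groups would share the 20 partitions, each handler still filters by
key, but it reads roughly 1/20th of the stream rather than all of it.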

B

> On 29 Sep 2015, at 18:48, Shaun Senecal <shaun.sene...@lithium.com> wrote:
> 
> Hi
> 
> 
> I have read Jay Kreps' post regarding the number of topics that can be 
> handled by a broker 
> (https://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka), and 
> it has left me with more questions that I don't see answered anywhere else.
> 
> 
> We have a data stream which will be consumed by many consumers (~400).  We 
> also have many "groups" within our data.  A group in the data corresponds 1:1 
> with what the consumers would consume, so consumer A only ever sees group A 
> messages, consumer B only consumes group B messages, etc.
> 
> 
> The downstream consumers will be consuming via a websocket API, so the API 
> server will be the thing consuming from Kafka.
> 
> 
> If I use a single topic with, say, 20 partitions, the consumers in the API 
> server would need to re-read the same messages over and over, once for each 
> downstream consumer, which seems like a waste of network and a potential 
> bottleneck.
> 
> 
> Alternatively, I could use a single topic with 20 partitions and have a 
> single consumer in the API server put the messages into Cassandra/Redis (as 
> suggested by Jay), and serve out the downstream consumer streams that way.  
> However, that requires using a secondary sorted storage, which seems like a 
> waste (and added complexity) given that Kafka already has the data exactly as 
> I need it, especially if Cassandra/Redis are required to maintain a long TTL 
> on the stream.
> 
> 
> Finally, I could use 1 topic per group, each with a single partition.  This 
> would result in 400 topics on the broker, but would allow the API server to 
> simply serve the stream for each consumer directly from Kafka and wouldn't 
> require additional machinery to serve out the requests.
> 
> 
> The 400-topic solution makes the most sense to me (doesn't require extra 
> services, doesn't waste resources), but seems to conflict with best practices, 
> so I wanted to ask the community for input.  Has anyone done this before?  
> What makes the most sense here?
> 
> 
> 
> 
> Thanks
> 
> 
> Shaun