A few things I've learned:
1) Don't break things up into separate topics unless the data in them is
truly independent. Consumer behavior can be extremely variable, so don't
assume you will always be consuming as fast as you are producing.
2) Keep time-related messages in the same partition. Again
Take a look at:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?
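
For a rough feel of the numbers, here is a minimal sizing sketch. It is not from the FAQ; it uses the common rule of thumb of dividing the target throughput by what a single partition can sustain on the producer side and on the consumer side, and taking the larger result. The per-partition rates and the peak factor below are made-up placeholders; measure them on your own hardware before trusting any result.

```python
import math

SECONDS_PER_DAY = 86_400


def avg_throughput_mb_s(bytes_per_day: float) -> float:
    """Average ingest rate in MB/s (decimal megabytes) for a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


def estimate_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    peak_factor: float = 1.0,
) -> int:
    """Rule-of-thumb partition count: max(T/p, T/c), rounded up.

    peak_factor scales the average rate to allow for bursts, since log
    traffic is rarely flat across the day (an assumption, not a measurement).
    """
    peak = target_mb_s * peak_factor
    for_producers = math.ceil(peak / producer_mb_s_per_partition)
    for_consumers = math.ceil(peak / consumer_mb_s_per_partition)
    return max(for_producers, for_consumers)


if __name__ == "__main__":
    target = avg_throughput_mb_s(4e12)  # 4 TB/day, decimal units
    print(round(target, 1))  # 46.3

    # Placeholder rates: 10 MB/s per partition producing, 5 MB/s consuming,
    # with headroom for peaks at 3x the daily average.
    print(estimate_partitions(target, 10.0, 5.0, peak_factor=3.0))  # 28
```

With these placeholder figures the consumer side is the bottleneck, which matches the point above about not assuming you consume as fast as you produce.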
On Fri, May 23, 2014 at 12:49:39PM -0700, Bhavesh Mistry wrote:
Hi Kafka Users,
We are trying to transport 4 TB of data per day on a single topic. The data
is operational application logs. How do we estimate the number of partitions
and choose a partitioning strategy? Our goal is to drain (from the consumer
side) from the Kafka brokers as soon as messages arrive (keep the lag as min