Max. storage for Kafka and impact

2014-12-19 Thread Achanta Vamsi Subhash
Hi, We are using Kafka for our messaging system and we estimate 200 TB/week in the coming months. Will that impact Kafka's performance? PS: We will have more than 2 lakh partitions. -- Regards Vamsi Subhash
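
A rough sizing sketch, assuming the one-week retention stated later in the thread and a replication factor of 3 (the replication factor is an assumption, not given anywhere in the thread):

    200 TB/week ingest x 1 week retention  = 200 TB of live log data
    200 TB x 3 replicas                    = 600 TB of raw disk across the cluster
    600 TB / 2,00,000 partitions           ~ 3 GB of disk per partition (replicas included)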

Re: Max. storage for Kafka and impact

2014-12-19 Thread Achanta Vamsi Subhash
We definitely need a retention policy of a week. Hence. On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash achanta.va...@flipkart.com wrote: Hi, We are using Kafka for our messaging system and we have an estimate for 200 TB/week in the coming months. Will it impact any performance for
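
A one-week retention maps onto the stock broker settings; a sketch for server.properties (values illustrative):

    # delete log segments once they are 7 days old
    log.retention.hours=168
    # -1 = no additional per-partition size cap
    log.retention.bytes=-1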

Re: Max. storage for Kafka and impact

2014-12-19 Thread nitin sharma
hi, A few things you have to plan for:
a. Ensure that, from a resilience point of view, you have sufficient follower brokers for your partitions.
b. In my testing of Kafka (50 TB/week) so far, I haven't seen much issue with CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
c. 200,000 partitions
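
For point (a), the replica count is fixed per topic when it is created; a sketch with the stock tooling (host, topic, and counts are placeholders):

    bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic orders --partitions 50 --replication-factor 3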

Re: Max. storage for Kafka and impact

2014-12-19 Thread Achanta Vamsi Subhash
Yes, we need that many partitions at the maximum, as we have a central messaging service and thousands of topics. On Friday, December 19, 2014, nitin sharma kumarsharma.ni...@gmail.com wrote: hi, A few things you have to plan for: a. Ensure that, from a resilience point of view, you have sufficient

Re: Max. storage for Kafka and impact

2014-12-19 Thread Joe Stein
see some comments inline On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash achanta.va...@flipkart.com wrote: We require: - many topics - ordering of messages for every topic Ordering is only on a per-partition basis, so you might have to pick a partition key that makes sense for what
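
To make the per-partition ordering point concrete: records that share a key hash to the same partition and so stay in order relative to each other. A minimal sketch against the new (0.8.2) Java producer; broker, topic, and key names below are made up:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderedSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Both records carry the key "order-42", land in the same partition,
            // and are therefore seen by consumers in send order.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "paid"));
            producer.close();
        }
    }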

Re: Max. storage for Kafka and impact

2014-12-19 Thread Joe Stein
Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000 partitions? That multiplies out to 2,000 x 50 = 100,000. I think you can take what I said below and change my 250 to 25, as I went with your result (1,000,000) and not your arguments (2,000 x 50). And you should think of the processing as a separate step from fetch and

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-19 Thread Paul Pearcy
Hi Jay, Many thanks for the info. All that makes sense, but from an API standpoint, when something is labelled async and returns a Future, it will be misconstrued, and developers will place async sends in critical client-facing request/response pathways of code that should never block. If the
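
The API shape in question, as a fragment (continuing the producer sketch above; whether the first line blocks depends on whether topic metadata is already cached):

    // Labelled async, but the 0.8.2 client may fetch metadata synchronously
    // inside send() the first time it sees a topic.
    Future<RecordMetadata> f = producer.send(record);
    // ... code on a request/response path should not assume the line above
    //     returned without blocking ...
    RecordMetadata md = f.get();   // explicit wait for the broker ack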

Re: Max. storage for Kafka and impact

2014-12-19 Thread Achanta Vamsi Subhash
Joe,
- Correction: it's 1,00,000 partitions.
- We can have at most one consumer per partition, not 50 per partition. Yes, we have a hashing mechanism to support future partition increases as well; we override the Default Partitioner.
- We use both Simple and HighLevel consumers depending on the
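
The idea behind overriding the default partitioner, sketched as plain Java (the hashing concept only, not the exact kafka.producer.Partitioner interface):

    // Stable key -> partition mapping. Masking the sign bit avoids the
    // Math.abs(Integer.MIN_VALUE) pitfall, which is still negative.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }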

The purpose of key in kafka

2014-12-19 Thread Rajiv Kurian
Hi all, I was wondering why every ProducerRecord sent requires a serialized key. I am using Kafka to send opaque bytes, and I end up creating garbage keys because I don't really have a good one. Thanks, Rajiv

Re: The purpose of key in kafka

2014-12-19 Thread Jiangjie Qin
Hi Rajiv, You can send messages without keys. Just provide null for the key. Jiangjie (Becket) Qin On 12/19/14, 10:14 AM, Rajiv Kurian ra...@signalfuse.com wrote: Hi all, I was wondering why every ProducerRecord sent requires a serialized key. I am using Kafka to send opaque bytes and I am
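
With the new producer API that looks like the following (a fragment; topic name and payload are made up):

    // No key object is ever allocated; the producer spreads keyless
    // records across partitions on its own.
    producer.send(new ProducerRecord<byte[], byte[]>("opaque-bytes", payload));
    // equivalent explicit form:
    producer.send(new ProducerRecord<byte[], byte[]>("opaque-bytes", null, payload));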

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-19 Thread Jay Kreps
Hey Paul, I agree we should document this better. We allow and encourage using partitions to semantically distribute data. So unfortunately we can't just arbitrarily assign a partition (say 0) as that would actually give incorrect answers for any consumer that made use of the partitioning. It is
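
Jay's point, roughly, in code form (a behavioral sketch, not the actual client source):

    // Keyed record: the key alone must determine the partition, so every
    // client needs current metadata to agree on where a key lives.
    // Keyless record: any available partition is acceptable.
    static int choosePartition(byte[] key, int numPartitions, int counter) {
        if (key != null) {
            return (java.util.Arrays.hashCode(key) & 0x7fffffff) % numPartitions;
        }
        return counter % numPartitions;   // arbitrary but balanced
    }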

Fwd: Help: KafkaSpout not getting data from Kafka

2014-12-19 Thread Banias H
Hi folks, I am new to both Kafka and Storm, and I am having trouble getting KafkaSpout to read data from Kafka in our three-node environment with Kafka 0.8.1.1 and Storm 0.9.3. What is working: - I have a Kafka producer (a Java application) that generates random strings to a topic, and I was able to run the
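
For comparison, a minimal storm-kafka (0.9.x) spout wiring; hosts, paths, topic, and spout id below are placeholders, not Banias's actual setup:

    import backtype.storm.spout.SchemeAsMultiScheme;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    ZkHosts zkHosts = new ZkHosts("zk1:2181");
    // the ZK root and spout id determine where consumed offsets are stored
    SpoutConfig cfg = new SpoutConfig(zkHosts, "random-strings", "/kafkastorm", "my-spout");
    cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
    KafkaSpout spout = new KafkaSpout(cfg);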

Re: Max. storage for Kafka and impact

2014-12-19 Thread Pradeep Gollakota
@Joe, Achanta is using Indian English numerals which is why it's a little confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system 1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The rest of the world :P) On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash

Kafka consumer session timeouts

2014-12-19 Thread Terry Cumaranatunge
Hi, I would like to get some feedback on design choices with Kafka consumers. We have an application in which a consumer reads a message and the thread does a number of things, including database accesses, before a message is produced to another topic. The time between consuming and producing the
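
A skeleton of the pattern being described, against the 0.8 high-level consumer (a sketch; doDatabaseWork and the stream/producer setup are assumed, and error handling is omitted):

    // One thread per stream: consume -> DB work -> produce to the next topic.
    // Slow database calls delay the next iteration, which is exactly why
    // session/rebalance behavior becomes a design concern.
    for (MessageAndMetadata<byte[], byte[]> msg : stream) {
        byte[] result = doDatabaseWork(msg.message());   // hypothetical helper
        producer.send(new ProducerRecord<byte[], byte[]>("output-topic", result));
    }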

Re: The purpose of key in kafka

2014-12-19 Thread Rajiv Kurian
Thanks, didn't know that. On Fri, Dec 19, 2014 at 10:39 AM, Jiangjie Qin j...@linkedin.com.invalid wrote: Hi Rajiv, You can send messages without keys. Just provide null for the key. Jiangjie (Becket) Qin On 12/19/14, 10:14 AM, Rajiv Kurian ra...@signalfuse.com wrote: Hi all, I was

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-19 Thread Paul Pearcy
Hi Jay, I have implemented a wrapper around the producer to behave like I want it to. Where it diverges from the current 0.8.2 producer is that it accepts three new inputs:
- A list of expected topics
- A timeout value to init metadata for those topics during producer creation
- An option to blow up if
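
One way to approximate the first two inputs on top of the stock 0.8.2 producer (a sketch; note that partitionsFor() itself has no timeout in 0.8.2, so the deadline is only checked between topics):

    // partitionsFor() forces a metadata fetch; doing it once per expected
    // topic at construction time means later send() calls hit warm metadata.
    static void initMetadata(KafkaProducer<?, ?> producer,
                             List<String> expectedTopics, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        for (String topic : expectedTopics) {
            producer.partitionsFor(topic);   // blocks until metadata arrives
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("metadata init timed out at " + topic);
            }
        }
    }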

Re: The purpose of key in kafka

2014-12-19 Thread Steve Miller
Also, if log.cleaner.enable is true in your broker config, that enables the log-compaction retention strategy. Then, for topics with the per-topic cleanup.policy=compact config parameter set, Kafka will scan the topic periodically, nuking old versions of the data with the same key. I
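
The two settings Steve mentions, as they would appear in practice (a sketch; the topic name is a placeholder):

    # broker config (server.properties): start the log cleaner threads
    log.cleaner.enable=true

    # per-topic: switch from time/size-based deletion to compaction
    bin/kafka-topics.sh --zookeeper zk1:2181 --alter \
        --topic user-profiles --config cleanup.policy=compact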