Does kafka write key to broker?

2015-06-23 Thread Mohit Kathuria
Hi, We are using kafka 0.8.1.1 in our production cluster. I recently started specifying key as the message itself. I just realised that the key is also written to the broker which means that the data is duplicated within a keyed message. I am going to change the key. Stupid mistake. However,

Increase partitions and replication of __consumer_offsets

2015-06-23 Thread Daniel Coldham
Hi all, I'm using Kafka 0.8.2.1 in production. My Kafka Config is pretty much vanilla, so (as far as I understand) offsets are being written to Zookeeper. As recommended, I want to start writing offsets to Kafka instead of Zookeeper. I was surprised to see that the __consumer_offsets topic
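The partition count and replication factor of the offsets topic are taken from broker settings at the moment the topic is first auto-created, so they are worth checking before switching storage. A sketch of the relevant 0.8.2 properties (values illustrative, not a recommendation):

```properties
# broker side: applied only when __consumer_offsets is first created
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3

# consumer side: commit offsets to Kafka instead of Zookeeper
offsets.storage=kafka
# during migration, keep committing to Zookeeper as well
dual.commit.enabled=true
```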

Re: Broker Fails to restart

2015-06-23 Thread Zakee
Thanks, Jiangjie. Yes, we had reduced segment.index.bytes to 1K in order to maintain a denser offset index, which was required for the ability to fetch start and end offsets for a given span of time, say 15 mins. Ideally changing only the index.interval.bytes to 1K should have been
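Both settings mentioned above are available as per-topic overrides in the 0.8 line, so the index interval can be tightened without shrinking the index file itself. A hypothetical invocation (topic name and Zookeeper address illustrative):

```shell
# denser offset index entries, without touching segment.index.bytes
bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic events \
  --config index.interval.bytes=1024
```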

Re: Increase partitions and replication of __consumer_offsets

2015-06-23 Thread Daniel Coldham
To make my question clearer: I know how to increase the partitions and the replication factor of any plain old topic. I'm worried that making changes to this internal topic could cause problems, so I'm looking for advice. Thanks, *Daniel Coldham* On Tue, Jun 23, 2015 at 3:15 PM, Daniel
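For a plain topic, the two changes referred to above use different tools: partitions are added with the topics tool, while replication factor is raised by reassigning each partition to a longer replica list. A sketch (topic name, broker ids, and paths illustrative):

```shell
# add partitions
bin/kafka-topics.sh --zookeeper zk1:2181 --alter \
  --topic my-topic --partitions 50

# raise replication factor via a reassignment with more replicas
cat > raise-rf.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my-topic","partition":0,"replicas":[1,2,3]}
]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file raise-rf.json --execute
```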

Re: Does kafka write key to broker?

2015-06-23 Thread Jason Gustafson
Hey Mohit, Unfortunately, I don't think there's any such configuration. By the way, there are some pretty cool things you can do with keys in Kafka (such as semantic partitioning and log compaction). I don't know if they would help in your use case, but it might be worth checking out

Re: Is trunk safe for production?

2015-06-23 Thread Todd Palino
Yes and no. We're running a version about a month behind trunk at any given time here at LinkedIn. That's generally the amount of time we spend testing and going through our release process internally (less if there are no problems). So it can be done. That said, we also have several Kafka

Re: Does kafka write key to broker?

2015-06-23 Thread Liquan Pei
Hi Mohit, If you instantiate the keyed message with new KeyedMessage[String, String](topic, value), i.e. without a key argument, then the key in the KeyedMessage will be null. Hope this helps! Thanks, Liquan On Tue, Jun 23, 2015 at 8:18 AM, Mohit Kathuria
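Spelled out, the two constructors look like this; a minimal sketch against the 0.8 Scala producer API (topic names and contents illustrative):

```scala
import kafka.producer.KeyedMessage

// two-argument form: the key is null, so the message is not keyed
// and the payload is not duplicated on the broker
val unkeyed = new KeyedMessage[String, String]("my-topic", "payload")

// three-argument form: a small key drives partitioning and is stored
// alongside the message, so keep it compact rather than reusing the value
val keyed = new KeyedMessage[String, String]("my-topic", "user-42", "payload")
```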

Re: Is trunk safe for production?

2015-06-23 Thread Joel Koshy
Yes new features are a big part of it and sometimes bug fixes/improvements. Bug fixes are mostly due to being on trunk, but some aren't necessarily introduced on trunk. For e.g., we would like to do a broader roll-out of the new producer, but KAFKA-2121 (adding a request timeout to NetworkClient)

Re: data loss - replicas

2015-06-23 Thread Todd Palino
Thanks, Joel. I know I remember a case where we had a difference like this between two brokers, and it was not due to retention settings or some other problem, but I can't remember exactly what we determined it was. -Todd On Mon, Jun 22, 2015 at 4:22 PM, Joel Koshy jjkosh...@gmail.com wrote:

Re: Is trunk safe for production?

2015-06-23 Thread Gwen Shapira
Out of curiosity, why do you want to run trunk? General fondness for cutting edge stuff? Or are there specific features in trunk that you need? Gwen On Tue, Jun 23, 2015 at 2:59 AM, Achanta Vamsi Subhash achanta.va...@flipkart.com wrote: I am planning to use for the producer part. How stable is

high level consumer memory footprint

2015-06-23 Thread Kris K
Hi, I was just wondering if there is any difference in the memory footprint of a high level consumer when: 1. the consumer is live and continuously consuming messages with no backlogs 2. when the consumer is down for quite some time and needs to be brought up to clear the backlog. My test case
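A rough upper bound on the high-level consumer's fetch-buffer memory is threads × queued.max.message.chunks × fetch.message.max.bytes, and that bound is the same whether the consumer is keeping up or draining a backlog; what differs under backlog is that the queues tend to stay full. A sketch of the relevant 0.8 consumer properties (values illustrative):

```properties
# each buffered chunk can be up to this many bytes
fetch.message.max.bytes=1048576
# number of chunks buffered per consumer thread
queued.max.message.chunks=2
```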

Re: Is trunk safe for production?

2015-06-23 Thread Achanta Vamsi Subhash
@Gwen I want to patch this JIRA https://issues.apache.org/jira/browse/KAFKA-1865 to 0.8.2.1. So, I was thinking instead of patching it can we run it against the trunk as I see other producer changes also pushed to trunk. We are facing latency problems with the current producer (sent out a separate

Best Practices for Java Consumers

2015-06-23 Thread Tom McKenzie
Hello Is there a good reference for best practices on running Java consumers? I'm thinking a FAQ format. - How should we run them? We are currently running them in Tomcat on Ubuntu, are there other approaches using services? Maybe the service wrapper

Re: data loss - replicas

2015-06-23 Thread Joel Koshy
It seems you might have run that on the last log segment. Can you run it on 21764229.log on both brokers and compare? I'm guessing there may be a message-set with a different compression codec that may be causing this. Thanks, Joel On Tue, Jun 23, 2015 at 01:06:16PM +0530, nirmal

Re: Best Practices for Java Consumers

2015-06-23 Thread Gwen Shapira
I don't know of any such resource, but I'll be happy to help contribute from my experience. I'm sure others would too. Do you want to start one? Gwen On Tue, Jun 23, 2015 at 2:03 PM, Tom McKenzie thomaswmcken...@gmail.com wrote: Hello Is there a good reference for best practices on running

No key specified when sending the message to Kafka

2015-06-23 Thread bit1...@163.com
I have the following code snippet that uses the Kafka producer to send a message (no key is specified in the KeyedMessage): val data = new KeyedMessage[String, String](topicName, msg); Kafka_Producer.send(data) Kafka_Producer is an instance of kafka.producer.Producer. With the above code, I observed that

Re: data loss - replicas

2015-06-23 Thread nirmal
Hi, I ran DumpLogSegments. *Broker 1* offset: 23077447 position: 1073722324 isvalid: true payloadsize: 431 magic: 0 compresscodec: NoCompressionCodec crc: 895349554 *Broker 2* offset: 23077447 position: 1073740131 isvalid: true payloadsize: 431 magic: 0 compresscodec: NoCompressionCodec crc:
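Output like the above can be produced on each broker with the DumpLogSegments tool and then diffed; a sketch (segment path illustrative):

```shell
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/kafka/logs/my-topic-0/00000000000021764229.log \
  --deep-iteration
```

--deep-iteration decompresses message sets, which matters when comparing brokers that may have applied different compression codecs.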

Batch producer latencies and flush()

2015-06-23 Thread Achanta Vamsi Subhash
Hi, We are using the batch producer of 0.8.2.1 and we are getting very bad latencies for the topics. We have ~40K partitions now in a 20-node cluster. - We have many topics, each with a different publish rate. Ex: some topics take 10k/sec and others 2000/minute. - We are seeing
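In the 0.8.2 producer, batching latency is governed mainly by the linger time and the per-partition batch size, which interact badly with very high partition counts (a batch per partition must fill or time out). A sketch of the relevant new-producer properties (values illustrative):

```properties
# send a batch when it is full, or after this many ms, whichever first
linger.ms=5
# per-partition batch size in bytes; with ~40K partitions this also
# bounds how much data can sit buffered across all batches
batch.size=16384
# total memory the producer may use for buffering
buffer.memory=33554432
```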

Is trunk safe for production?

2015-06-23 Thread Achanta Vamsi Subhash
I am planning to use for the producer part. How stable is trunk generally? -- Regards Vamsi Subhash

High level consumer rebalance question

2015-06-23 Thread tao xiao
Hi, I have 3 high-level consumers with the same group id. One of the consumers goes down; I know rebalance will kick in on the remaining two consumers. What happens if one of the remaining consumers is very slow during rebalancing and it hasn't released ownership of some of the topics, will the

Re: No key specified when sending the message to Kafka

2015-06-23 Thread Ewen Cheslack-Postava
It does balance data, but is sticky over short periods of time (for some definition of short...). See this FAQ for an explanation: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified This behavior has been

Re: Consumer rebalancing based on partition sizes?

2015-06-23 Thread Ewen Cheslack-Postava
Current partition assignment only has a few limited options -- see the partition.assignment.strategy consumer option (which seems to be listed twice, see the second version for a more detailed explanation). There has been some discussion of making assignment strategies user extensible to support

Consumer rebalancing based on partition sizes?

2015-06-23 Thread Joel Ohman
Hello! I'm working with a topic of largely variable partition sizes. My biggest concern is that I have no control over which keys are assigned to which consumers in my consumer group, as the amount of data my consumer sees is directly reflected on it's work load. Is there a way to distribute