Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-25 Thread Cliff Rhyne
I was reflecting on this more and I think there is a change or two that should be considered for kafka. First off, auto-commit is a very resilient mode. Drops in zookeeper sessions due to garbage collection, network, rebalance or other interference are handled gracefully within the kafka client.

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-23 Thread Cliff Rhyne
Thanks, Jiangjie. Understanding more about the auto-commit behavior and why it's resilient to these is a big help. We're going to do some deeper investigation and testing. I'll report back when I have more information. Thanks, Cliff

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-22 Thread Cliff Rhyne
We did some more testing with logging turned on (I figured out why it wasn't working). We tried increasing the JVM memory capacity on our test server (it's lower than in production) and increasing the zookeeper timeouts. Neither changed the results. With trace logging enabled, we saw that we

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-22 Thread Jiangjie Qin
Hi Cliff, If auto.offset.commit is set to true, the offset will be committed in the following cases, in addition to the periodic offset commit: 1. During consumer rebalance, before releasing the partition ownership. If consumer A owns partition P before rebalance, it will commit offset for partition P

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-21 Thread James Cheng
Do you have multiple consumers in a consumer group? I think that when a new consumer joins the consumer group, the existing consumers stop consuming during the group rebalance, and when they start consuming again, they consume from the last committed offset. You should
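
The effect James describes can be illustrated with a small, self-contained sketch. It does not use the Kafka client at all; the class and method names (OffsetReplaySketch, redelivered) are hypothetical and exist only for illustration. The sketch assumes the usual convention that a committed offset points at the next message to read, so anything delivered at or after that offset and before the consumer's current position is delivered again once consumption resumes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models offsets as plain numbers,
// with no dependency on the Kafka client library.
class OffsetReplaySketch {

    // Returns the offsets that were already delivered to the application
    // but lie at or after the last committed offset. If consumption restarts
    // from committedOffset (for example after a rebalance), these offsets
    // are delivered a second time.
    static List<Long> redelivered(long committedOffset, long position) {
        List<Long> replayed = new ArrayList<>();
        for (long offset = committedOffset; offset < position; offset++) {
            replayed.add(offset);
        }
        return replayed;
    }

    public static void main(String[] args) {
        // Last commit pointed at offset 5; the consumer had since read
        // offsets 5, 6 and 7, so its position was 8.
        System.out.println(redelivered(5, 8)); // prints [5, 6, 7]
    }
}
```

If the commit happens right before the restart, so that the committed offset equals the current position, the returned list is empty and nothing is replayed.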

Java high level consumer providing duplicate messages when auto commit is off

2015-10-21 Thread Cliff Rhyne
Hi, My team and I are looking into a problem where the Java high level consumer provides duplicate messages if we turn auto commit off (using version 0.8.2.1 of the server and Java client). The expected sequence of events is: 1. Start high-level consumer and initialize a KafkaStream to get a

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-21 Thread Cliff Rhyne
Hi James, There are two scenarios we run: 1. Multiple partitions with one consumer per partition. This rarely has starting/stopping of consumers, so the pool is very static. There is a configured consumer timeout, which is causing the ConsumerTimeoutException to get thrown prior to the test

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-21 Thread Kris K
Hi Cliff, One other case I observed in my environment is when there were GC pauses on one of our high-level consumers in the group. Thanks, Kris

Re: Java high level consumer providing duplicate messages when auto commit is off

2015-10-21 Thread Cliff Rhyne
Hi Kris, Thanks for the tip. I'm going to investigate this further. I checked and we have fairly short zk timeouts and run with a smaller memory allocation in the two environments where we encounter this issue. I'll let you all know what I find. I saw this ticket