Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Sachin Mittal
Hi, Understood. I just need to figure out the cause of these frequent rebalances. It seems to point to RocksDB, but I need to debug more. The pressing issue now is to not kill the thread when a CommitFailedException is thrown on partition revocation (we catch this anyway at consumer

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Damian Guy
On 10 February 2017 at 11:18, Sachin Mittal wrote:
> The heartbeat exception while rebalancing is OK. However I had some
> different scenario which I wanted to understand.
>
> Please check line 42428 of https://dl.dropboxusercontent.com/u/46450177/
>

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Sachin Mittal
The heartbeat exception while rebalancing is OK. However, I had a different scenario that I wanted to understand. Please check line 42428 of https://dl.dropboxusercontent.com/u/46450177/TestKafkaAdvice.StreamThread-1.log Attempt to heartbeat failed for group new-part-advice since member id

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Damian Guy
Hi Sachin, The CommitFailedExceptions are thrown because the group is rebalancing. You can see log messages like the one below before each CommitFailedException: Attempt to heartbeat failed for group new-part-advice since it is rebalancing. It isn't clear from the logs why the rebalancing is

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
Hi, I could only get the streams client log; the server logs had already rolled over and been deleted by the time I looked. See if you can figure out something from these. They are not the best logs. https://dl.dropboxusercontent.com/u/46450177/TestKafkaAdvice.StreamThread-1.log The above

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
I am getting the logs but could you please look at the line rebalanceException = t; https://github.com/apache/kafka/blob/0.10.2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L261 Why are we setting rebalanceException in case of commit failed exception on

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Damian Guy
Might be easiest to just send all the logs if possible.

On Thu, 9 Feb 2017 at 08:10 Sachin Mittal wrote:
> I would try to get the logs soon.
> One quick question, I have three brokers which run in cluster with default
> logging.
>
> Which log4j logs would be of interest at

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
I will try to get the logs soon. One quick question: I have three brokers running in a cluster with default logging. Which log4j logs would be of interest on the broker side, and from which broker, or do I need to send logs from all three? My topic is partitioned and replicated on all three so kafka-logs

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Damian Guy
Sachin, Can you provide the full logs from the broker and the streams app? It is hard to understand what is going on with only snippets of information. It seems like the rebalance is taking too long, but I can't tell from this. Thanks, Damian On Thu, 9 Feb 2017 at 07:53 Sachin Mittal

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-08 Thread Sachin Mittal
Hi, Continuing on the CommitFailedException: what we observe is that when this happens the first time, ConsumerCoordinator invokes onPartitionsRevoked on StreamThread. This calls suspendTasksAndState(), which again tries to commit offsets, and the same exception is thrown again. This gets
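
The revoke-time failure described in this message can be illustrated in isolation. What follows is a minimal, self-contained Java sketch and not the StreamThread code: CommitFailed, Committer, onRevoked and shouldShutDown are hypothetical stand-ins for Kafka's CommitFailedException, the offset commit call, the rebalance callback and the thread's shutdown check. It only shows the idea of logging a commit failure during revocation rather than recording it as a fatal error.

```java
import java.util.ArrayList;
import java.util.List;

public class RevokeCommitSketch {
    // Hypothetical stand-in for Kafka's CommitFailedException.
    static class CommitFailed extends RuntimeException {
        CommitFailed(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for whatever commits offsets.
    interface Committer {
        void commit();
    }

    final List<String> warnings = new ArrayList<>();
    RuntimeException fatal; // an error stored here would stop the thread

    // Called when partitions are revoked during a rebalance.
    void onRevoked(Committer committer) {
        try {
            committer.commit();
        } catch (CommitFailed e) {
            // The group has already rebalanced, so this commit cannot
            // succeed. Record a warning and let the thread rejoin.
            warnings.add("commit skipped on revoke: " + e.getMessage());
        } catch (RuntimeException e) {
            // Any other failure is still treated as fatal.
            fatal = e;
        }
    }

    boolean shouldShutDown() {
        return fatal != null;
    }

    public static void main(String[] args) {
        RevokeCommitSketch thread = new RevokeCommitSketch();
        thread.onRevoked(() -> {
            throw new CommitFailed("member id is not valid or unknown");
        });
        System.out.println("shut down: " + thread.shouldShutDown());
        System.out.println("warnings: " + thread.warnings);
    }
}
```

Whether skipping the commit is acceptable depends on the application: offsets that were not committed are reprocessed by whichever member owns the partitions next.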

Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-08 Thread Sachin Mittal
Hi All, I am trying out the 0.10.2.0 rc. We have a source stream of 40 partitions. We start one instance with 4 threads. After that we start a second instance with the same config on a different machine, and then a third instance the same way. After the application reaches steady state we start getting
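
For orientation, the setup in this message works out to 10 partitions per thread with one instance (40 over 4 threads) and 3 or 4 per thread with three instances (40 over 12 threads). Each instance that joins the group triggers a rebalance that moves partitions between threads. The Java sketch below does that arithmetic; the class TaskSpread and its spread method are hypothetical, and the even round-robin spread is a simplifying assumption, not the actual Kafka Streams assignment logic.

```java
import java.util.Arrays;

public class TaskSpread {
    // Count partitions per thread under an assumed even round-robin spread.
    static int[] spread(int partitions, int threads) {
        int[] perThread = new int[threads];
        for (int p = 0; p < partitions; p++) {
            perThread[p % threads]++;
        }
        return perThread;
    }

    public static void main(String[] args) {
        int partitions = 40;        // from the post
        int instances = 3;          // from the post
        int threadsPerInstance = 4; // from the post

        // One instance running: all partitions land on its 4 threads.
        System.out.println(Arrays.toString(spread(partitions, threadsPerInstance)));

        // All three instances running: partitions spread over 12 threads.
        System.out.println(Arrays.toString(spread(partitions, instances * threadsPerInstance)));
    }
}
```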