Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Sachin Mittal
Hi, Understood. I just need to figure out the cause of these frequent rebalances. Somehow it seems to point to RocksDB, but I need to debug more. The pressing issue now is to not kill the thread when a CommitFailedException occurs on partition revocation (we anyway catch this at consumer coordin

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Damian Guy
On 10 February 2017 at 11:18, Sachin Mittal wrote:
> The heartbeat exception while rebalancing is OK. However I had some
> different scenario which I wanted to understand.
>
> Please check line 42428 of https://dl.dropboxusercontent.com/u/46450177/TestKafkaAdvice.StreamThread-1.log
If you lo

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Sachin Mittal
The heartbeat exception while rebalancing is OK. However, I had a different scenario which I wanted to understand. Please check line 42428 of https://dl.dropboxusercontent.com/u/46450177/TestKafkaAdvice.StreamThread-1.log Attempt to heartbeat failed for group new-part-advice since member id is

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-10 Thread Damian Guy
Hi Sachin, The CommitFailedExceptions are thrown because the group is rebalancing. You can see log messages like the one below before each commit failed exception: Attempt to heartbeat failed for group new-part-advice since it is rebalancing. It isn't clear from the logs why the rebalancing is h

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
Hi, I managed to get the streams client log; the server logs were deleted because enough time had elapsed for them to roll over. See if you can figure out something from these. These are not the best of logs. https://dl.dropboxusercontent.com/u/46450177/TestKafkaAdvice.StreamThread-1.log The above log

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
I am getting the logs but could you please look at the line rebalanceException = t; https://github.com/apache/kafka/blob/0.10.2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L261 Why are we setting rebalanceException in case of commit failed exception on p

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Damian Guy
Might be easiest to just send all the logs if possible.
On Thu, 9 Feb 2017 at 08:10 Sachin Mittal wrote:
> I would try to get the logs soon.
> One quick question, I have three brokers which run in cluster with default
> logging.
>
> Which log4j logs would be of interest at broker side and which

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Sachin Mittal
I will try to get the logs soon. One quick question: I have three brokers which run in a cluster with default logging. Which log4j logs would be of interest on the broker side, and from which broker, or do I need to send logs from all three? My topic is partitioned and replicated on all three so kafka-logs d

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-09 Thread Damian Guy
Sachin, Can you provide the full logs from the broker and the streams app? It is hard to understand what is going on with only snippets of information. It seems like the rebalance is taking too long, but I can't tell from this. Thanks, Damian On Thu, 9 Feb 2017 at 07:53 Sachin Mittal wrote: >

Re: Getting CommitFailedException in 0.10.2.0 due to member id is not valid or unknown

2017-02-08 Thread Sachin Mittal
Hi, Continuing on the CommitFailedException: what we observe is that when this happens the first time, ConsumerCoordinator invokes onPartitionsRevoked on StreamThread. This calls suspendTasksAndState(), which again tries to commit offsets, and the same exception is thrown again. This gets handled
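
For readers following the thread, the behaviour being asked for (a commit failure during partition revocation should not kill the thread) can be sketched roughly as below. This is only an illustration, not the StreamThread code: RevokeSketch, CommitFailed, SuspendAction and Result are hypothetical stand-ins so the example runs without the Kafka client library, and the real exception is CommitFailedException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free sketch of a revocation callback that tolerates
// a failed commit. None of these names come from Kafka itself.
class RevokeSketch {

    /** Local stand-in for Kafka's CommitFailedException (assumption for this sketch). */
    static class CommitFailed extends RuntimeException {
        CommitFailed(String message) {
            super(message);
        }
    }

    /** Hypothetical hook representing "commit offsets and suspend tasks". */
    interface SuspendAction {
        void run();
    }

    /** Outcome of one revocation callback. */
    static final class Result {
        final boolean fatal;          // true if the thread should shut down
        final List<String> warnings;  // messages that would go to the log

        Result(boolean fatal, List<String> warnings) {
            this.fatal = fatal;
            this.warnings = warnings;
        }
    }

    static Result onPartitionsRevoked(SuspendAction suspend) {
        List<String> warnings = new ArrayList<>();
        try {
            suspend.run();
        } catch (CommitFailed e) {
            // The group has already rebalanced, so the commit cannot succeed.
            // Record a warning and let the thread rejoin instead of dying.
            warnings.add("commit skipped during revoke: " + e.getMessage());
            return new Result(false, warnings);
        } catch (RuntimeException e) {
            // Any other failure is still treated as fatal in this sketch.
            return new Result(true, warnings);
        }
        return new Result(false, warnings);
    }

    public static void main(String[] args) {
        Result r = onPartitionsRevoked(() -> {
            throw new CommitFailed("member id is not valid or unknown");
        });
        System.out.println("fatal=" + r.fatal + " warnings=" + r.warnings);
    }
}
```

Whether swallowing the exception is safe depends on what else the revocation callback has to flush; in this sketch only the commit failure is tolerated and every other runtime failure still marks the thread as failed.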