Hello Yonghui,

In 0.7 the consumer rebalance logic is distributed, and in some corner
cases, such as consecutive rebalances caused by soft failures, some
consumers may consider the rebalance complete while others are still going
through the rebalance process. You can check the GC logs on your consumers
to verify whether that is the case:

https://issues.apache.org/jira/browse/KAFKA-242

If you bounce the consumers to trigger another rebalance, this issue will
likely be resolved.
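
To illustrate how the distributed logic can end up in this state, here is a
rough Python sketch (not the actual Kafka code) of the kind of range-style
assignment each consumer computes on its own from its view of the group in
ZK. The function name, the shortened thread ids and the partition labels are
made up for the example. When two consumers hold different views of the
group membership, they claim overlapping partitions, and whichever arrives
second keeps hitting ownership conflicts like the ones in your log.

```python
def range_assign(partitions, consumer_threads, my_thread):
    """Return the partitions one consumer thread would claim for itself,
    given the group membership that this consumer currently sees."""
    parts = sorted(partitions)
    threads = sorted(consumer_threads)
    per_thread, extra = divmod(len(parts), len(threads))
    pos = threads.index(my_thread)
    # The first `extra` threads each take one additional partition.
    start = per_thread * pos + min(pos, extra)
    count = per_thread + (1 if pos < extra else 0)
    return parts[start:start + count]


# Hypothetical layout: 2 brokers x 10 partitions, one thread per consumer.
partitions = [f"{broker}-{p}" for broker in (1, 2) for p in range(10)]
all_threads = ["c1-0", "c2-0", "c3-0"]

# c3 still believes it is alone in the group and claims everything.
stale_claim = range_assign(partitions, ["c3-0"], "c3-0")

# c1 has just restarted, sees all three members, and claims its own share.
fresh_claim = range_assign(partitions, all_threads, "c1-0")

# Partitions both consumers want: c1 cannot take these until c3 lets go.
conflicts = sorted(set(stale_claim) & set(fresh_claim))
print(conflicts)
```

With a consistent view the three claims are disjoint and cover every
partition; the conflicts only appear when the views disagree, which is why
bouncing the consumers usually clears it up.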

To solve this issue in 0.9 we are moving group management, such as load
rebalancing, from the ZK-based distributed logic into a centralized
coordinator. Details can be found here:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

Guozhang


On Mon, May 12, 2014 at 12:48 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:

> Hi,
>
> We are using kafka 0.7.
>
> 2 brokers, each broker has 10 partitions for one topic
> 3 consumers in one consumer group, each consumer creates 10 streams.
>
>
> Today, while rolling out a new service, we restarted one consumer and
> found the following exception and warnings.
>
> kafka.common.ConsumerRebalanceFailedException:
> RecommendEvent_sd-sns-relation01.bj-1399630465426-53d3aefc can't rebalance
> after 4 retries
>
>
> [INFO  2014-05-12 15:17:47.364]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [conflict in /consumers/RecommendEvent/owners/sensei/1-2 data:
> RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-2 stored data:
> RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
> [INFO  2014-05-12 15:17:47.366]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
> partition ownership to be deleted: 1-2]
> [INFO  2014-05-12 15:17:47.375]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [conflict in /consumers/RecommendEvent/owners/sensei/1-3 data:
> RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-3 stored data:
> RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
> [INFO  2014-05-12 15:17:47.375]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
> partition ownership to be deleted: 1-3]
> [INFO  2014-05-12 15:17:47.385]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [conflict in /consumers/RecommendEvent/owners/sensei/1-5 data:
> RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-5 stored data:
> RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2]
> [INFO  2014-05-12 15:17:47.386]
> kafka.utils.Logging$class.info(Logging.scala:61)
> [RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
> partition ownership to be deleted: 1-5]
>
>
>
> And I opened zk viewer.
>
> In zk, we found 2 consumers in ConsumerGroup/ids:
>
> RecommendEvent_sd-sns-relation02.bj-1399635256619-5d8123c6
> RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3
>
>
> And in owners/topic/ we found all partitions are assigned to
> sd-sns-relation03.bj:
>
> Here is the owner info:
> 1:0  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
> 1:1  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
> 1:2  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
> 1:3  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
> 1:4  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
> 1:5  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
> 1:6  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
> 1:7  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
> 1:8  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
> 1:9  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
>
> 2:0  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
> 2:1  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
> 2:2  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
> 2:3  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
> 2:4  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
> 2:5  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-5
> 2:6  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-6
> 2:7  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-7
> 2:8  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-8
> 2:9  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-9
>
>
> So all partitions are assigned to sd-sns-relation03.bj,  but from logs and
> counter, we are sure sd-sns-relation02.bj has input too.
>
>
> My question is:
>
> 1. Why did the rebalance fail?
> 2. Why is the owner info wrong?  btw: zkclient is 0.2
>


