Hi Weide,

The consumer rebalancing algorithm is deterministic. In your failure scenario, when A comes back up, the consumer threads will rebalance, which returns you to the initial consumer configuration you had at the start of the test.
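For reference, here is a minimal sketch of that setup, along the lines of the Consumer Group Example that Guozhang linked below. The topic name, thread count, and connection settings are placeholders; the key point is that the master asks for at least as many streams as the topic has partitions:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class MasterConsumer {
    public static void main(String[] args) {
        String topic = "events";  // placeholder topic name
        int numThreads = 8;       // should be >= the topic's partition count

        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");   // placeholder
        props.put("group.id", "master-slave-group");  // shared by A and B
        props.put("auto.commit.interval.ms", "1000");

        ConsumerConnector consumer =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for numThreads streams; the rebalance distributes partitions
        // across all live consumers registered under the same group.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, numThreads);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            consumer.createMessageStreams(topicCountMap);

        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        for (final KafkaStream<byte[], byte[]> stream : streams.get(topic)) {
            executor.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        byte[] message = it.next().message();
                        // process the message here
                    }
                }
            });
        }
    }
}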
I'm unsure whether the partitions are balanced round-robin, or whether they will all go to A with the overflow going to B. If all of the messages need to be processed by a single machine, an alternative architecture would be to have a standby server that waits until master A fails and then connects as a consumer. This could be accomplished by watching ZooKeeper and getting a notification when A's ephemeral node is removed (a rough sketch is at the bottom of this message). The high-level consumer does seem to be the way to go, as long as your application can handle duplicate processing.

Daniel

> On 2/08/2014, at 1:38 pm, Weide Zhang <weo...@gmail.com> wrote:
>
> Hi Guozhang,
>
> If I use the high-level consumer, how do I ensure all data goes to the master
> even if the slave is up and running? Is it just by forcing the master to have
> enough consumer threads to cover the maximum number of partitions of a topic,
> since the high-level consumer has no notion of which consumers are masters
> and which are slaves?
>
> For example, master A starts enough threads to cover all the partitions.
> Slave B is on standby with the same consumer group and the same number of
> threads, but since master A has enough threads to cover all the partitions,
> slave B won't get any data.
>
> Suddenly master A goes down, slave B becomes the new master, and it starts to
> get data per the high-level consumer rebalance design.
>
> After that, old master A comes up and becomes the slave. Will A get data? Or
> will A not get data because B has enough threads to cover all partitions in
> the rebalancing logic?
>
> Thanks,
>
> Weide
>
>
>> On Fri, Aug 1, 2014 at 4:45 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> Hello Weide,
>>
>> That should be doable via the high-level consumer; you can take a look at
>> this page:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>>
>> Guozhang
>>
>>
>>> On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang <weo...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I have a use case for a master/slave cluster where the logic inside the
>>> master needs to consume data from Kafka and publish some aggregated data
>>> back to Kafka. When the master dies, the slave needs to take the latest
>>> committed offset from the master and continue consuming the data from
>>> Kafka and doing the publish.
>>>
>>> My question is: what would be the easiest Kafka consumer design for this
>>> scenario? I was thinking about using SimpleConsumer and doing manual
>>> consumer-offset syncing between master and slave. That seems to solve the
>>> problem, but I was wondering if it can be achieved using the high-level
>>> consumer client?
>>>
>>> Thanks,
>>>
>>> Weide
>>
>>
>>
>> --
>> -- Guozhang
>>
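P.S. A rough sketch of the ZooKeeper watch I mentioned above, using the plain ZooKeeper client. The znode path and session timeout are placeholders, error handling is omitted, and it assumes the active master registers an ephemeral node at a known path when it takes over:

import java.io.IOException;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class StandbyWatcher implements Watcher {
    // Placeholder path where the active master holds an ephemeral znode.
    private static final String MASTER_PATH = "/myapp/master";
    private final ZooKeeper zk;

    public StandbyWatcher(String zkConnect) throws IOException {
        // This instance also receives session events via process().
        this.zk = new ZooKeeper(zkConnect, 30000, this);
    }

    // Arm a one-shot watch; if the node is already gone, take over now.
    public void watchMaster() throws KeeperException, InterruptedException {
        if (zk.exists(MASTER_PATH, this) == null) {
            becomeMaster();
        }
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted
                && MASTER_PATH.equals(event.getPath())) {
            // The master's session ended, so its ephemeral node went away.
            becomeMaster();
            return;
        }
        // ZooKeeper watches fire once, so re-arm on any other event.
        try {
            watchMaster();
        } catch (Exception e) {
            // retry with backoff in real code
        }
    }

    private void becomeMaster() {
        // Create our own ephemeral node at MASTER_PATH, then start the
        // high-level consumer threads (see the sketch earlier in this message).
    }
}

The standby would construct this, call watchMaster() once, and otherwise sit idle until the notification arrives.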