Weide,

As Daniel said, the rebalance logic is deterministic (round robin over the consumer threads), so if the topic has n partitions in total and each machine (master or slave) also runs n threads, then all partitions will go to the master. When the master fails and later restarts, the partitions will automatically go back to the old master.
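
To make the thread counting concrete, here is a minimal Python sketch. It is a toy model, not Kafka code. It assumes the rebalance sorts the consumer thread IDs, gives each thread a contiguous share of the sorted partitions, and that the master's thread IDs happen to sort before the slave's. The names `assign`, `partitions_per_machine`, and the `A-0` / `B-0` style thread IDs are made up for illustration.

```python
def assign(partitions, threads):
    """Deterministically map partitions to consumer threads.

    Assumption: both lists are sorted and each thread takes a contiguous
    share; the first (len(partitions) % len(threads)) threads take one
    extra partition. Threads beyond the partition count get nothing.
    """
    partitions = sorted(partitions)
    threads = sorted(threads)
    if not threads:
        return {}
    per_thread, extra = divmod(len(partitions), len(threads))
    result = {}
    for pos, thread in enumerate(threads):
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        result[thread] = partitions[start:start + count]
    return result


def partitions_per_machine(assignment):
    """Count partitions owned by each machine (the prefix before '-')."""
    counts = {}
    for thread, parts in assignment.items():
        machine = thread.split("-")[0]
        counts[machine] = counts.get(machine, 0) + len(parts)
    return counts


n = 4  # hypothetical: 4 partitions, 4 threads per machine
partitions = list(range(n))
a_threads = [f"A-{i}" for i in range(n)]  # master A (hypothetical IDs)
b_threads = [f"B-{i}" for i in range(n)]  # slave B (hypothetical IDs)

both_up = partitions_per_machine(assign(partitions, a_threads + b_threads))
a_down = partitions_per_machine(assign(partitions, b_threads))
a_back = partitions_per_machine(assign(partitions, a_threads + b_threads))

print(both_up)  # A owns everything, B is idle
print(a_down)   # B takes over all partitions
print(a_back)   # same result as both_up: partitions return to A
```

Under these assumptions the result depends only on the sorted thread IDs, which is why the assignment after A rejoins matches the initial one. If B's IDs sorted first, B would own everything instead.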
On Sun, Aug 3, 2014 at 6:25 AM, Daniel Compton <d...@danielcompton.net> wrote:

> Hi Weide
>
> The consumer rebalancing algorithm is deterministic. In your failure
> scenario, when A comes back up again, the consumer threads will rebalance.
> This will give you the initial consumer configuration at the start of the
> test.
>
> I'm unsure whether the partitions are balanced round robin, or if they
> will all go to A, then the overflow to B.
>
> If all of the messages need to be processed by a single machine, an
> alternative architecture would be to have a standby server that waits until
> master A fails and then connects as a consumer. This could be accomplished
> by watching Zookeeper and getting a notification when A's ephemeral node is
> removed.
>
> The high level consumer does seem to be the way to go as long as your
> application can handle duplicate processing.
>
> Daniel.
>
>
> On 2/08/2014, at 1:38 pm, Weide Zhang <weo...@gmail.com> wrote:
> >
> > Hi Guozhang,
> >
> > If I use high level consumer, how do I ensure all data goes to master even
> > if slave was up and running ? Is it just by forcing master to have enough
> > consumer thread to cover maximum number of partitions of a topic since
> > high level consumer doesn't have assumption of consumers who are master and
> > consumers who are slave.
> >
> > For example, master A initiate enough thread such that it can cover all the
> > partitions. slave B is standby with same consumer group and same number of
> > threads but since master A has enough thread to cover all the partitions.
> > Slave B won't get any data.
> >
> > Suddenly master A goes down, slave B becomes new master, and it start to
> > get data based on high level consumer rebalance design.
> >
> > After that old master A comes up and becomes slave, will A get data ? Or A
> > will not get data because B has enough thread to cover all partitions in
> > the rebalancing logic.
> >
> > Thanks,
> >
> > Weide
> >
> >
> >> On Fri, Aug 1, 2014 at 4:45 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >>
> >> Hello Weide,
> >>
> >> That should be doable via high-level consumer, you can take a look at this
> >> page:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> >>
> >> Guozhang
> >>
> >>
> >>> On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang <weo...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a use case for a master slave cluster where the logic inside master
> >>> need to consume data from kafka and publish some aggregated data to kafka
> >>> again. When master dies, slave need to take the latest committed offset
> >>> from master and continue consuming the data from kafka and doing the push.
> >>>
> >>> My questions is what will be easiest kafka consumer design for this
> >>> scenario to work ? I was thinking about using simpleconsumer and doing
> >>> manual consumer offset syncing between master and slave. That seems to
> >>> solve the problem but I was wondering if it can be achieved by using high
> >>> level consumer client ?
> >>>
> >>> Thanks,
> >>>
> >>> Weide
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>

--
-- Guozhang