Thanks a lot Guozhang and Daniel.

Weide
On Mon, Aug 4, 2014 at 8:27 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> Weide,
>
> Like Daniel said, the rebalance logic is deterministic (round robin), so
> if you have a total of n partitions and each machine (master or slave)
> also has n threads, then all partitions will go to the master. When the
> master fails and restarts, the partitions will automatically go back to
> the old master.
>
> On Sun, Aug 3, 2014 at 6:25 AM, Daniel Compton <d...@danielcompton.net> wrote:
>> Hi Weide
>>
>> The consumer rebalancing algorithm is deterministic. In your failure
>> scenario, when A comes back up again, the consumer threads will
>> rebalance. This will give you the initial consumer configuration from
>> the start of the test.
>>
>> I'm unsure whether the partitions are balanced round robin, or whether
>> they will all go to A, with the overflow going to B.
>>
>> If all of the messages need to be processed by a single machine, an
>> alternative architecture would be to have a standby server that waits
>> until master A fails and then connects as a consumer. This could be
>> accomplished by watching ZooKeeper and getting a notification when A's
>> ephemeral node is removed.
>>
>> The high level consumer does seem to be the way to go, as long as your
>> application can handle duplicate processing.
>>
>> Daniel.
>>
>>> On 2/08/2014, at 1:38 pm, Weide Zhang <weo...@gmail.com> wrote:
>>>
>>> Hi Guozhang,
>>>
>>> If I use the high level consumer, how do I ensure all data goes to the
>>> master even if the slave is up and running? Is it just by forcing the
>>> master to have enough consumer threads to cover the maximum number of
>>> partitions of a topic, since the high level consumer has no notion of
>>> which consumers are masters and which are slaves?
>>>
>>> For example, master A starts enough threads to cover all the partitions.
>>> Slave B is on standby with the same consumer group and the same number
>>> of threads, but since master A has enough threads to cover all the
>>> partitions, slave B won't get any data.
>>>
>>> Suddenly master A goes down, slave B becomes the new master, and it
>>> starts to get data based on the high level consumer rebalance design.
>>>
>>> After that, old master A comes back up and becomes the slave. Will A
>>> get data? Or will A get no data, because B has enough threads to cover
>>> all partitions in the rebalancing logic?
>>>
>>> Thanks,
>>>
>>> Weide
>>>
>>>> On Fri, Aug 1, 2014 at 4:45 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>
>>>> Hello Weide,
>>>>
>>>> That should be doable via the high-level consumer; you can take a look
>>>> at this page:
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>>>>
>>>> Guozhang
>>>>
>>>>> On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang <weo...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a use case for a master-slave cluster where the logic inside
>>>>> the master needs to consume data from Kafka and publish some
>>>>> aggregated data back to Kafka. When the master dies, the slave needs
>>>>> to take the latest committed offset from the master and continue
>>>>> consuming the data from Kafka and doing the push.
>>>>>
>>>>> My question is: what would be the easiest Kafka consumer design for
>>>>> this scenario? I was thinking about using the SimpleConsumer and doing
>>>>> manual consumer offset syncing between master and slave. That seems
>>>>> to solve the problem, but I was wondering if it can be achieved with
>>>>> the high level consumer client?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Weide
>>>>
>>>> --
>>>> -- Guozhang
>
> --
> -- Guozhang
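The deterministic round-robin behaviour discussed in this thread can be illustrated with a toy simulation. This is only a sketch, not Kafka's actual rebalance implementation (the real high-level consumer derives its ordering from consumer/thread ids registered in ZooKeeper); the thread ids like "A-0" and the partition count here are made-up examples:

```python
# Toy sketch of a deterministic round-robin partition assignment.
# NOT Kafka's actual rebalance code; ids and counts are hypothetical.

def assign_round_robin(partitions, consumer_threads):
    """Sort threads by id, then deal partitions out round robin.

    Because both inputs are sorted first, repeated runs over the same
    membership always reproduce the same assignment.
    """
    ordered = sorted(consumer_threads)
    assignment = {t: [] for t in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment

# Topic with n = 4 partitions; master A and standby B each run 4 threads.
partitions = [0, 1, 2, 3]
a_threads = ["A-%d" % i for i in range(4)]
b_threads = ["B-%d" % i for i in range(4)]

assignment = assign_round_robin(partitions, a_threads + b_threads)

# A's thread ids sort before B's, and there are only 4 partitions for the
# first 4 threads in sorted order, so every partition lands on A and all
# of B's threads stay idle.
print({t: ps for t, ps in assignment.items() if ps})
# → {'A-0': [0], 'A-1': [1], 'A-2': [2], 'A-3': [3]}
```

Under this (assumed) ordering, the same property gives the failover behaviour both replies describe: if A's threads drop out, a rerun over B's threads alone hands everything to B; when A rejoins, the original layout is reproduced.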