I'll try to answer some; the Kafka team will need to answer the others:
On Wed, May 8, 2013 at 12:17 PM, Yu, Libo <libo...@citi.com> wrote:

> Hi,
>
> I read this link
> https://cwiki.apache.org/KAFKA/consumer-group-example.html
> and have a few questions (if not too many).
>
> 1 When you say the iterator may block, do you mean hasNext() may block?

Yes.

> 2 "Remember, you can only use a single process per Consumer Group."
> Do you mean we can only use a single process on one node of the
> cluster for a consumer group?
> Or there can be only one process on the whole cluster for a consumer
> group? Please clarify on this.

Bug. I'll change it. When I wrote this I misunderstood the rebalancing step. I missed this reference but fixed the others. Sorry.

> 3 Why save offset to zookeeper? Is it easier to save it to a local file?
>
> 4 When client exits/crashes or leader for a partition is changed,
> duplicate messages may be replayed. "To help avoid this (replayed duplicate
> messages), make sure you provide a clean way for your client to exit
> instead of assuming it can be 'kill -9'd."
>
> a. For client exit, if the client is receiving data at the time, how
> to do a clean exit? How can client tell consumer to write offset to
> zookeeper before exiting?

If you call the shutdown() method on the Consumer it will cleanly stop, releasing any blocked iterators. In the example it goes to sleep for a few seconds, then cleanly shuts down.

> b. For client crash, what can client do to avoid duplicate messages
> when restarted? What I can think of is to read last message from log file
> and ignore the first few received duplicate messages until receiving the
> last read message. But is it possible for client to read log file directly?

If you can't tolerate the possibility of duplicates, you need to look at the Simple Consumer example. There you control the offset storage.

> c. For the change of the partition leader, is there anything that
> clients can do to avoid duplicates?
>
> Thanks.
>
> Libo
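A minimal sketch of the duplicate-skipping idea from question 4b, in Java. It assumes each message carries a per-partition offset that only increases, and that the application stores the last processed offset itself (the kind of control the Simple Consumer gives you). DedupingProcessor, Message and the restored offset are hypothetical names for illustration, not Kafka classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; none of these names are Kafka API.
// It skips any message whose offset is not greater than the last offset the
// application recorded as processed, so messages replayed after a restart are ignored.
public class DedupingProcessor {

    // A message as the application sees it: its offset within one partition plus a payload.
    public static final class Message {
        final long offset;
        final String payload;

        public Message(long offset, String payload) {
            this.offset = offset;
            this.payload = payload;
        }
    }

    // Last offset whose result was stored; -1 means nothing has been processed yet.
    private long lastProcessedOffset;
    private final List<String> output = new ArrayList<>();

    // restoredOffset would be read from wherever the application keeps its offsets (assumption).
    public DedupingProcessor(long restoredOffset) {
        this.lastProcessedOffset = restoredOffset;
    }

    // Returns true if the message was processed, false if it was a replayed duplicate.
    public boolean process(Message m) {
        if (m.offset <= lastProcessedOffset) {
            return false; // already handled before the restart
        }
        output.add(m.payload);
        // Assumption: a real application would persist the output and this offset
        // in one atomic step, so a crash cannot record one without the other.
        lastProcessedOffset = m.offset;
        return true;
    }

    public long lastProcessedOffset() {
        return lastProcessedOffset;
    }

    public List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        // Simulate a restart after offset 1 was processed: offsets 0 and 1 are replayed.
        DedupingProcessor p = new DedupingProcessor(1L);
        long[] offsets = {0L, 1L, 2L, 3L};
        for (long off : offsets) {
            p.process(new Message(off, "msg-" + off));
        }
        System.out.println(p.output()); // prints [msg-2, msg-3]
    }
}
```

The check only helps if the stored offset and the work it covers are saved together: if the output is written but the offset is not, the message is replayed and processed a second time.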