@Guozhang:

In server.properties we have:

zookeeper.connection.timeout.ms=1000000

In zoo.cfg we have:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/logs
clientPort=2182
server.1=xxxx.com:2888:3888
server.2=xxxx.com:2888:3888
server.3=xxxx.com:2888:3888

On Thu, Nov 13, 2014 at 10:27 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Chen,
>
> From the ZK logs it sounds like ZK kept timing out consumers, which
> triggers rebalances.
>
> What is the zk session timeout config value in your consumers?
>
> Guozhang
>
> On Thu, Nov 13, 2014 at 10:15 AM, Chen Wang <chen.apache.s...@gmail.com>
> wrote:
>
> > Thanks for the info.
> > It makes sense; however, I didn't see any "session timeout"/"expired"
> > entries in the consumer log,
> > but I do see lots of connection closed entries in the zookeeper log:
> >
> > 2014-11-13 10:07:53,132 [myid:1] - INFO [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1007] - Closed socket connection for
> > client /10.93.83.50:37180 which had sessionid 0x149a4cc1b580e7d
> > 2014-11-13 10:08:04,499 [myid:1] - INFO [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@197] - Accepted socket
> > connection from /10.93.80.121:38437
> > 2014-11-13 10:08:04,503 [myid:1] - WARN [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2182:ZooKeeperServer@822] - Connection request from old
> > client /10.93.80.121:38437; will be dropped if server is in r-o mode
> > 2014-11-13 10:08:04,503 [myid:1] - INFO [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2182:ZooKeeperServer@868] - Client attempting to establish
> > new session at /10.93.80.121:38437
> > 2014-11-13 10:08:04,538 [myid:1] - INFO
> > [CommitProcessor:1:ZooKeeperServer@617] - Established session
> > 0x149a4cc1b580e7e with negotiated timeout 40000 for client
> > /10.93.80.121:38437
> > 2014-11-13 10:08:08,746 [myid:1] - INFO [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1007] - Closed socket connection for
> > client /10.93.80.121:38437 which had sessionid 0x149a4cc1b580e7e
> >
> > We are using -Xmx2048m for the consumer, and I didn't see any GC related
> > exceptions.
> >
> > Chen
> >
> > On Thu, Nov 13, 2014 at 9:13 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Hey Chen,
> > >
> > > As Neha suggested, the typical reason for too many rebalances is that
> > > your consumers keep being timed out from ZK, and you can verify this by
> > > checking your consumer logs for something like "session timeout"
> > > entries (these are not ERROR entries).
> > >
> > > Guozhang
> > >
> > > On Wed, Nov 12, 2014 at 5:31 PM, Neha Narkhede <neha.narkh...@gmail.com>
> > > wrote:
> > >
> > > > Does this help?
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
> > > > ?
> > > >
> > > > On Wed, Nov 12, 2014 at 3:53 PM, Chen Wang <chen.apache.s...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi there,
> > > > > My kafka client is reading a 3 partition topic from kafka with 3
> > > > > threads distributed on different machines. I am seeing frequent
> > > > > owner changes on the topics when running:
> > > > > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group my_test_group --topic mytopic -zkconnect localhost:2181
> > > > >
> > > > > The owner kept changing once in a while, but I didn't see any
> > > > > exceptions thrown from the consumer side. When checking the broker
> > > > > log, it's full of
> > > > > INFO Closing socket connection to /IP. (kafka.network.Processor)
> > > > >
> > > > > Is this expected behavior? If so, how can I tell when the leader is
> > > > > imbalanced, and rebalance is triggered?
> > > > > Thanks,
> > > > > Chen
> > >
> > > --
> > > -- Guozhang
>
> --
> -- Guozhang
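
A note on the "negotiated timeout 40000" entry in the ZooKeeper log quoted above. ZooKeeper clamps the session timeout a client requests to the range [minSessionTimeout, maxSessionTimeout], which default to 2 x tickTime and 20 x tickTime when they are not set in zoo.cfg. With tickTime=2000 the upper bound is 40000 ms, which matches the log line. The sketch below only mirrors that arithmetic. The function name and the requested value of 60000 ms are hypothetical, since the thread does not state the consumer's zookeeper.session.timeout.ms.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Mirror how a ZooKeeper server bounds a client's requested session timeout.

    When the min/max bounds are not configured, ZooKeeper defaults them to
    2 * tickTime and 20 * tickTime respectively.
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    # Clamp the requested value into [lower, upper].
    return max(lower, min(requested_ms, upper))


# tickTime=2000 is taken from the zoo.cfg in this thread;
# 60000 is an assumed client request, used only for illustration.
print(negotiated_session_timeout(60000, 2000))  # 40000
```

If the consumers request more than 40000 ms, raising maxSessionTimeout in zoo.cfg (or tickTime) would be needed before a larger client-side value takes effect.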