The easiest way to diagnose this is to enable GC logging on both the consumer and the ZooKeeper instance and check whether either one shows long pauses.
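As a rough sketch of that check, assuming a HotSpot JVM of that era started with `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.log` (the file name `gc.log` is a placeholder, and newer JVMs using `-Xlog:gc*` write a different format), something like the following could flag stops that approach the session timeout. The function name and the default threshold are illustrative only.

```python
import re

# Line written by HotSpot when -XX:+PrintGCApplicationStoppedTime is enabled.
# Assumption: the pre-unified-logging format, e.g.
# "Total time for which application threads were stopped: 0.0123450 seconds"
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def find_long_pauses(lines, threshold_seconds=6.0):
    """Return (line_number, pause_seconds) for each stop longer than the threshold.

    `lines` is any iterable of log lines, such as an open file object.
    The default of 6.0 mirrors the 6 second session timeout discussed in
    this thread; it is a hypothetical starting point, not a recommendation.
    """
    long_pauses = []
    for line_number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a stopped-time line
        pause = float(match.group(1))
        if pause > threshold_seconds:
            long_pauses.append((line_number, pause))
    return long_pauses
```

Running `find_long_pauses(open("gc.log"))` against the consumer's log and again against the ZooKeeper server's log should show which side, if either, is stalling for longer than the session timeout.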
-Jay

On Tue, Feb 5, 2013 at 5:46 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> >> Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> >> has expired, closing socket connection
>
> This can happen either due to long GC pauses on your client side or due to
> IO pauses on the zookeeper server side.
> That is the reason increasing the session timeout seems to have helped.
> If this error happens frequently, it will cause your consumer instances to
> keep rebalancing.
>
> Thanks,
> Neha
>
>
> On Tue, Feb 5, 2013 at 5:41 PM, Manish Khettry <man...@ooyala.com> wrote:
>
> > We are trying to troubleshoot a problem wherein our system just cannot
> > seem to read messages fast enough from Kafka. We are on kafka 0.6 and are
> > using the simple consumer.
> >
> > Looking at the logs, we see a lot of (almost constant, chatty) messages
> > about rebalancing. For instance, every minute we see messages like this:
> >
> > Consumer rookery-vacuum-prod_<first_ip>.internal-1360106018385
> > rebalancing the following partitions: List(0-0, 0-1, 0-10, 0-11, 0-12,
> > 0-13, 0-14, 0-15, 0-16, 0-17, 0-18, 0-19, 0-2, 0-3, 0-4, 0-5, 0-6,
> > 0-7, 0-8, 0-9, 1-0, 1-1, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16,
> > 1-17, 1-18, 1-19, 1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9) for topic
> > compact-player-logs with consumers:
> >
> > I also see zookeeper timeouts like so:
> >
> > Unable to reconnect to ZooKeeper service, session 0x33c981ab95100ed
> > has expired, closing socket connection
> >
> > We increased the zookeeper session timeout from 6 seconds to 12 seconds
> > and this seems to have helped somewhat, but I'm not sure whether these
> > zookeeper timeouts at 6 seconds are symptomatic of a problem with our
> > zookeeper cluster and/or connectivity between the consumers and zk.
> > Any thoughts?
> >
> > Manish