That rebalance cycle doesn't look endless. I see that you started 23 consumers, and I see 23 rebalances finishing successfully, which is correct: you will see rebalance messages from every consumer you started. It all happens within about 2 seconds, which is fine. I agree that there are a lot of log messages, but I'm not seeing anything that is particularly a problem here. After the segment of log you provided, your consumers will be running properly.

Now, given that you have a topic with 16 partitions and you're running 23 consumers, 7 of those consumer threads are going to be idle because they do not own any partitions.
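To make the 16-versus-23 arithmetic concrete, here is a minimal sketch of a range-style assignment. It is only an illustration under assumptions: the function name `range_assign` and the consumer ids are made up for this example, and the real assignment logic in the high-level consumer may differ in detail.

```python
def range_assign(num_partitions, consumer_ids):
    """Split partitions 0..num_partitions-1 across consumers in contiguous ranges.

    Hypothetical helper for illustration only; not the Kafka client API.
    """
    consumers = sorted(consumer_ids)
    # Each consumer gets per_consumer partitions; the first `extra` get one more.
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    for i, consumer in enumerate(consumers):
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
    return assignment


# 23 consumer threads in one group, 16 partitions in the topic (made-up ids).
consumers = [f"consumer-{n:02d}" for n in range(23)]
assignment = range_assign(16, consumers)

# Consumers that end up owning no partitions stay idle.
idle = [c for c, parts in assignment.items() if not parts]
print(len(idle))  # 7
```

With more consumers than partitions, each partition still has exactly one owner, so the surplus consumers simply sit idle until a later rebalance hands them something.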
-Todd

On Fri, Sep 25, 2015 at 3:27 PM, noah <iamn...@gmail.com> wrote:

> We're seeing this the most on developer machines that are starting up
> multiple high level consumers on the same topic+group as part of service
> startup. The consumers do not seem to get a chance to consume anything
> before they disconnect.
>
> These are developer topics, so it is possible/likely that there isn't
> anything for them to consume in the topic, but the same service will start
> producing, so I would expect them to not be idle for long.
>
> Could it be the way we are bringing up multiple consumers at the same time
> is hitting some sort of endless rebalance cycle? And/or the resulting
> thrashing is causing them to time out, rebalance, etc.?
>
> I've tried attaching the logs again. Thanks!
>
> On Fri, Sep 25, 2015 at 3:33 PM Todd Palino <tpal...@gmail.com> wrote:
>
>> I don't see the logs attached, but what does the GC look like in your
>> applications? A lot of times this is caused (at least on the consumer
>> side) by the Zookeeper session expiring due to excessive GC activity,
>> which causes the consumers to go into a rebalance and change up their
>> connections.
>>
>> -Todd
>>
>> On Fri, Sep 25, 2015 at 1:25 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> > How busy are the clients?
>> >
>> > The brokers occasionally close idle connections, this is normal and
>> > typically not something to worry about.
>> > However, this shouldn't happen to consumers that are actively reading
>> > data.
>> >
>> > I'm wondering if the "consumers not making any progress" could be due
>> > to a different issue, and because they are idle, the connection closes
>> > (vs the other way around).
>> >
>> > On Thu, Sep 24, 2015 at 2:32 PM, noah <iamn...@gmail.com> wrote:
>> >
>> > > We are having issues with producers and consumers frequently fully
>> > > disconnecting (from both the brokers and ZK) and reconnecting
>> > > without any apparent cause. On our production systems it can happen
>> > > anywhere from every 10-15 seconds to 15-20 minutes. On our less
>> > > beefy test systems and developer laptops, it can happen almost
>> > > constantly.
>> > >
>> > > We see no errors in the logs (sample attached), just a message for
>> > > each of our consumers and producers disconnecting, then
>> > > reconnecting. On the systems where it happens constantly, the
>> > > consumers are not making any progress.
>> > >
>> > > The logs on the brokers are equally unhelpful, they show only
>> > > frequent connects and reconnects, without any apparent cause.
>> > >
>> > > What could be causing this behavior?