That rebalance cycle doesn't look endless. I see that you started 23
consumers, and I see 23 rebalances finishing successfully, which is
correct. You will see rebalance messages from all of the consumers you
started. It all happens within about 2 seconds, which is fine. I agree that
there are a lot of log messages, but I'm not seeing anything that is
particularly a problem here. After the segment of log you provided, your
consumers will be running properly. Now, given you have a topic with 16
partitions, and you're running 23 consumers, 7 of those consumer threads
are going to be idle because they do not own partitions.
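
To make the arithmetic concrete, here is a rough sketch of how a
range-style assignment would spread 16 partitions over 23 consumer
threads. This assumes range assignment is in use, and the function name
range_assign is made up for illustration; it is not a Kafka API.

```python
def range_assign(num_partitions, num_consumers):
    """Hypothetical helper: split partitions into contiguous ranges, one per consumer."""
    per_consumer, extra = divmod(num_partitions, num_consumers)
    assignment = {}
    for i in range(num_consumers):
        # The first `extra` consumers each take one additional partition.
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[i] = list(range(start, start + count))
    return assignment


assignment = range_assign(16, 23)
idle = [c for c, parts in assignment.items() if not parts]
print(len(idle), "consumers own no partitions")
```

With these numbers each of the first 16 consumers gets exactly one
partition and the remaining 7 get none, so running more consumers than
partitions only adds idle threads.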

-Todd


On Fri, Sep 25, 2015 at 3:27 PM, noah <iamn...@gmail.com> wrote:

> We're seeing this the most on developer machines that are starting up
> multiple high level consumers on the same topic+group as part of service
> startup. The consumers do not seem to get a chance to consume anything
> before they disconnect.
>
> These are developer topics, so it is possible/likely that there isn't
> anything for them to consume in the topic, but the same service will start
> producing, so I would expect them to not be idle for long.
>
> Could it be the way we are bringing up multiple consumers at the same time is
> hitting some sort of endless rebalance cycle? And/or the resulting
> thrashing is causing them to time out, rebalance, etc.?
>
> I've tried attaching the logs again. Thanks!
>
> On Fri, Sep 25, 2015 at 3:33 PM Todd Palino <tpal...@gmail.com> wrote:
>
>> I don't see the logs attached, but what does the GC look like in your
>> applications? A lot of times this is caused (at least on the consumer
>> side) by the Zookeeper session expiring due to excessive GC activity,
>> which causes the consumers to go into a rebalance and change up their
>> connections.
>>
>> -Todd
>>
>>
>> On Fri, Sep 25, 2015 at 1:25 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> > How busy are the clients?
>> >
>> > The brokers occasionally close idle connections, this is normal and
>> > typically not something to worry about.
>> > However, this shouldn't happen to consumers that are actively reading
>> > data.
>> >
>> > I'm wondering if the "consumers not making any progress" could be due
>> > to a different issue, and because they are idle, the connection closes
>> > (vs the other way around).
>> >
>> > On Thu, Sep 24, 2015 at 2:32 PM, noah <iamn...@gmail.com> wrote:
>> >
>> > > We are having issues with producers and consumers frequently fully
>> > > disconnecting (from both the brokers and ZK) and reconnecting without
>> > > any apparent cause. On our production systems it can happen anywhere
>> > > from every 10-15 seconds to 15-20 minutes. On our less beefy test
>> > > systems and developer laptops, it can happen almost constantly.
>> > >
>> > > We see no errors in the logs (sample attached), just a message for
>> > > each of our consumers and producers disconnecting, then reconnecting.
>> > > On the systems where it happens constantly, the consumers are not
>> > > making any progress.
>> > >
>> > > The logs on the brokers are equally unhelpful, they show only frequent
>> > > connects and reconnects, without any apparent cause.
>> > >
>> > > What could be causing this behavior?
>> > >
>> > >
>> >
>>
>
