Peter, it does seem like KAFKA-9752 is the most likely suspect, although if your clients were upgraded to 2.6.1 then I don't believe they would be on an early enough version of JoinGroup to run into this. I'm not 100% sure, though, so it may be a good idea to leave a comment on that ticket and ping Jason directly, since he implemented the fix.

Murilo, I agree that your problem is not likely to be KAFKA-9752, since that was caused by KAFKA-9232 and that code is not present in 2.2.1. But maybe you're hitting the issue that KAFKA-9232 was originally intended to fix? In any case, 2.2.1 is quite old now, so there may be other known bugs that have since been fixed. I know it's not always possible or easy, but I would still recommend upgrading your brokers to a more recent version if you can.

On Fri, Feb 26, 2021 at 7:19 AM Murilo Tavares <murilo...@gmail.com> wrote:

> Just to provide a bit more detail, I noticed Peter's pattern:
> "Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null"
> "(Re-)joining group"
>
> But I also get a different pattern, interchangeably:
> Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable
> or invalid due to cause: null.isDisconnected: true. Rediscovery will be
> attempted.
> Followed by
> Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)
>
> On Fri, 26 Feb 2021 at 09:59, Murilo Tavares <murilo...@gmail.com> wrote:
>
> > Hi
> > I got the same behaviour yesterday while trying to upgrade my KafkaStreams
> > app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.
> >
> > Looking at KAFKA-9752 it mentions the cause being two other tickets:
> > https://issues.apache.org/jira/browse/KAFKA-7610
> > https://issues.apache.org/jira/browse/KAFKA-9232
> >
> > Although the first ticket seems fixed in 2.2.0, the latter was just fixed
> > in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
> > But what I don't understand is that KAFKA-9752 says:
> > "Note that this is only possible if 1) we have a consumer using an old
> > JoinGroup version, 2) the consumer times out and disconnects from its
> > initial JoinGroup request."
> > In this case, I guess my consumer is not using an old JoinGroup, as my
> > consumers (KafkaStreams) are on 2.7.0...
> >
> > Thanks
> > Murilo
> >
> > On Fri, 26 Feb 2021 at 06:06, Péter Sinóros-Szabó
> > <peter.sinoros-sz...@transferwise.com.invalid> wrote:
> >
> >> Hey Sophie,
> >>
> >> thanks for the link, I was checking that ticket, but I was not sure if it
> >> is relevant for our case.
> >> Eventually we "fixed" our problem with reducing the session.timeout.ms
> >> (it was set to a high value for other reasons).
> >>
> >> But today, in another service, we faced the same problem when upgrading
> >> the Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the
> >> brokers.
> >> Do you think the same problem (KAFKA-9752) might cause this problem too?
> >> It's hard to judge just based on the description of that ticket.
> >>
> >> Thanks,
> >> Peter
> >>
> >
>
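
For anyone who wants to try the workaround Peter describes in the quoted message (lowering session.timeout.ms), here is a rough sketch of what the client-side override could look like. The thread does not say which value was used, so the 10000 ms below is purely an assumption, as are the class and method names. The heartbeat interval is set to one third of the session timeout, which is the usual guidance in the consumer configuration docs. Only java.util.Properties is used so the snippet runs without the Kafka client on the classpath.

```java
import java.util.Properties;

public class SessionTimeoutSketch {

    // Hypothetical helper: builds consumer overrides with a lowered session
    // timeout. The keys are the standard Kafka consumer config names; the
    // values are whatever the caller passes in, not values from the thread.
    static Properties consumerOverrides(int sessionTimeoutMs) {
        if (sessionTimeoutMs < 3) {
            throw new IllegalArgumentException("session timeout too small: " + sessionTimeoutMs);
        }
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Keep the heartbeat interval at one third of the session timeout.
        props.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // 10000 ms is an assumed example value, not one reported in this thread.
        Properties props = consumerOverrides(10_000);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged into whatever configuration is passed to the consumer or to KafkaStreams. Note that the broker rejects a session timeout outside the range set by group.min.session.timeout.ms and group.max.session.timeout.ms, so the chosen value has to fit the broker settings as well.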