Hi Team, please help us find a resolution for the Kafka rolling-upgrade issue below.
Thanks,
Yogesh

On Monday, September 18, 2017 at 9:03:04 PM UTC+5:30, Yogesh Sangvikar wrote:
> Hi Team,
>
> Currently, we are using a Confluent 3.0.0 Kafka cluster in our production
> environment, and we are planning to upgrade the cluster to Confluent 3.2.2.
> We have topics with millions of records, and data is continuously being
> published to them. We also use other Confluent services (Schema Registry,
> Kafka Connect, and Kafka REST) to process the data.
>
> So, we can't afford a downtime upgrade of the platform.
>
> We have tried a rolling Kafka upgrade in our development environment, as
> suggested in the docs:
>
> https://docs.confluent.io/3.2.2/upgrade.html
> https://kafka.apache.org/documentation/#upgrade
>
> However, we are observing data loss on topics while doing the rolling
> upgrade/restart of the Kafka brokers with
> "inter.broker.protocol.version=0.10.2".
>
> Based on our observations, we suspect the following root cause for the
> data loss (explained for a topic partition with 3 replicas):
>
> - As the broker protocol version is updated from 0.10.0 to 0.10.2 in a
>   rolling fashion, a leader still running the older version will not let
>   the upgraded (0.10.2) replicas back into the in-sync replica set until
>   all brokers are updated.
> - We have explicitly disabled "unclean.leader.election.enable", so only
>   in-sync replicas can be elected leader for a given partition.
> - Because the old-version leader does not let the new-version replicas
>   catch up, data pushed through that leader is not replicated. When that
>   leader goes down for its own upgrade, the upgraded replicas appear in
>   the in-sync column and one of them becomes leader, but it lags the old
>   leader's offset and only exposes data up to the point it had synced.
> - Once the last replica comes back up on the new version, it starts
>   syncing data from the current (new-version) leader.
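For reference, the two-phase procedure from the upgrade docs linked above can be sketched as a `server.properties` sequence (a sketch only, assuming the 0.10.0 → 0.10.2 path described here; check the exact version strings against the release notes for your build):

```properties
# Phase 1: BEFORE upgrading any binaries, pin both versions to the old
# release on every broker, then upgrade the binaries one broker at a time
# with a rolling restart.
inter.broker.protocol.version=0.10.0
log.message.format.version=0.10.0

# Phase 2: once ALL brokers run the new binaries, bump the protocol
# version and perform a second rolling restart.
inter.broker.protocol.version=0.10.2

# Phase 3 (after all clients are upgraded): bump the message format
# version and perform a third rolling restart.
log.message.format.version=0.10.2
```

Bumping `inter.broker.protocol.version` while some brokers still run the old binaries is exactly the situation where mixed-version replication problems like the one described above can appear, which is why the docs sequence the version bumps after the binary upgrades.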
> Please let us know your comments on our observations and suggest the
> proper way to do a rolling Kafka upgrade, as we can't afford downtime.
>
> Thanks,
> Yogesh
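To reduce the window for losing acknowledged writes during a rolling restart, the durability-related settings below are worth reviewing alongside the upgrade procedure (a sketch, assuming topics with replication factor 3 as in the scenario above; these are standard Kafka broker and producer configs, not settings specific to the 3.0.0 → 3.2.2 upgrade):

```properties
# Broker/topic side: require at least 2 in-sync replicas before a write
# is acknowledged, and keep unclean leader election disabled so a lagging
# replica can never be elected leader.
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer side: wait for all in-sync replicas to acknowledge each write,
# and retry on transient errors (e.g. leader movement during a restart).
acks=all
retries=2147483647
```

With `acks=all` and `min.insync.replicas=2`, a write is only acknowledged after it exists on at least two replicas, so losing any single broker (including the leader being restarted for the upgrade) cannot lose acknowledged data. It is also worth waiting until `kafka-topics --describe --under-replicated-partitions` reports no under-replicated partitions before restarting the next broker.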