Hi Team, please help us find a resolution for the Kafka rolling-upgrade issue below.
Thanks,
Yogesh

On Monday, September 18, 2017 at 9:03:04 PM UTC+5:30, Yogesh Sangvikar wrote:
> Hi Team,
>
> Currently, we are using a Confluent 3.0.0 Kafka cluster in our production
> environment, and we are planning to upgrade the cluster to Confluent 3.2.2.
> We have topics with millions of records, and data is continuously being
> published to them. We also use other Confluent services (Schema Registry,
> Kafka Connect, and Kafka REST) to process the data.
>
> So, we can't afford a downtime upgrade of the platform.
>
> We have tried a rolling Kafka upgrade in our development environment, as
> suggested in the docs:
>
> https://docs.confluent.io/3.2.2/upgrade.html
> https://kafka.apache.org/documentation/#upgrade
>
> However, we are observing data loss on topics while doing the rolling
> upgrade/restart of the Kafka brokers with
> "inter.broker.protocol.version=0.10.2".
>
> Based on our observations, we suspect the following root cause for the
> data loss (explained for a topic partition with 3 replicas):
>
> - As the broker protocol version is updated from 0.10.0 to 0.10.2 in a
>   rolling fashion, a leader still running the older version will not let
>   the upgraded (0.10.2) replicas back into the in-sync replica set until
>   all brokers are updated.
> - We have explicitly disabled "unclean.leader.election.enable", so only
>   in-sync replicas can be elected leader for a given partition.
> - Because the old-version leader does not let the new-version replicas
>   catch up, data pushed through that leader is not replicated. When that
>   leader goes down for its own upgrade, the upgraded replicas appear in
>   the in-sync column and one of them becomes leader, but it lags the old
>   leader's offset and only exposes data up to the point it had synced.
> - Once the last replica comes back up on the new version, it starts
>   syncing data from the current (new-version) leader.
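For reference, the two-phase procedure from the upgrade docs linked above can be sketched as a `server.properties` sequence (a sketch only, assuming the 0.10.0 → 0.10.2 path described here; check the exact version strings against the release notes for your build):

```properties
# Phase 1: BEFORE upgrading any binaries, pin both versions to the old
# release on every broker, then upgrade the binaries one broker at a time
# with a rolling restart.
inter.broker.protocol.version=0.10.0
log.message.format.version=0.10.0

# Phase 2: once ALL brokers run the new binaries, bump the protocol
# version and perform a second rolling restart.
inter.broker.protocol.version=0.10.2

# Phase 3 (after all clients are upgraded): bump the message format
# version and perform a third rolling restart.
log.message.format.version=0.10.2
```

Bumping `inter.broker.protocol.version` while some brokers still run the old binaries is exactly the situation where mixed-version replication problems like the one described above can appear, which is why the docs sequence the version bumps after the binary upgrades.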
> Please let us know your comments on our observations and suggest the
> proper way to do a rolling Kafka upgrade, as we can't afford downtime.
>
> Thanks,
> Yogesh
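To reduce the window for losing acknowledged writes during a rolling restart, the durability-related settings below are worth reviewing alongside the upgrade procedure (a sketch, assuming topics with replication factor 3 as in the scenario above; these are standard Kafka broker and producer configs, not settings specific to the 3.0.0 → 3.2.2 upgrade):

```properties
# Broker/topic side: require at least 2 in-sync replicas before a write
# is acknowledged, and keep unclean leader election disabled so a lagging
# replica can never be elected leader.
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer side: wait for all in-sync replicas to acknowledge each write,
# and retry on transient errors (e.g. leader movement during a restart).
acks=all
retries=2147483647
```

With `acks=all` and `min.insync.replicas=2`, a write is only acknowledged after it exists on at least two replicas, so losing any single broker (including the leader being restarted for the upgrade) cannot lose acknowledged data. It is also worth waiting until `kafka-topics --describe --under-replicated-partitions` reports no under-replicated partitions before restarting the next broker.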