Thanks for the response. You’re right that downtime of a single broker shouldn’t impact consumer reads in a healthy replicated cluster.
However, my concern is slightly different: I am referring to a scenario where the entire Kafka cluster is down (for example, due to a maintenance window or an infrastructure issue) and is brought back up after the topic’s retention period has already expired. In that case, since Kafka deletes segments purely based on timestamps, it might start deleting data immediately upon startup, even if the messages were never consumed.

-----Original Message-----
From: Artem Timchenko <[email protected]>
Sent: 12 November 2025 19:13
To: [email protected]
Subject: Re: Query: Preventing Message Loss Due to Retention Expiry in Strimzi Kafka

In production-grade clusters, downtime of a single broker shouldn't prevent consumers from reading messages and catching up with the offset. What replication factor are you using?

On Wed, Nov 12, 2025 at 10:44 AM Prateek Kohli <[email protected]> wrote:
> Hi,
>
> I am looking for a reliable, production-safe strategy to avoid losing
> unread messages when a Kafka broker remains down longer than the
> topic's configured retention.ms.
>
> Since Kafka deletes segments purely based on timestamps, if a broker
> is down for (for example) 24 hours and the topic's retention.ms is
> also 24 hours, the broker may start deleting segments immediately on
> startup, even if no consumers have read those messages yet.
>
> Is there a recommended way to prevent message loss in this scenario?
>
> I am running Kafka on Kubernetes using Strimzi, so all topic
> configurations are managed through KafkaTopic CRDs and the Topic Operator.
>
> One solution could be to alter the topic's retention configuration.
> But for that to work, I would need to ensure that it is triggered before
> Kafka deletes the log segments. So could something be done during startup?
>
> For example, with a 3-broker cluster, I could prevent the brokers from
> fully starting after the first pod comes up, update the retention
> values in the Strimzi Kafka CR, and then let the operator complete the
> rollout so the cluster restarts with the new retention. Is this safe,
> or is there a better recommended approach to ensure that unread
> messages are preserved after long broker downtime?
>
> Regards,
> Prateek Kohli
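P.S. To make the failure mode concrete, here is a toy Python sketch of the time-based retention check as I understand it. The function name and the simplification are mine; the real broker logic (segment rolling, the periodic cleanup thread, etc.) is more involved, but the key point is that eligibility for deletion depends only on the segment's largest record timestamp versus retention.ms, not on whether any consumer has read the data:

```python
def segment_expired(max_segment_timestamp_ms: int,
                    now_ms: int,
                    retention_ms: int) -> bool:
    """Simplified model of Kafka time-based retention: a segment becomes
    eligible for deletion once its newest record is older than retention.ms.
    Consumer progress plays no role in this decision."""
    return now_ms - max_segment_timestamp_ms > retention_ms


HOUR_MS = 3_600_000
retention_ms = 24 * HOUR_MS   # topic configured with retention.ms = 24h

# Last record was written at t=0, then the whole cluster was down for 25h.
# On startup the segment is already past retention and may be deleted
# immediately, even though nothing consumed it.
print(segment_expired(0, 25 * HOUR_MS, retention_ms))  # True

# If the cluster had come back within the retention window instead,
# the segment would still be safe.
print(segment_expired(0, 23 * HOUR_MS, retention_ms))  # False
```

This is exactly why I was asking about bumping retention before the brokers finish starting: once `now - max_timestamp > retention.ms` holds at startup, there is no consumer-side state that protects the data.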
