> Can you try using the console consumer to display messages/keys and
> timestamps?
> --property print.key=true --property print.timestamp=true
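For reference, the full console-consumer invocation with those properties
would look something like this (the broker address is a placeholder, and the
topic name is only assumed from the partition named later in the thread):

```shell
# Print each record's timestamp and key alongside its value.
# localhost:9092 and the topic name are placeholders - substitute your own.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic test-lost-messages --from-beginning \
  --property print.timestamp=true \
  --property print.key=true
```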
There are a lot of messages, so I'm picking one example without and one with
a time index entry. All of them have a null key:

Offset 57  CreateTime:1588074808027 Key:null - no time index entry
Offset 144 CreateTime:1588157145655 Key:null - has a time index entry

> Hmm, how are you doing your rolling deploys?

It's a rolling deployment: we take one node down and spin up a new one, one
at a time.

> I'm wondering if the time indexes are being corrupted by unclean
> shutdowns. I've been reading code and the only path I could find that led
> to a largest timestamp of 0 was, as you've discovered, where there was no
> time index.
> WRT the corruption - the broker being SIGKILLed (systemctl by default
> sends SIGKILL 90 seconds after SIGTERM, and our broker needed 120s to shut
> down cleanly) has caused index corruption for us in the past - although in
> our case it was recovered from automatically by the broker. It just took 2
> hours.

That would be a perfect explanation, but we use systemctl stop and it takes
around 4 seconds to shut down, so I believe it exits gracefully before the
SIGKILL is sent.

> Also, are you moving between versions with these deploys?

No, and we have several clusters where this is happening. The information I
showed you is from a cluster running version 2.3, but with 10.2 for the
inter-broker protocol and the log format. We have also experienced this in
fully updated 2.4 and 2.4.1 clusters. To sum up, these deploys always
redeploy the version that is already there.

Thanks all for the efforts so far.

On Wed, Apr 29, 2020, at 13:01, Nicolas Carlot
<nicolas.car...@chronopost.fr> wrote:

> Can you try using the console consumer to display messages/keys and
> timestamps?
> --property print.key=true --property print.timestamp=true
>
> On Wed, Apr 29, 2020, at 13:23, JP MB <jose.brandao1...@gmail.com> wrote:
>
> > The server is in UTC; [2020-04-27 10:36:40,386] was actually my local
> > time. On the server it was 9:36.
> > It doesn't look like a timezone problem, because it cleans other
> > records properly, at exactly 48 hours.
> >
> > On Wed, Apr 29, 2020, at 11:26, Goran Sliskovic
> > <gslis...@yahoo.com.invalid> wrote:
> >
> > > Hi,
> > > When lastModifiedTime on that segment is converted to human-readable
> > > time: Monday, April 27, 2020 9:14:19 AM UTC
> > >
> > > In what time zone is the server (IOW: [2020-04-27 10:36:40,386] from
> > > the log is in what time zone)?
> > > It looks as if largestTime is a property of the log record, and 0
> > > means the log record is empty.
> > >
> > > On Tuesday, April 28, 2020, 04:37:03 PM GMT+2, JP MB <
> > > jose.brandao1...@gmail.com> wrote:
> > >
> > > Hi,
> > > We have messages disappearing from topics on Apache Kafka with
> > > versions 2.3, 2.4.0, 2.4.1 and 2.5.0. We noticed this when we do a
> > > rolling deployment of our clusters, and unfortunately it doesn't
> > > happen every time, so it's very inconsistent.
> > >
> > > Sometimes we lose all messages inside a topic, other times we lose
> > > all messages inside a partition. When this happens, the following log
> > > line is a constant:
> > >
> > > [2020-04-27 10:36:40,386] INFO [Log partition=test-lost-messages-5,
> > > dir=/var/kafkadata/data01/data] Deleting segments
> > > List(LogSegment(baseOffset=6, size=728,
> > > lastModifiedTime=1587978859000, largestTime=0)) (kafka.log.Log)
> > >
> > > There is also a preceding log line saying this segment hit the
> > > retention time breach of 48 hours. In this example, the message was
> > > produced ~12 minutes before the deployment.
> > >
> > > Notice that all messages that are wrongly deleted have largestTime=0,
> > > and the ones that are properly deleted have a valid timestamp there.
> > > From what we read of the documentation and code, it looks like
> > > largestTime is used to calculate whether a given segment has reached
> > > the time breach or not.
> > > Since we can observe this in multiple versions of Kafka, we think
> > > this might be related to something external to Kafka, e.g. Zookeeper.
> > >
> > > Does anyone have any ideas of why this could be happening?
> > > For the record, we are using Zookeeper 3.6.0.
>
> --
> *Nicolas Carlot*
> Lead dev | nicolas.car...@chronopost.fr
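As a quick sanity check of why largestTime=0 would make a freshly written
segment look expired, here is a back-of-the-envelope version of the
time-based retention comparison. This is a simplified model of the check
described in the thread, not the broker's actual code; the timestamp is the
CreateTime of offset 57 quoted above:

```shell
retention_ms=$((48 * 60 * 60 * 1000))   # 48-hour retention, as in the thread
now_ms=1588074808027                    # CreateTime of offset 57 above

# Healthy segment: largestTime is a real timestamp, so its age stays
# well under the retention window.
echo $(( (now_ms - 1588074808027) > retention_ms ))   # 0 -> segment kept

# Broken segment: largestTime=0 makes the apparent age ~50 years, which is
# always past retention, so the segment is deleted immediately.
echo $(( (now_ms - 0) > retention_ms ))               # 1 -> segment deleted
```

This matches the symptom reported: any segment whose time index yields
largestTime=0 fails the age check on the very next retention pass, no matter
how recently it was written.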