Hello all, any thoughts on this? Just pinging in case anyone out there has pointers on where to go from here. I was wondering: could it be that the topic is actually being compacted, rather than cleaned up by a delete retention policy? Could this be because the topics were initially created with a compact policy, so the delete policy now isn't being applied?
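As a sanity check on the numbers in the quoted message below, here is the quick arithmetic on what the configured retention values actually mean (nothing Kafka-specific, just unit conversion):

```python
# Unit conversions for the retention settings quoted in the original message.

MS_PER_DAY = 24 * 60 * 60 * 1000
HOURS_PER_YEAR = 24 * 365

retention_ms = 1_210_000_000      # topic-level retention.ms on ETD-TEST
retention_hours = 2_147_483_647   # broker-level KAFKA_LOG_RETENTION_HOURS (INT32_MAX)

days = retention_ms / MS_PER_DAY           # ~14 days
years = retention_hours / HOURS_PER_YEAR   # ~245,000 years, effectively "keep forever"

print(f"retention.ms = {retention_ms} ms ~= {days:.1f} days")
print(f"log.retention.hours = {retention_hours} h ~= {years:,.0f} years")
# KAFKA_LOG_RETENTION_BYTES=-1 disables size-based retention entirely.
```

So the broker defaults amount to "never delete anything by time or size", and only the topic-level `retention.ms=1210000000` (14 days) should matter for ETD-TEST, assuming the delete policy is actually in effect.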
Thanks!

On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote:

> Hello Kafka users! I'm posting because we are having an issue with topic
> retention that we don't understand, and we need input on what could
> potentially be wrong on our end if it is indeed a configuration issue.
>
> First, we are running a containerized version of Kafka with the following
> environment variables:
>
> CONFLUENT_MINOR_VERSION=3
> KAFKA_LOG_CLEANUP_POLICY=compact
> KAFKA_LOG4J_LOGGERS=kafka.controller=ERROR
> CONFLUENT_DEB_VERSION=1
> KAFKA_VERSION=5.3.1
> CONFLUENT_PATCH_VERSION=1
> CONFLUENT_VERSION=5.3.1
> KAFKA_LOG_RETENTION_BYTES=-1
> PYTHON_VERSION=2.7.9-1
> SCALA_VERSION=2.12
> CONFLUENT_MAJOR_VERSION=5
> KAFKA_NUM_PARTITIONS=4
> ZULU_OPENJDK_VERSION=8=8.38.0.13
> KAFKA_DEFAULT_REPLICATION_FACTOR=6
> KAFKA_LOG_RETENTION_HOURS=2147483647
>
> I'm listing these vars because they might help someone with a keen eye
> spot something we are missing. Based on this we should have a global log
> cleanup/retention of:
>
> KAFKA_LOG_CLEANUP_POLICY=compact
> KAFKA_LOG_RETENTION_HOURS=2147483647 (roughly 245,000 years, i.e. effectively forever)
> KAFKA_LOG_RETENTION_BYTES=-1 (unlimited)
>
> We currently have a topic called "ETD-TEST" shown below:
>
> kafka-topics --describe --zookeeper $PRD:2181 --topic ETD-TEST
> Topic: ETD-TEST  PartitionCount: 4  ReplicationFactor: 6  Configs: cleanup.policy=delete,retention.ms=1210000000
>     Topic: ETD-TEST  Partition: 0  Leader: 1  Replicas: 1,4,5,2,3,6  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 1  Leader: 1  Replicas: 1,2,3,6,5,4  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 2  Leader: 4  Replicas: 4,5,1,6,2,3  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 3  Leader: 1  Replicas: 1,3,6,4,2,5  Isr: 5,1,6,2,3,4
>
> Notice it has a cleanup policy of delete, which should override the
> global setting of compact?
>
> The issue is that ETD-TEST has a topic-level retention.ms value of
> 1210000000 (14 days), but we have messages in the topic dating much
> further back, somewhere around 8 months. We only ran into this, and were
> surprised, when we started consuming the messages and saw close to
> 2 million messages.
>
> From the reading and research I've done over the last few days,
> retention.ms isn't necessarily how old messages/events in the topic can
> be, but rather how long the segment files in the topic are kept around
> before getting rotated out. (Please correct me if this is false.)
>
> I also learned that the segments are rotated based on both a size and a
> duration value. The default segment size is 1 GB: if a segment reaches
> this size it gets rotated, similar to logs. segment.ms determines how
> old a segment can get before it is rotated, the default being 7 days. So
> if a log segment never reaches the size limit, it will eventually hit
> the segment.ms of 7 days and be rotated.
>
> The weird thing is that this seems to affect only this one particular
> topic. I've created a separate topic, flooded it with data, set up
> similar retention/cleanup policies, and it seems to be deleting fine.
>
> What adds to the mystery is the output below, listing the segment files
> on one of the brokers (broker-0 in this example):
>
> $ ls -lah
> total 642M
> drwxr-xr-x   2 root root 4.0K May 15 17:28 .
> drwxr-xr-x 281 root root  12K May 21 12:38 ..
> -rw-r--r--   1 root root  16K May  9 08:03 00000000000000000000.index
> -rw-r--r--   1 root root 277M May  9 08:03 00000000000000000000.log
> -rw-r--r--   1 root root  23K May  9 08:03 00000000000000000000.timeindex
> -rw-r--r--   1 root root  63K May 14 22:12 00000000000000507838.index
> -rw-r--r--   1 root root 131M May 14 22:00 00000000000000507838.log
> -rw-r--r--   1 root root   10 May  9 08:03 00000000000000507838.snapshot
> -rw-r--r--   1 root root  94K May 14 22:12 00000000000000507838.timeindex
> -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.index
> -rw-r--r--   1 root root 170M May 21 12:38 00000000000000518083.log
> -rw-r--r--   1 root root   10 May 14 22:12 00000000000000518083.snapshot
> -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.timeindex
> -rw-r--r--   1 root root  383 May 14 10:34 leader-epoch-checkpoint
>
> The segment files are relatively small, and there are files in there
> from May 9, 2020; it's currently May 21, 2020 as I compose this email.
>
> Inspecting the container logs for this one particular topic, we see:
>
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685214246Z"}
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685370948Z"}
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.68563615Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686287357Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686376758Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686436058Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688237176Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688251276Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688332877Z"}
> {"log":"[2020-05-09 12:03:35,810] INFO Cleaner 0: Cleaning segment 0 in log ETD-TEST-0 (largest timestamp Sat Apr 25 11:41:13 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:35.810676351Z"}
> {"log":"[2020-05-09 12:03:37,924] INFO Cleaner 0: Cleaning segment 483496 in log ETD-TEST-0 (largest timestamp Sat May 02 12:00:29 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:37.924653038Z"}
> {"log":"[2020-05-09 12:03:38,001] INFO Cleaner 0: Cleaning segment 495708 in log ETD-TEST-0 (largest timestamp Sat May 09 12:03:28 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:38.001662613Z"}
> {"log":"[2020-05-09 12:04:40,354] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355179461Z"}
> {"log":"[2020-05-09 12:04:40,355] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355208261Z"}
>
> What's weird is that the log cleaner says it's deleting the logs, but
> they don't appear to be deleted? Also, what's the deal with the
> "retaining deletes" line?
>
> So based on this, can anyone let us know if there's anything funky with
> our configuration, or is this potentially a bug in Kafka?
>
> Thanks for any input!
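For what it's worth, my mental model of time-based deletion under cleanup.policy=delete matches the description quoted above: a segment becomes eligible for deletion only once its *newest* (largest) record timestamp is older than retention.ms, and the active segment is never deleted, so old records can linger well past the retention window. A rough sketch of that rule, as an illustration only (my own made-up offsets and timestamps, not Kafka's actual code):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest record in the segment

def deletable_segments(segments, now_ms, retention_ms):
    """Segments eligible for time-based deletion under cleanup.policy=delete.

    A segment qualifies only when *every* record in it has aged out, i.e. its
    largest timestamp is older than retention.ms. The active (last) segment is
    never deleted. Illustration of the rule, not Kafka's implementation.
    """
    eligible = []
    for seg in segments[:-1]:  # skip the active segment
        if now_ms - seg.largest_timestamp_ms > retention_ms:
            eligible.append(seg)
        else:
            break  # segments are ordered; once one survives, later ones do too
    return eligible

# Example with made-up timestamps: 14-day retention, checked "now".
DAY = 24 * 60 * 60 * 1000
now = 100 * DAY
segs = [
    Segment(0,      now - 30 * DAY),  # all records >14 days old -> deletable
    Segment(507838, now - 20 * DAY),  # also fully aged out      -> deletable
    Segment(518083, now - 1 * DAY),   # active segment           -> kept
]
print([s.base_offset for s in deletable_segments(segs, now, 14 * DAY)])
# -> [0, 507838]
```

Under this rule the old 277M segment in the listing above should have aged out long ago, which is why the "Cleaner 0 ... retaining deletes" LogCleaner lines stand out: the LogCleaner only runs for compacted logs, which fits the suspicion that compaction, not delete retention, is what's actually being applied to this topic.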