Hello all, any thoughts on this? Just pinging in case anyone out there has pointers on where to go from here. I was wondering: could it be that the topic is actually being compacted, rather than cleaned up by a delete retention policy? Could this be because the topics were initially created with a compact policy, so the delete policy now isn't being applied?
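As a sanity check on the numbers in the quoted message below, here is the quick arithmetic on what the configured retention values actually mean (nothing Kafka-specific, just unit conversion):

```python
# Unit conversions for the retention settings quoted in the original message.

MS_PER_DAY = 24 * 60 * 60 * 1000
HOURS_PER_YEAR = 24 * 365

retention_ms = 1_210_000_000      # topic-level retention.ms on ETD-TEST
retention_hours = 2_147_483_647   # broker-level KAFKA_LOG_RETENTION_HOURS (INT32_MAX)

days = retention_ms / MS_PER_DAY           # ~14 days
years = retention_hours / HOURS_PER_YEAR   # ~245,000 years, effectively "keep forever"

print(f"retention.ms = {retention_ms} ms ~= {days:.1f} days")
print(f"log.retention.hours = {retention_hours} h ~= {years:,.0f} years")
# KAFKA_LOG_RETENTION_BYTES=-1 disables size-based retention entirely.
```

So the broker defaults amount to "never delete anything by time or size", and only the topic-level `retention.ms=1210000000` (14 days) should matter for ETD-TEST, assuming the delete policy is actually in effect.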
Thanks!

On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote:

> Hello Kafka users! I'm posting because we are having an issue with topic
> retention that we don't understand, and we need input on what could
> potentially be wrong on our end if it is indeed a configuration issue.
>
> First, we are running a containerized version of Kafka with the following
> environment variables:
>
> CONFLUENT_MINOR_VERSION=3
> KAFKA_LOG_CLEANUP_POLICY=compact
> KAFKA_LOG4J_LOGGERS=kafka.controller=ERROR
> CONFLUENT_DEB_VERSION=1
> KAFKA_VERSION=5.3.1
> CONFLUENT_PATCH_VERSION=1
> CONFLUENT_VERSION=5.3.1
> KAFKA_LOG_RETENTION_BYTES=-1
> PYTHON_VERSION=2.7.9-1
> SCALA_VERSION=2.12
> CONFLUENT_MAJOR_VERSION=5
> KAFKA_NUM_PARTITIONS=4
> ZULU_OPENJDK_VERSION=8=8.38.0.13
> KAFKA_DEFAULT_REPLICATION_FACTOR=6
> KAFKA_LOG_RETENTION_HOURS=2147483647
>
> I'm listing these vars because they might help someone with a keen eye
> spot something we are missing. Based on this we should have a global log
> cleanup/retention of:
>
> KAFKA_LOG_CLEANUP_POLICY=compact
> KAFKA_LOG_RETENTION_HOURS=2147483647 (roughly 245,000 years, i.e. effectively forever)
> KAFKA_LOG_RETENTION_BYTES=-1 (unlimited)
>
> We currently have a topic called "ETD-TEST" shown below:
>
> kafka-topics --describe --zookeeper $PRD:2181 --topic ETD-TEST
> Topic: ETD-TEST  PartitionCount: 4  ReplicationFactor: 6  Configs: cleanup.policy=delete,retention.ms=1210000000
>     Topic: ETD-TEST  Partition: 0  Leader: 1  Replicas: 1,4,5,2,3,6  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 1  Leader: 1  Replicas: 1,2,3,6,5,4  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 2  Leader: 4  Replicas: 4,5,1,6,2,3  Isr: 5,1,6,2,3,4
>     Topic: ETD-TEST  Partition: 3  Leader: 1  Replicas: 1,3,6,4,2,5  Isr: 5,1,6,2,3,4
>
> Notice it has a cleanup policy of delete, which should override the
> global setting of compact?
>
> The issue is that ETD-TEST has a topic-level retention.ms value of
> 1210000000 (14 days), but we have messages in the topic dating much
> further back, somewhere around 8 months. We only ran into this, and were
> surprised, when we started consuming the messages and saw close to
> 2 million messages.
>
> From the reading and research I've done over the last few days,
> retention.ms isn't necessarily how old messages/events in the topic can
> be, but rather how long the segment files in the topic are kept around
> before getting rotated out. (Please correct me if this is false.)
>
> I also learned that the segments are rotated based on both a size and a
> duration value. The default segment size is 1 GB: if a segment reaches
> this size it gets rotated, similar to logs. segment.ms determines how
> old a segment can get before it is rotated, the default being 7 days. So
> if a log segment never reaches the size limit, it will eventually hit
> the segment.ms of 7 days and be rotated.
>
> The weird thing is that this seems to affect only this one particular
> topic. I've created a separate topic, flooded it with data, set up
> similar retention/cleanup policies, and it seems to be deleting fine.
>
> What adds to the mystery is the output below, listing the segment files
> on one of the brokers (broker-0 in this example):
>
> $ ls -lah
> total 642M
> drwxr-xr-x   2 root root 4.0K May 15 17:28 .
> drwxr-xr-x 281 root root  12K May 21 12:38 ..
> -rw-r--r--   1 root root  16K May  9 08:03 00000000000000000000.index
> -rw-r--r--   1 root root 277M May  9 08:03 00000000000000000000.log
> -rw-r--r--   1 root root  23K May  9 08:03 00000000000000000000.timeindex
> -rw-r--r--   1 root root  63K May 14 22:12 00000000000000507838.index
> -rw-r--r--   1 root root 131M May 14 22:00 00000000000000507838.log
> -rw-r--r--   1 root root   10 May  9 08:03 00000000000000507838.snapshot
> -rw-r--r--   1 root root  94K May 14 22:12 00000000000000507838.timeindex
> -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.index
> -rw-r--r--   1 root root 170M May 21 12:38 00000000000000518083.log
> -rw-r--r--   1 root root   10 May 14 22:12 00000000000000518083.snapshot
> -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.timeindex
> -rw-r--r--   1 root root  383 May 14 10:34 leader-epoch-checkpoint
>
> The segment files are relatively small, and there are files in there
> from May 9, 2020; it's currently May 21, 2020 as I compose this email.
>
> Inspecting the container logs for this one particular topic, we see:
>
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685214246Z"}
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685370948Z"}
> {"log":"[2020-04-25 12:04:25,685] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.68563615Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686287357Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686376758Z"}
> {"log":"[2020-04-25 12:04:25,686] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686436058Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688237176Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688251276Z"}
> {"log":"[2020-04-25 12:04:25,688] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688332877Z"}
> {"log":"[2020-05-09 12:03:35,810] INFO Cleaner 0: Cleaning segment 0 in log ETD-TEST-0 (largest timestamp Sat Apr 25 11:41:13 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:35.810676351Z"}
> {"log":"[2020-05-09 12:03:37,924] INFO Cleaner 0: Cleaning segment 483496 in log ETD-TEST-0 (largest timestamp Sat May 02 12:00:29 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:37.924653038Z"}
> {"log":"[2020-05-09 12:03:38,001] INFO Cleaner 0: Cleaning segment 495708 in log ETD-TEST-0 (largest timestamp Sat May 09 12:03:28 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:38.001662613Z"}
> {"log":"[2020-05-09 12:04:40,354] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355179461Z"}
> {"log":"[2020-05-09 12:04:40,355] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355208261Z"}
>
> What's weird is that the log cleaner says it's deleting the logs, but
> they don't appear to be deleted? Also, what's the deal with the
> "retaining deletes" line?
>
> So based on this, can anyone let us know if there's anything funky with
> our configuration, or is this potentially a bug in Kafka?
>
> Thanks for any input!
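For what it's worth, my mental model of time-based deletion under cleanup.policy=delete matches the description quoted above: a segment becomes eligible for deletion only once its *newest* (largest) record timestamp is older than retention.ms, and the active segment is never deleted, so old records can linger well past the retention window. A rough sketch of that rule, as an illustration only (my own made-up offsets and timestamps, not Kafka's actual code):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest record in the segment

def deletable_segments(segments, now_ms, retention_ms):
    """Segments eligible for time-based deletion under cleanup.policy=delete.

    A segment qualifies only when *every* record in it has aged out, i.e. its
    largest timestamp is older than retention.ms. The active (last) segment is
    never deleted. Illustration of the rule, not Kafka's implementation.
    """
    eligible = []
    for seg in segments[:-1]:  # skip the active segment
        if now_ms - seg.largest_timestamp_ms > retention_ms:
            eligible.append(seg)
        else:
            break  # segments are ordered; once one survives, later ones do too
    return eligible

# Example with made-up timestamps: 14-day retention, checked "now".
DAY = 24 * 60 * 60 * 1000
now = 100 * DAY
segs = [
    Segment(0,      now - 30 * DAY),  # all records >14 days old -> deletable
    Segment(507838, now - 20 * DAY),  # also fully aged out      -> deletable
    Segment(518083, now - 1 * DAY),   # active segment           -> kept
]
print([s.base_offset for s in deletable_segments(segs, now, 14 * DAY)])
# -> [0, 507838]
```

Under this rule the old 277M segment in the listing above should have aged out long ago, which is why the "Cleaner 0 ... retaining deletes" LogCleaner lines stand out: the LogCleaner only runs for compacted logs, which fits the suspicion that compaction, not delete retention, is what's actually being applied to this topic.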