It's a bit unusual to have your brokers' default cleanup policy set to
compact, but yes, the topic-level policy does override the broker default.
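
If you want to double-check that the override is set explicitly on the topic
(rather than inherited from the broker default), kafka-configs will list only
the per-topic overrides. A quick sketch, assuming the Kafka CLI tools are on
your PATH and $PRD is your ZooKeeper host as in your describe output:

  # Show only the per-topic config overrides for ETD-TEST
  kafka-configs --zookeeper $PRD:2181 --describe --entity-type topics \
    --entity-name ETD-TEST

  # If you ever need to (re)set the overrides explicitly
  kafka-configs --zookeeper $PRD:2181 --alter --entity-type topics \
    --entity-name ETD-TEST \
    --add-config cleanup.policy=delete,retention.ms=1210000000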

What you are seeing with your example of the segment files makes sense to me.
With retention.ms=1210000000 you've actually got 14 days plus 400 seconds of
retention. The default 7-day segment rotation window (segment.ms), combined
with a data rate low enough that you never hit the segment.bytes limit, means
that after 21 days you'd have three segments, each of which could still
contain data within the retention window, so nothing would be deleted. Seeing
May 9th data on May 21st therefore seems reasonable to me.
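
If you want to see exactly which timestamps the retention logic is looking at,
you can dump a segment's time index; the last entry is (roughly) the segment's
largest timestamp, and a segment is only deleted once that timestamp is older
than retention.ms. A sketch using the oldest segment from your ls output
(kafka-run-class and DumpLogSegments ship with Kafka, though I haven't run
this against your exact build):

  # Print the timestamp -> offset entries of the oldest segment's time index
  kafka-run-class kafka.tools.DumpLogSegments \
    --files /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex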

As to why you have messages going back 8 months on a topic with
retention.ms=1210000000, I can only theorize. One case I could imagine: if the
cleanup policy was initially compact, the topic might have been compacted for
eight-plus months, in which case you could have records in compacted and
merged segments that go back nearly to the beginning of the topic. Retention
under a "delete" policy removes whole segments at a time, based on each
segment's newest timestamp. So if you switched to "delete" recently, then as
long as those merged segments contain even one message newer than 1210000000
ms, the delete policy will not consider them eligible for deletion.
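
Two things that would support (or rule out) that theory, assuming the same log
directory as shown in your container logs (adjust paths as needed):

  # 1) Has the cleaner ever compacted this partition? The checkpoint file
  #    lists "<topic> <partition> <last cleaned offset>" for every partition
  #    it has touched.
  grep ETD-TEST /usr/data/kafka/kafka-logs/cleaner-offset-checkpoint

  # 2) Do record timestamps inside a single old segment span many months?
  #    That would be the signature of segments having been merged during
  #    compaction.
  kafka-run-class kafka.tools.DumpLogSegments --print-data-log \
    --files /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log | head -n 20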

The other possibility is that the cleaner just hasn't gotten around to
deleting those segments. Given that you have "compact" as your default
policy, you might have a lot of topics that are compacted and it's busy
churning through those.
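
One way to see what the cleaner is currently chewing on: in a stock install it
writes to its own log-cleaner.log file, but in the Confluent container image
everything may just go to stdout, so something along these lines (the
container name is a placeholder):

  # Cleaner progress lines name the partition being cleaned
  docker logs <broker-container> 2>&1 | grep "kafka.log.LogCleaner" | tail -n 20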

--Chris

On Sun, May 24, 2020 at 2:46 PM Don Ky <asandy...@gmail.com> wrote:

> Hello all, any thoughts on this? Just pinging to see if anyone out there
> has any pointers on where to go from here. I was wondering whether the
> topic is actually being compacted rather than deleted per its retention
> policy. Could this be because the topics were initially created with a
> compact policy, so the delete policy now isn't being applied?
>
> Thanks!
>
> On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote:
>
> > Hello Kafka users! I'm posting because we are having an issue with topic
> > retention that we don't understand, and we need input on what could
> > potentially be wrong on our end if it is indeed a configuration issue.
> > First, we are running a containerized version of Kafka with the following
> > environment variables:
> >
> > CONFLUENT_MINOR_VERSION=3
> > KAFKA_LOG_CLEANUP_POLICY=compact
> > KAFKA_LOG4J_LOGGERS=kafka.controller=ERROR
> > CONFLUENT_DEB_VERSION=1
> > KAFKA_VERSION=5.3.1
> > CONFLUENT_PATCH_VERSION=1
> > CONFLUENT_VERSION=5.3.1
> > KAFKA_LOG_RETENTION_BYTES=-1
> > PYTHON_VERSION=2.7.9-1
> > SCALA_VERSION=2.12
> > CONFLUENT_MAJOR_VERSION=5
> > KAFKA_NUM_PARTITIONS=4
> > ZULU_OPENJDK_VERSION=8=8.38.0.13
> > KAFKA_DEFAULT_REPLICATION_FACTOR=6
> > KAFKA_LOG_RETENTION_HOURS=2147483647
> >
> > I'm listing these vars because they might be helpful for someone with a
> > keen eye to spot something we are missing. So based on this we should
> > have a global log cleanup/retention of:
> >
> > KAFKA_LOG_CLEANUP_POLICY=compact
> > KAFKA_LOG_RETENTION_HOURS=2147483647 (roughly 245,000 years, effectively
> > unlimited)
> > KAFKA_LOG_RETENTION_BYTES=-1
> >
> > We currently have a topic called "ETD-TEST" shown below:
> >
> > kafka-topics --describe --zookeeper $PRD:2181 --topic ETD-TEST
> > Topic: ETD-TEST  PartitionCount: 4  ReplicationFactor: 6  Configs: cleanup.policy=delete,retention.ms=1210000000
> >   Topic: ETD-TEST  Partition: 0  Leader: 1  Replicas: 1,4,5,2,3,6  Isr: 5,1,6,2,3,4
> >   Topic: ETD-TEST  Partition: 1  Leader: 1  Replicas: 1,2,3,6,5,4  Isr: 5,1,6,2,3,4
> >   Topic: ETD-TEST  Partition: 2  Leader: 4  Replicas: 4,5,1,6,2,3  Isr: 5,1,6,2,3,4
> >   Topic: ETD-TEST  Partition: 3  Leader: 1  Replicas: 1,3,6,4,2,5  Isr: 5,1,6,2,3,4
> >
> > Notice it has a cleanup policy of delete, which should override the
> > global setting of compact, right?
> >
> > The issue is that ETD-TEST has a topic-level retention.ms value of
> > 1210000000 (14 days), but we have messages in the topic dating much
> > further back, somewhere around 8 months. We only ran into this, and were
> > surprised, when we started consuming the topic and saw close to 2 million
> > messages.
> >
> > From the reading and research I've done in the last few days,
> > retention.ms isn't necessarily how old messages/events in the topic can
> > be, but rather how long the segment files in the topic are kept around
> > before getting rotated out. (Please correct me if this is false.)
> >
> > I also learned that segment files are rotated based on both a size and a
> > duration value. The default segment size is 1G: if a segment reaches that
> > size it gets rotated, similar to log files. segment.ms determines how old
> > a segment can get before it is rotated, the default being 7 days. So if a
> > log segment never reaches the size limit, it will eventually hit the
> > segment.ms of 7 days and be rotated.
> >
> > The weird thing is that this seems to affect only this one particular
> > topic. I've created a separate topic, flooded it with data, and set up
> > similar retention/cleanup policies, and its old segments are deleted
> > fine.
> >
> > What adds to the mystery: here is the listing of the segment files on
> > one of the brokers (broker 0 in this example).
> >
> > $ ls -lah
> > total 642M
> > drwxr-xr-x   2 root root 4.0K May 15 17:28 .
> > drwxr-xr-x 281 root root  12K May 21 12:38 ..
> > -rw-r--r--   1 root root  16K May  9 08:03 00000000000000000000.index
> > -rw-r--r--   1 root root 277M May  9 08:03 00000000000000000000.log
> > -rw-r--r--   1 root root  23K May  9 08:03 00000000000000000000.timeindex
> > -rw-r--r--   1 root root  63K May 14 22:12 00000000000000507838.index
> > -rw-r--r--   1 root root 131M May 14 22:00 00000000000000507838.log
> > -rw-r--r--   1 root root   10 May  9 08:03 00000000000000507838.snapshot
> > -rw-r--r--   1 root root  94K May 14 22:12 00000000000000507838.timeindex
> > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.index
> > -rw-r--r--   1 root root 170M May 21 12:38 00000000000000518083.log
> > -rw-r--r--   1 root root   10 May 14 22:12 00000000000000518083.snapshot
> > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.timeindex
> > -rw-r--r--   1 root root  383 May 14 10:34 leader-epoch-checkpoint
> >
> > We see that the segment files are relatively small, and there are files
> > in there from May 9, 2020; it's currently May 21, 2020 as I compose this
> > email.
> >
> > Inspecting the container logs themselves for this one particular topic,
> > we see this:
> >
> > {"log":"[2020-04-25 12:04:25,685] INFO Deleted log
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685214246Z"}¬
> > {"log":"[2020-04-25 12:04:25,685] INFO Deleted offset index
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685370948Z"}¬
> > {"log":"[2020-04-25 12:04:25,685] INFO Deleted time index
> >
> /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.68563615Z"}¬
> > {"log":"[2020-04-25 12:04:25,686] INFO Deleted log
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.log.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686287357Z"}¬
> > {"log":"[2020-04-25 12:04:25,686] INFO Deleted offset index
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.index.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686376758Z"}¬
> > {"log":"[2020-04-25 12:04:25,686] INFO Deleted time index
> >
> /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.timeindex.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686436058Z"}¬
> > {"log":"[2020-04-25 12:04:25,688] INFO Deleted log
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.log.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688237176Z"}¬
> > {"log":"[2020-04-25 12:04:25,688] INFO Deleted offset index
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.index.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688251276Z"}¬
> > {"log":"[2020-04-25 12:04:25,688] INFO Deleted time index
> >
> /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.timeindex.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688332877Z"}¬
> > {"log":"[2020-05-09 12:03:35,810] INFO Cleaner 0: Cleaning segment 0 in
> > log ETD-TEST-0 (largest timestamp Sat Apr 25 11:41:13 UTC 2020) into 0,
> > retaining deletes.
> >
> (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:35.810676351Z"}¬
> > {"log":"[2020-05-09 12:03:37,924] INFO Cleaner 0: Cleaning segment 483496
> > in log ETD-TEST-0 (largest timestamp Sat May 02 12:00:29 UTC 2020) into
> 0,
> > retaining deletes.
> >
> (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:37.924653038Z"}¬
> > {"log":"[2020-05-09 12:03:38,001] INFO Cleaner 0: Cleaning segment 495708
> > in log ETD-TEST-0 (largest timestamp Sat May 09 12:03:28 UTC 2020) into
> 0,
> > retaining deletes.
> >
> (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:38.001662613Z"}¬
> > {"log":"[2020-05-09 12:04:40,354] INFO Deleted log
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355179461Z"}¬
> > {"log":"[2020-05-09 12:04:40,355] INFO Deleted offset index
> > /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted.
> >
> (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355208261Z"}¬
> > ¬
> > What's weird is that the log cleaner says it's deleting the logs, but
> > they don't appear to actually be deleted. Also, what's the deal with the
> > "retaining deletes" line?
> >
> > So based on this, can anyone let us know if there's anything funky with
> > our configurations, or is this potentially a bug in Kafka?
> >
> > Thanks for any input!
> >
> >
>


-- 
Chris
