What version of Kafka are you using? Have you tried log.retention.bytes? Whichever limit is hit first (the time-based TTL or the total bytes) should do what you are looking for, if I understand you right. http://kafka.apache.org/documentation.html#brokerconfigs
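A minimal broker-config sketch of the suggestion above; the byte value is an illustrative assumption (it is not from the original thread), and note that log.retention.bytes applies per partition, not per broker:

```properties
# Time-based retention: delete log segments older than 24 hours.
log.retention.hours=24

# Size-based retention: this limit is per PARTITION, not per broker,
# so divide the broker's disk budget by its partition count.
# Illustrative assumption: ~500 GB budget across 100 partitions ≈ 5 GB each.
log.retention.bytes=5368709120

# How often the log cleaner checks whether any segment is eligible
# for deletion (default is 5 minutes).
log.retention.check.interval.ms=300000
```

Whichever threshold a partition crosses first (age or size) triggers segment deletion.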
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/

On Jul 31, 2014 6:52 PM, "Steven Wu" <steve...@netflix.com.invalid> wrote:

> it seems that log retention is purely based on last touch/modified
> timestamp. This is undesirable for code push in aws/cloud.
>
> e.g. let's say retention window is 24 hours. disk size is 1 TB. disk util
> is 60% (600GB). when a new instance comes up, it will fetch log files
> (600GB) from peers. those log files all have newer timestamps. they won't
> be purged until 24 hours later. note that during the first 24 hours, new
> msgs (another 600GB) continue to come in. This can cause a disk-full
> problem without any intervention. With this behavior, we have to keep disk
> util under 50%.
>
> can the last-modified timestamp be inserted into the file name when
> rolling over log files? then kafka can check the file name for the
> timestamp. does this make sense?
>
> Thanks,
> Steven