log.retention.bytes can help somewhat, but it is cumbersome to use because it is a per-topic config that caps each partition's log, not total disk usage.
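To illustrate why the per-partition semantics make it awkward as a disk-full guard, here is a rough sketch, assuming a 0.8.x-era deployment; the topic name and sizes are made up:

    # broker default in server.properties; the cap applies to EACH
    # partition's log, not the disk as a whole:
    log.retention.bytes=10737418240

    # topic-level override (hypothetical topic name):
    # bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
    #     --topic events --config retention.bytes=10737418240

With, say, 50 partitions hosted on a broker, the effective cap is 50 x 10 GB = 500 GB, and the numbers have to be re-derived whenever partition counts or disk sizes change. A single global bytes limit would avoid that bookkeeping.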
there was an earlier thread regarding a global bytes limit. That would work well for my purpose of avoiding disk full:
https://issues.apache.org/jira/browse/KAFKA-1489

On Thu, Jul 31, 2014 at 7:39 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> What version of Kafka are you using? Have you tried log.retention.bytes?
> Whichever comes first (TTL or total bytes) should do what you are looking
> for, if I understand you right.
> http://kafka.apache.org/documentation.html#brokerconfigs
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop
> ********************************************/
>
> On Jul 31, 2014 6:52 PM, "Steven Wu" <steve...@netflix.com.invalid> wrote:
>
> > It seems that log retention is based purely on the last-touch/modified
> > timestamp of the log files. This is undesirable for code pushes in
> > AWS/cloud.
> >
> > E.g., say the retention window is 24 hours, disk size is 1 TB, and disk
> > utilization is 60% (600 GB). When a new instance comes up, it fetches
> > the log files (600 GB) from its peers. Those files all get fresh
> > timestamps, so they won't be purged until 24 hours later. Note that
> > during those first 24 hours, new messages (another 600 GB) continue to
> > come in. This can cause a disk-full problem without any intervention.
> > With this behavior, we have to keep disk utilization under 50%.
> >
> > Could the last-modified timestamp be inserted into the file name when
> > rolling over log files? Then Kafka could check the file name for the
> > timestamp. Does this make sense?
> >
> > Thanks,
> > Steven
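For what it's worth, a minimal Java sketch of the file-name idea follows. The segment name format ("<baseOffset>.<rollTimestampMs>.log"), the class, and the method names are all assumptions for illustration, not Kafka's actual on-disk layout or code; the point is only that retention would key off the embedded roll timestamp instead of the mtime, which gets reset when a replica re-fetches segments:

    import java.io.File;

    public class TimestampedSegmentRetention {

        // Parse the roll timestamp (ms since epoch) out of the assumed
        // "<baseOffset>.<rollTimestampMs>.log" name; fall back to
        // lastModified() for files without an embedded timestamp.
        static long segmentTimestampMs(File segment) {
            String[] parts = segment.getName().split("\\.");
            if (parts.length == 3 && parts[2].equals("log")) {
                try {
                    return Long.parseLong(parts[1]);
                } catch (NumberFormatException e) {
                    // malformed name; fall through to mtime
                }
            }
            return segment.lastModified();
        }

        // A segment becomes eligible for deletion once its embedded roll
        // timestamp (not its mtime) falls outside the retention window,
        // so re-fetched replicas age out on the original schedule.
        static boolean eligibleForDeletion(File segment, long retentionMs) {
            return System.currentTimeMillis() - segmentTimestampMs(segment) > retentionMs;
        }
    }

In the code-push scenario above, segments copied to the new instance would keep their original roll timestamps in the name, so the 600 GB fetched from peers would still purge on the original 24-hour schedule instead of starting a fresh window.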