Re: [DISCUSS] Different retention semantics for active segment rotation

2024-03-21 Thread Jorge Esteban Quilcate Otoya
Sure! Good to know that this is tracked.

Thanks, Luke!

On Thu, 21 Mar 2024 at 07:52, Luke Chen  wrote:

> Hi Jorge,
>
> You should check the JIRA:
> https://issues.apache.org/jira/browse/KAFKA-16385
> where we had some discussion.
> You're welcome to share your thoughts there.
>
> Thanks.
> Luke



2024-03-21 Thread Luke Chen
Hi Jorge,

You should check the JIRA: https://issues.apache.org/jira/browse/KAFKA-16385
where we had some discussion.
You're welcome to share your thoughts there.

Thanks.
Luke

On Thu, Mar 21, 2024 at 3:33 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi dev community,
>
> I'd like to share some findings on how rotation of the active segment differs
> depending on whether topic retention is time- or size-based.
>
> I was (wrongly) assuming that active segments are rotated only when the
> segment configs (segment.bytes, default 1GiB, or segment.ms, default 7d) or
> the global log configs (log.roll.ms) force it, regardless of the retention
> configuration.
> However, the behavior differs depending on how retention is defined:
>
> - If a topic has time-based retention[1], the condition to rotate the
> active segment is based on its latest timestamp. If the difference from the
> current time is larger than the retention time, then the segment (including
> the active one) should be deleted: the active segment is rotated, and is
> deleted in the next round.
>
> - If a topic has size-based retention[2], though, the condition depends not
> only on the size of the segment itself but first on the total log size,
> which forces the log to always keep at least one (active) segment. First,
> the difference between the total log size and the retention size is
> calculated: say there is a single 5MB segment and retention is 1MB, so the
> total difference is 4MB. Then the deletion condition checks whether the
> total difference minus the current segment's size is higher than zero, and
> if so, deletes the segment. With a single segment, the segment size is
> always higher than the total difference, so there will always be at least
> one segment. In this case, the only time the active segment is rotated is
> when a new message arrives.
>
> I've added steps to reproduce[3].
>
> Maybe I'm missing something obvious, but this seems inconsistent to me.
> Either both retention configs should rotate active segments, or neither
> should, and the active segment should be governed only by the
> segment.bytes|segment.ms configs or the log.roll.ms config.
>
> I believe it's a useful feature to "force" active segment rotation without
> changing segment or global log rotation settings, given that features like
> Compaction and Tiered Storage can benefit from it; but I would like to
> clarify this behavior and make it consistent for both retention options,
> and/or call it out explicitly in the documentation.
>
> Looking forward to your feedback!
>
> Jorge.
>
> [1]: https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566
> [2]: https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583
> [3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2
>
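To compare the two conditions described in Jorge's message more easily, here is a minimal Java sketch of both retention predicates applied to a single-segment log, using the 5MB/1MB example from the thread. It is a simplified model and not Kafka's implementation. The Segment record and the timeBreached/sizeBreached methods are hypothetical names. The sketch assumes that segments are scanned oldest-first and that scanning stops at the first segment that is kept. It ignores other broker-side checks and does not model rolling the active segment. The size check uses a non-strict comparison (>= 0), which is an assumption based on a reading of the linked UnifiedLog code; with a single segment, a strict comparison gives the same result.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the two retention checks discussed in this thread.
// Hypothetical names throughout; this is not Kafka's UnifiedLog code.
public class RetentionSketch {

    // Hypothetical segment: size in bytes and the largest record timestamp it holds.
    record Segment(long sizeBytes, long largestTimestampMs) {}

    // Time-based retention: scanning oldest-first, a segment (the active one
    // included) is a deletion candidate when its newest record is older than
    // retentionMs. Rolling the active segment before deleting it is not modeled.
    static List<Segment> timeBreached(List<Segment> segments, long nowMs, long retentionMs) {
        List<Segment> deletable = new ArrayList<>();
        for (Segment s : segments) {
            if (nowMs - s.largestTimestampMs() > retentionMs) {
                deletable.add(s);
            } else {
                break; // stop at the first segment that must be kept
            }
        }
        return deletable;
    }

    // Size-based retention: compute how far the whole log exceeds retentionBytes,
    // then delete oldest-first only while an entire segment fits in that excess.
    static List<Segment> sizeBreached(List<Segment> segments, long retentionBytes) {
        List<Segment> deletable = new ArrayList<>();
        long total = 0;
        for (Segment s : segments) {
            total += s.sizeBytes();
        }
        if (retentionBytes < 0 || total < retentionBytes) {
            return deletable; // size retention disabled (-1) or not breached
        }
        long diff = total - retentionBytes;
        for (Segment s : segments) {
            if (diff - s.sizeBytes() >= 0) { // assumed non-strict comparison
                diff -= s.sizeBytes();
                deletable.add(s);
            } else {
                break;
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long day = 24L * 60 * 60 * 1000;
        long now = 10 * day;
        // A log with a single (active) 5MB segment whose newest record is 8 days old.
        List<Segment> log = List.of(new Segment(5 * mb, now - 8 * day));

        // retention.ms = 7 days: the active segment breaches time retention.
        System.out.println(timeBreached(log, now, 7 * day).size()); // prints 1
        // retention.bytes = 1MB: the excess is 4MB, smaller than the 5MB segment.
        System.out.println(sizeBreached(log, mb).size()); // prints 0
    }
}
```

Running main prints 1 for the time-based check, because the 8-day-old active segment breaches a 7-day retention. It prints 0 for the size-based check, because the 4MB excess never covers the whole 5MB segment. This is the asymmetry the thread describes: under size-based retention, the last remaining segment is never selected.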