Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Abhijeet Kumar
Thank you all for your comments. As all the comments in the thread have been
addressed, I am starting a Vote thread for the KIP. Please have a look.

Regards.
Abhijeet.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Luke Chen
Hi, Abhijeet,

Thanks for the update.

I have no more comments.

Luke

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-24 Thread Jun Rao
Hi, Abhijeet,

Thanks for the updated KIP. It looks good to me.

Jun

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Jun,

Please find my comments inline.


On Thu, Apr 18, 2024 at 3:26 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the reply.
>
> 1. I am wondering if we could achieve the same result by just lowering
> local.retention.ms and local.retention.bytes. This also allows the newly
> started follower to build up the local data before serving the consumer
> traffic.
>

I am not sure I fully followed this. Do you mean we could lower
local.retention (by size and time) so that there is little data in the
leader's local storage, allowing the follower to catch up with the leader
quickly?

In that case, we would need to set a small local retention across all
brokers in the cluster. That has the undesired effect of increasing remote
log fetches for serving consume requests, which can cause performance
degradation. Also, this behaviour (increased remote fetches) would occur on
all brokers at all times, whereas the KIP restricts it to newly bootstrapped
brokers, and only until they fully build their local logs per the retention
defined at the cluster level. (Deprioritizing such brokers could reduce the
impact even further.)
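
For illustration, the alternative you describe amounts to a cluster-wide,
per-topic config change along these lines; a minimal sketch using the Kafka
Admin API, where the bootstrap server, topic name, and retention values are
placeholders:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class LowerLocalRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // placeholder
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // Keep only ~1 hour / 1 GiB locally; older data is served from
                // remote storage. Note this applies to every replica of the
                // topic, at all times -- the drawback described above.
                admin.incrementalAlterConfigs(Map.of(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("local.retention.ms", "3600000"),
                                      AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("local.retention.bytes", "1073741824"),
                                      AlterConfigOp.OpType.SET)
                ))).all().get();
            }
        }
    }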


>
> 2. Have you updated the KIP?
>

The KIP has been updated now.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Luke,

Thanks for your comments. Please find my responses inline.

On Tue, Apr 9, 2024 at 2:08 PM Luke Chen  wrote:

> Hi Abhijeet,
>
> Thanks for the KIP to improve the tiered storage feature!
>
> Questions:
> 1. We could also get the "pending-upload-offset" and epoch via remote log
> metadata, instead of adding a new API to fetch from the leader. Could you
> explain why you chose the latter approach, instead of the former?
>

The remote log metadata could be tracking overlapping log segments. Because
of truncations during unclean leader elections, the maximum offset across
the segments it tracks may not be the last-tiered-offset. Remote log
metadata alone is not sufficient to identify the last-tiered-offset or the
pending-upload-offset.

Only the leader knows the correct lineage of offsets required to identify
the "pending-upload-offset". That is why we chose to add a new API to fetch
this information from the leader.
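
To make the lineage issue concrete, here is a toy model (all names are
invented for illustration; this is not the actual remote log metadata API):

    import java.util.List;

    // Minimal stand-in for a remote segment's metadata.
    record RemoteSegment(int leaderEpoch, long baseOffset, long endOffset) {}

    public class LineageExample {
        public static void main(String[] args) {
            // The epoch-2 leader uploaded [100, 199] and then died; the new
            // (unclean) leader diverged at offset 150 and produces epoch 3.
            // Both segments can coexist in the remote metadata for a while.
            List<RemoteSegment> segments = List.of(
                    new RemoteSegment(2, 0, 99),
                    new RemoteSegment(2, 100, 199),  // abandoned lineage past 149
                    new RemoteSegment(3, 150, 179)); // current leader's lineage
            long naive = segments.stream()
                    .mapToLong(RemoteSegment::endOffset).max().orElse(-1L);
            // Prints 199, but the last-tiered-offset along the current
            // leader's lineage is 179 -- only the leader's epoch chain knows.
            System.out.println(naive);
        }
    }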


> 2.
> > We plan to have a follow-up KIP that will address both the
> > deprioritization of these brokers from leadership, as well as for
> > consumption (when fetching from followers is allowed).
>
> I agree with Jun that we might need to make clear all the possible
> drawbacks this could have. So, could we add the drawbacks Jun mentioned
> about the performance issue when consumers fetch from a follower?

Updated the KIP to call this out.


> 3. Could we add a "Rejected Alternatives" section to the end of the KIP
> covering some of them? For example, the "ListOffsetRequest" approach vs.
> the "Earliest-Pending-Upload-Offset" approach, or getting the
> "Earliest-Pending-Upload-Offset" from remote log metadata, etc.

Added a section on Rejected Alternatives.



Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-17 Thread Jun Rao
Hi, Abhijeet,

Thanks for the reply.

1. I am wondering if we could achieve the same result by just lowering
local.retention.ms and local.retention.bytes. This also allows the newly
started follower to build up the local data before serving the consumer
traffic.

2. Have you updated the KIP?

Thanks,

Jun

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Satish Duggana
+1 to Jun on adding the consumer-fetching-from-a-follower scenario to the
existing section that discusses the drawback of a node built from the
last-tiered-offset becoming the leader. As Abhijeet mentioned, we plan to
have a follow-up KIP that will address these issues by deprioritizing such
brokers. The deprioritization can be removed once they catch up to the
local log retention.

Thanks,
Satish.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP to improve the tiered storage feature!

Questions:
1. We could also get the "pending-upload-offset" and epoch via remote log
metadata, instead of adding a new API to fetch from the leader. Could you
explain why you chose the latter approach, instead of the former?

2.
> We plan to have a follow-up KIP that will address both the deprioritization
> of these brokers from leadership, as well as for consumption (when fetching
> from followers is allowed).

I agree with Jun that we might need to make clear all the possible drawbacks
this could have. So, could we add the drawbacks Jun mentioned about the
performance issue when consumers fetch from a follower?

3. Could we add a "Rejected Alternatives" section to the end of the KIP
covering some of them? For example, the "ListOffsetRequest" approach vs. the
"Earliest-Pending-Upload-Offset" approach, or getting the
"Earliest-Pending-Upload-Offset" from remote log metadata, etc.

Thanks.
Luke

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Abhijeet Kumar
Hi Christo,

Please find my comments inline.

On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
wrote:

> Hello Abhijeet and Jun,
>
> I have been mulling this KIP over a bit more in recent days!
>
> re: Jun
>
> I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> retrospect it should have been fairly obvious. I would need to go and update
> KIP-1005 myself then, thank you for giving the useful reference!
>
> 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> which experienced a disk failure) to decrease the time a partition spends
> in under-replicated state. In other words, a follower which has just fallen
> out of ISR, but has local data will continue using today's Tiered Storage
> replication protocol (i.e. fetching from earliest-local). I further believe
> he has taken this approach so that local state of replicas which have just
> fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> Abhijeet, have I understood (and summarised) what you are proposing
> correctly?
>
Yes, your understanding is correct. We want to limit the behavior changes
only to new replicas.


> 5. I think in today's Tiered Storage we know the leader's log-start-offset
> from the FetchResponse and we can learn its local-log-start-offset from the
> ListOffsets by asking for earliest-local timestamp (-4). But granted, this
> ought to be added as an additional API call in the KIP.
>
>
Yes, I clarified this in my reply to Jun. I will add this missing detail in
the KIP.


> re: Abhijeet
>
> 101. I am still a bit confused as to why you want to include a new offset
> (i.e. pending-upload-offset) when you yourself mention that you could use
> an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> end your Motivation with "In this KIP, we will focus only on the follower
> fetch protocol using the *last-tiered-offset*" and then in the following
> sections you talk about pending-upload-offset. I understand this might be
> classified as an implementation detail, but if you introduce a new offset
> (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> API (i.e. introduce -6) and thus document it in this KIP as such. However,
> the last-tiered-offset ought to already be exposed as part of KIP-1005
> (under implementation). Am I misunderstanding something here?
>

I have tried to clarify this in my reply to Jun:

> The follower needs to build the local data starting from the
> Earliest-Pending-Upload-Offset, so it needs both that offset and the
> corresponding leader epoch. There are two ways to get them:
>    1. We add support in ListOffsetRequest to be able to fetch this offset
> (and leader epoch) from the leader.
>    2. Or, fetch the tiered-offset (which is already supported) and add 1
> to it to get the Earliest-Pending-Upload-Offset. However, we still need
> the leader epoch for that offset, since there is no guarantee that the
> leader epoch for the Earliest-Pending-Upload-Offset will be the same as
> the leader epoch for the tiered-offset. We may need another API call to
> the leader for this.
> I prefer the first approach. Its only drawback is that it introduces one
> more offset. The second approach avoids that problem but is a little more
> complicated.




Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Abhijeet Kumar
Hi Jun,

Thank you for taking the time to review the KIP. Please find my comments
inline.

On Fri, Apr 5, 2024 at 12:09 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>

We plan to have a follow-up KIP that will address both the deprioritization
of these brokers from leadership and their deprioritization for consumption
(when fetching from followers is allowed).


>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
>

Yes, let me update the KIP to include this change. We will need a new
timestamp corresponding to Earliest-Pending-Upload-Offset.


> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
Makes sense. I will update the KIP to capture this.


> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse includes both the offset and the corresponding epoch.
>
I understand there is some confusion here. Let me try to explain this.

The follower needs to build the local data starting from the
Earliest-Pending-Upload-Offset, so it needs both that offset and the
corresponding leader epoch. There are two ways to get them:
   1. We add support in ListOffsetRequest to be able to fetch this offset
(and leader epoch) from the leader.
   2. Or, fetch the tiered-offset (which is already supported) and add 1 to
it to get the Earliest-Pending-Upload-Offset. However, we still need the
leader epoch for that offset, since there is no guarantee that the leader
epoch for the Earliest-Pending-Upload-Offset will be the same as the leader
epoch for the tiered-offset. We may need another API call to the leader for
this.

I prefer the first approach. Its only drawback is that it introduces one
more offset. The second approach avoids that problem but is a little more
complicated.
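
For reference, a minimal sketch (plain Java constants; the -6 value is only
the candidate discussed in this thread and is not final, and -5 comes from
KIP-1005, which is still under implementation) of the ListOffsets timestamp
sentinels involved:

    public final class OffsetSentinels {
        public static final long LATEST_TIMESTAMP = -1L;                  // existing
        public static final long EARLIEST_TIMESTAMP = -2L;                // existing
        public static final long MAX_TIMESTAMP = -3L;                     // KIP-734
        public static final long EARLIEST_LOCAL_TIMESTAMP = -4L;          // earliest-local
        public static final long LATEST_TIERED_TIMESTAMP = -5L;           // KIP-1005
        public static final long EARLIEST_PENDING_UPLOAD_TIMESTAMP = -6L; // proposed

        // Approach 1: a single ListOffsets call with -6 returns both the
        // offset and its leader epoch. Approach 2: ask for -5, add 1 to the
        // returned offset, and then resolve the epoch separately, because an
        // epoch change may fall exactly at last-tiered-offset + 1.
    }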


> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
>
We want to limit this new behavior to new replicas only. Replicas that fall
out of ISR are excluded from this change; they will continue with the
existing behavior.


> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>

I missed this detail. The follower will need to call the leader API
(ListOffsets with the earliest-local timestamp) to fetch the Earliest-Local
offset for this.
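
Putting the answers to 4 and 5 together, the bootstrap decision is roughly
as follows (a hypothetical sketch with invented names, not the actual broker
code):

    public final class BootstrapDecision {
        /**
         * @param featureEnabled            last-tiered-offset bootstrap enabled
         * @param followerHasLocalData      false only for new or wiped replicas
         * @param leaderLogStartOffset      known from the FetchResponse
         * @param leaderLocalLogStartOffset fetched via ListOffsets earliest-local
         */
        static boolean bootstrapFromLastTieredOffset(boolean featureEnabled,
                                                     boolean followerHasLocalData,
                                                     long leaderLogStartOffset,
                                                     long leaderLocalLogStartOffset) {
            // Replicas that merely fell out of ISR keep today's protocol.
            if (!featureEnabled || followerHasLocalData) {
                return false;
            }
            // If log-start == local-log-start, nothing has been tiered yet,
            // so there is nothing for the fast path to skip.
            return leaderLogStartOffset < leaderLocalLogStartOffset;
        }
    }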



Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-05 Thread Christo Lolov
Hello Abhijeet and Jun,

I have been mulling this KIP over a bit more in recent days!

re: Jun

I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
retrospect it should have been fairly obvious. I would need to go and update
KIP-1005 myself then, thank you for giving the useful reference!

4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
fetch from latest-tiered-offset + 1 only for new replicas (or replicas
which experienced a disk failure) to decrease the time a partition spends
in under-replicated state. In other words, a follower which has just fallen
out of ISR, but has local data will continue using today's Tiered Storage
replication protocol (i.e. fetching from earliest-local). I further believe
he has taken this approach so that local state of replicas which have just
fallen out of ISR isn't forcefully wiped thus leading to situation 1.
Abhijeet, have I understood (and summarised) what you are proposing
correctly?

5. I think in today's Tiered Storage we know the leader's log-start-offset
from the FetchResponse and we can learn its local-log-start-offset from the
ListOffsets by asking for earliest-local timestamp (-4). But granted, this
ought to be added as an additional API call in the KIP.

re: Abhijeet

101. I am still a bit confused as to why you want to include a new offset
(i.e. pending-upload-offset) when you yourself mention that you could use
an already existing offset (i.e. last-tiered-offset + 1). In essence, you
end your Motivation with "In this KIP, we will focus only on the follower
fetch protocol using the *last-tiered-offset*" and then in the following
sections you talk about pending-upload-offset. I understand this might be
classified as an implementation detail, but if you introduce a new offset
(i.e. pending-upload-offset) you have to make a change to the ListOffsets
API (i.e. introduce -6) and thus document it in this KIP as such. However,
the last-tiered-offset ought to already be exposed as part of KIP-1005
(under implementation). Am I misunderstanding something here?

Best,
Christo

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-04 Thread Jun Rao
Hi, Abhijeet,

Thanks for the KIP. Left a few comments.

1. "A drawback of using the last-tiered-offset is that this new follower
would possess only a limited number of locally stored segments. Should it
ascend to the role of leader, there is a risk of needing to fetch these
segments from the remote storage, potentially impacting broker performance."
Since we support consumers fetching from followers, this is a potential
issue on the follower side too. In theory, it's possible for a segment to
be tiered immediately after rolling. In that case, there could be very
little data after last-tiered-offset. It would be useful to think through
how to address this issue.
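
(Consumers fetch from followers via KIP-392's rack-aware replica selection;
a minimal sketch of the configuration involved, with placeholder rack and
server names:)

    import java.util.Properties;

    public class FollowerFetchConfig {
        public static void main(String[] args) {
            // Consumer side: advertise the client's rack.
            Properties consumer = new Properties();
            consumer.put("bootstrap.servers", "broker:9092"); // placeholder
            consumer.put("group.id", "example-group");        // placeholder
            consumer.put("client.rack", "us-east-1a");        // placeholder
            // Broker side (server.properties):
            //   broker.rack=us-east-1a
            //   replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
            // A consumer co-located with a follower that holds little local
            // data would trigger exactly the remote fetches described above.
        }
    }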

2. ListOffsetsRequest:
2.1 Typically, we need to bump up the version of the request if we add a
new value for timestamp. See
https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
.
2.2 Since this changes the inter broker request protocol, it would be
useful to have a section on upgrade (e.g. new IBP/metadata.version).

3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
last-tiered-offset from the leader, and make a separate leader call to
fetch leader epoch for the following offset."
Why do we need to make a separate call for the leader epoch?
ListOffsetsResponse includes both the offset and the corresponding epoch.

4. "Check if the follower replica is empty and if the feature to use
last-tiered-offset is enabled."
Why do we need to check if the follower replica is empty?

5. It can be confirmed by checking if the leader's Log-Start-Offset is the
same as the Leader's Local-Log-Start-Offset.
How does the follower know Local-Log-Start-Offset?

Jun


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-30 Thread Abhijeet Kumar
Hi Christo,

Thanks for reviewing the KIP.

The follower needs the earliest-pending-upload-offset (and the
corresponding leader epoch) from the leader.
This is the first offset the follower will have locally.

Regards,
Abhijeet.


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-29 Thread Christo Lolov
Heya!

First of all, thank you very much for the proposal, you have explained the
problem you want solved very well - I think a faster bootstrap of an empty
replica is definitely an improvement!

For my understanding, which concrete offset do you want the leader to give
back to a follower - earliest-pending-upload-offset or the
latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to
already be exposing that offset as part of the ListOffsets API, no?

Best,
Christo


[DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-27 Thread Abhijeet Kumar
Hi All,

I have created KIP-1023 to introduce follower fetch from tiered offset.
This feature will help significantly reduce Kafka rebalance/rebuild times
when tiered storage is enabled on the cluster.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset

Feedback and suggestions are welcome.

Regards,
Abhijeet.