Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-25 Thread Abhijeet Kumar
>
> I am ok with adding a note in the KIP but the note should say that it has
> an elevated risk for this scenario due to increased probability of having
> an aggressive local cleanup with Tiered Storage.
>

I would like to clarify that there is no elevated risk because of this KIP.
This risk already exists with the current protocol, where an empty follower
for a tiered-storage-enabled topic catches up with the leader by replicating
data beginning from the leader's local log start offset.

With this KIP, the follower instead starts from the last tiered offset, which
is usually well ahead of the leader's local log start offset and can never be
behind it. Hence, in this rare scenario, that offset is more likely to still
be available in the leader's local log than the local log start offset is.

Therefore, the risk is actually lower when the follower starts from the last
tiered offset instead of the local log start offset.

Abhijeet.



On Wed, Jul 24, 2024 at 7:26 PM Divij Vaidya 
wrote:

> The difference between the two scenarios you mentioned is that with Tiered
> Storage, the chances of hitting this scenario increase since a user is
> likely to have an aggressive setting for local disk data cleanup, which
> would not be the case in empty followers catching up in a non-tiered
> storage world.
>
> I am ok with adding a note in the KIP but the note should say that it has
> an elevated risk for this scenario due to increased probability of having
> an aggressive local cleanup with Tiered Storage.
>
> --
> Divij Vaidya
>
>
>
> On Wed, Jul 24, 2024 at 1:22 PM Abhijeet Kumar  >
> wrote:
>
> > Hi Divij,
> >
> > The rare scenario we are discussing is similar to an empty follower
> trying
> > to catch up with the leader for a topic that is not enabled with tiered
> > storage. Consider the following steps:
> >
> > 1. Follower requests offset 0 from the leader.
> > 2. Offset 0 is no longer valid on the leader as its log start offset is 10,
> > hence the leader throws an Out of Range error
> > 3. Follower fetches the earliest offset from the leader and gets 10, then
> > resets its Fetch offset to 10.
> > 4. Follower requests offset 10 from the leader, but the previous log
> start
> > offset (10) is deleted from the leader and the new log start offset is
> 15.
> > Hence the leader throws an Out of Range error. The follower goes back to
> > step 3
> >
> > Even in this scenario, the follower may theoretically never be able to
> > catch up. Since this is an existing problem that affects even regular
> > replication for topics without tiered storage, should we take this up
> > separately?
> > I can add a small note in the KIP saying that this behavior for follower
> > catch-up is similar to the scenario when tiered storage is not enabled.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya 
> > wrote:
> >
> > > Thank you for your response Abhijeet. You have understood the scenario
> > > correctly. For the purpose of discussion, please consider the latter
> case
> > > where offset 11 is not available on the leader anymore (it got cleaned
> > > locally since the last tiered offset is 15). In such a case, you
> > > mentioned, the follower will eventually be able to catch up with the
> > leader
> > > by resetting its fetch offset until the offset is available on the
> > leader's
> > > local log. Correct me if I am wrong but it is not guaranteed that it
> will
> > > eventually catch up because theoretically, every time it asks for a
> newer
> > > fetch offset, the leader may have deleted it locally. I understand that
> > it
> > > is an edge case scenario which will only happen with configurations for
> > > small segment sizes and aggressive cleaning but nevertheless, it is a
> > > possible scenario.
> > >
> > > Do you agree that theoretically it is possible for the follower to loop
> > > such that it is never able to catch up?
> > >
> > > We can proceed with the KIP with an understanding that this scenario is
> > > rare and we are willing to accept the risk of it. In such a case, we
> > should
> > > add a detection mechanism for such a scenario in the KIP, so that if we
> > > encounter this scenario, the user has a way to detect (and mitigate
> it).
> > > Alternatively, we can change the KIP design to ensure that we never
> > > encounter this scenario. Given the rarity of the scenario, I am ok with
> > > having a detection mechanism (metric?) in place and having this
> scenario
> > > documented as an acceptable risk in the current design.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Seems like there is some confusion about the new protocol for
> fetching
> > > from
> > > > tiered offset.
> > > > The scenario you are highlighting is where,
> > > > Leader's Log Start Offset = 0
> > > > Last Tiered Offset = 10
> > > >
> > > > Following 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-24 Thread Divij Vaidya
The difference between the two scenarios you mentioned is that with Tiered
Storage, the chances of hitting this scenario increase since a user is
likely to have an aggressive setting for local disk data cleanup, which
would not be the case in empty followers catching up in a non-tiered
storage world.

I am ok with adding a note in the KIP but the note should say that it has
an elevated risk for this scenario due to increased probability of having
an aggressive local cleanup with Tiered Storage.

--
Divij Vaidya



On Wed, Jul 24, 2024 at 1:22 PM Abhijeet Kumar 
wrote:

> Hi Divij,
>
> The rare scenario we are discussing is similar to an empty follower trying
> to catch up with the leader for a topic that is not enabled with tiered
> storage. Consider the following steps:
>
> 1. Follower requests offset 0 from the leader.
> 2. Offset 0 is no longer valid on the leader as its log start offset is 10,
> hence the leader throws an Out of Range error
> 3. Follower fetches the earliest offset from the leader and gets 10, then
> resets its Fetch offset to 10.
> 4. Follower requests offset 10 from the leader, but the previous log start
> offset (10) is deleted from the leader and the new log start offset is 15.
> Hence the leader throws an Out of Range error. The follower goes back to
> step 3
>
> Even in this scenario, the follower may theoretically never be able to
> catch up. Since this is an existing problem that affects even regular
> replication for topics without tiered storage, should we take this up
> separately?
> I can add a small note in the KIP saying that this behavior for follower
> catch-up is similar to the scenario when tiered storage is not enabled.
>
> Regards,
> Abhijeet.
>
>
>
> On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya 
> wrote:
>
> > Thank you for your response Abhijeet. You have understood the scenario
> > correctly. For the purpose of discussion, please consider the latter case
> > where offset 11 is not available on the leader anymore (it got cleaned
> > locally since the last tiered offset is 15). In such a case, you
> > mentioned, the follower will eventually be able to catch up with the
> leader
> > by resetting its fetch offset until the offset is available on the
> leader's
> > local log. Correct me if I am wrong but it is not guaranteed that it will
> > eventually catch up because theoretically, every time it asks for a newer
> > fetch offset, the leader may have deleted it locally. I understand that
> it
> > is an edge case scenario which will only happen with configurations for
> > small segment sizes and aggressive cleaning but nevertheless, it is a
> > possible scenario.
> >
> > Do you agree that theoretically it is possible for the follower to loop
> > such that it is never able to catch up?
> >
> > We can proceed with the KIP with an understanding that this scenario is
> > rare and we are willing to accept the risk of it. In such a case, we
> should
> > add a detection mechanism for such a scenario in the KIP, so that if we
> > encounter this scenario, the user has a way to detect (and mitigate it).
> > Alternatively, we can change the KIP design to ensure that we never
> > encounter this scenario. Given the rarity of the scenario, I am ok with
> > having a detection mechanism (metric?) in place and having this scenario
> > documented as an acceptable risk in the current design.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Divij,
> > >
> > > Seems like there is some confusion about the new protocol for fetching
> > from
> > > tiered offset.
> > > The scenario you are highlighting is where,
> > > Leader's Log Start Offset = 0
> > > Last Tiered Offset = 10
> > >
> > > Following is the sequence of events that will happen:
> > >
> > > 1. Follower requests offset 0 from the leader
> > > 2. Assuming offset 0 is not available locally (to arrive at your
> > scenario),
> > > Leader returns OffsetMovedToTieredStorageException
> > > 3. Follower fetches the earliest pending upload offset and receives 11
> > > 4. Follower builds aux state from [0-10] and sets the fetch offset to
> 11
> > > (This step corresponds to step 3 in your email)
> > >
> > > At this stage, even if the leader has uploaded more data and the
> > > last-tiered offset has changed (say to 15), it will not matter
> > > because offset 11 should still be available on the leader and when the
> > > follower requests data with fetch offset 11, the leader
> > > will return with a valid partition data response which the follower can
> > > consume and proceed further. Even if the offset 11 is not
> > > available anymore, the follower will eventually be able to catch up
> with
> > > the leader by resetting its fetch offset until the offset
> > > is available on the leader's local log. Once it catches up, replication
> > on
> > > the follower can proceed.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> > >
> > >
> > > On Tue, Jul 2, 2024 at 3

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-24 Thread Abhijeet Kumar
Hi Divij,

The rare scenario we are discussing is similar to an empty follower trying
to catch up with the leader for a topic that is not enabled with tiered
storage. Consider the following steps:

1. Follower requests offset 0 from the leader.
2. Offset 0 is no longer valid on the leader, as its log start offset is 10,
hence the leader throws an Out of Range error.
3. Follower fetches the earliest offset from the leader, gets 10, and resets
its fetch offset to 10.
4. Follower requests offset 10 from the leader, but the previous log start
offset (10) has been deleted on the leader and the new log start offset is 15.
Hence the leader throws an Out of Range error again, and the follower goes
back to step 3 (see the sketch below).
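
For illustration only (not actual broker code), the loop above can be sketched
as a tiny self-contained simulation, assuming a leader whose log start offset
keeps advancing due to aggressive cleanup; all names are hypothetical:

import java.util.concurrent.atomic.AtomicLong;

public class OutOfRangeLoopSketch {
    // Simulated leader: local data is only available from logStartOffset
    // onwards, and aggressive cleanup keeps advancing logStartOffset.
    static final AtomicLong logStartOffset = new AtomicLong(10);

    // Step 3: the follower asks the leader for its earliest available offset.
    static long fetchEarliestOffset() {
        return logStartOffset.get();
    }

    // Steps 2 and 4: by the time the follower retries, cleanup may have moved
    // the log start offset again; false stands for an Out of Range error.
    static boolean fetchRecords(long fetchOffset) {
        logStartOffset.addAndGet(5);
        return fetchOffset >= logStartOffset.get();
    }

    public static void main(String[] args) {
        long fetchOffset = 0;  // Step 1: the empty follower starts at offset 0.
        for (int attempt = 1; attempt <= 5; attempt++) {
            if (fetchRecords(fetchOffset)) {
                System.out.println("caught up at offset " + fetchOffset);
                return;
            }
            fetchOffset = fetchEarliestOffset();
            System.out.println("attempt " + attempt + ": reset fetch offset to " + fetchOffset);
        }
        System.out.println("still not caught up after 5 attempts");
    }
}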

Even in this scenario, the follower may theoretically never be able to catch
up. Since this is an existing problem that affects regular replication for
topics without tiered storage as well, should we take this up separately?
I can add a small note in the KIP saying that this behavior during follower
catch-up is similar to the scenario when tiered storage is not enabled.

Regards,
Abhijeet.



On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya 
wrote:

> Thank you for your response Abhijeet. You have understood the scenario
> correctly. For the purpose of discussion, please consider the latter case
> where offset 11 is not available on the leader anymore (it got cleaned
> locally since the last tiered offset is 15). In such a case, you
> mentioned, the follower will eventually be able to catch up with the leader
> by resetting its fetch offset until the offset is available on the leader's
> local log. Correct me if I am wrong but it is not guaranteed that it will
> eventually catch up because theoretically, every time it asks for a newer
> fetch offset, the leader may have deleted it locally. I understand that it
> is an edge case scenario which will only happen with configurations for
> small segment sizes and aggressive cleaning but nevertheless, it is a
> possible scenario.
>
> Do you agree that theoretically it is possible for the follower to loop
> such that it is never able to catch up?
>
> We can proceed with the KIP with an understanding that this scenario is
> rare and we are willing to accept the risk of it. In such a case, we should
> add a detection mechanism for such a scenario in the KIP, so that if we
> encounter this scenario, the user has a way to detect (and mitigate it).
> Alternatively, we can change the KIP design to ensure that we never
> encounter this scenario. Given the rarity of the scenario, I am ok with
> having a detection mechanism (metric?) in place and having this scenario
> documented as an acceptable risk in the current design.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Hi Divij,
> >
> > Seems like there is some confusion about the new protocol for fetching
> from
> > tiered offset.
> > The scenario you are highlighting is where,
> > Leader's Log Start Offset = 0
> > Last Tiered Offset = 10
> >
> > Following is the sequence of events that will happen:
> >
> > 1. Follower requests offset 0 from the leader
> > 2. Assuming offset 0 is not available locally (to arrive at your
> scenario),
> > Leader returns OffsetMovedToTieredStorageException
> > 3. Follower fetches the earliest pending upload offset and receives 11
> > 4. Follower builds aux state from [0-10] and sets the fetch offset to 11
> > (This step corresponds to step 3 in your email)
> >
> > At this stage, even if the leader has uploaded more data and the
> > last-tiered offset has changed (say to 15), it will not matter
> > because offset 11 should still be available on the leader and when the
> > follower requests data with fetch offset 11, the leader
> > will return with a valid partition data response which the follower can
> > consume and proceed further. Even if the offset 11 is not
> > available anymore, the follower will eventually be able to catch up with
> > the leader by resetting its fetch offset until the offset
> > is available on the leader's local log. Once it catches up, replication
> on
> > the follower can proceed.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya 
> > wrote:
> >
> > > Hi folks.
> > >
> > > I am late to the party but I have a question on the proposal.
> > >
> > > How are we preventing a situation such as the following:
> > >
> > > 1. Empty follower asks leader for 0
> > > 2. Leader compares 0 with last-tiered-offset, and responds with 11
> > (where10
> > > is last-tiered-offset) and a OffsetMovedToTieredException
> > > 3. Follower builds aux state from [0-10] and sets the fetch offset to
> 11
> > > 4. But leader has already uploaded more data and now the new
> > > last-tiered-offset is 15
> > > 5. Go back to 2
> > >
> > > This could cause a cycle where the replica will be stuck trying to
> > > reconcile with the leader.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Fri, Apr 26, 2024 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-23 Thread Divij Vaidya
Thank you for your response Abhijeet. You have understood the scenario
correctly. For the purpose of discussion, please consider the latter case
where offset 11 is not available on the leader anymore (it got cleaned
locally since the last tiered offset is 15). In such a case, you
mentioned, the follower will eventually be able to catch up with the leader
by resetting its fetch offset until the offset is available on the leader's
local log. Correct me if I am wrong, but it is not guaranteed that it will
eventually catch up because, theoretically, every time it asks for a newer
fetch offset, the leader may have deleted it locally. I understand that it
is an edge case scenario which will only happen with configurations for
small segment sizes and aggressive cleaning but nevertheless, it is a
possible scenario.

Do you agree that theoretically it is possible for the follower to loop
such that it is never able to catch up?

We can proceed with the KIP with an understanding that this scenario is
rare and we are willing to accept the risk of it. In such a case, we should
add a detection mechanism for such a scenario in the KIP, so that if we
encounter this scenario, the user has a way to detect (and mitigate it).
Alternatively, we can change the KIP design to ensure that we never
encounter this scenario. Given the rarity of the scenario, I am ok with
having a detection mechanism (metric?) in place and having this scenario
documented as an acceptable risk in the current design.
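
As a purely hypothetical sketch of what such a detection signal could look
like (not something the KIP proposes; all names are made up): a per-partition
counter of consecutive fetch-offset resets that a monitoring system could
alert on:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical detection helper: counts consecutive fetch-offset resets per
// partition and warns when the follower appears stuck behind local cleanup.
public class OffsetResetTracker {
    private static final int WARN_THRESHOLD = 5;
    private final Map<String, AtomicInteger> consecutiveResets = new ConcurrentHashMap<>();

    public void onFetchOffsetReset(String topicPartition) {
        int resets = consecutiveResets
                .computeIfAbsent(topicPartition, tp -> new AtomicInteger())
                .incrementAndGet();
        if (resets >= WARN_THRESHOLD) {
            System.err.println("WARN: fetch offset reset " + resets
                    + " times in a row for " + topicPartition
                    + "; the follower may never catch up with the leader");
        }
    }

    public void onSuccessfulFetch(String topicPartition) {
        // Progress was made, so the streak of resets is cleared.
        consecutiveResets.remove(topicPartition);
    }
}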

--
Divij Vaidya



On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar 
wrote:

> Hi Divij,
>
> Seems like there is some confusion about the new protocol for fetching from
> tiered offset.
> The scenario you are highlighting is where,
> Leader's Log Start Offset = 0
> Last Tiered Offset = 10
>
> Following is the sequence of events that will happen:
>
> 1. Follower requests offset 0 from the leader
> 2. Assuming offset 0 is not available locally (to arrive at your scenario),
> Leader returns OffsetMovedToTieredStorageException
> 3. Follower fetches the earliest pending upload offset and receives 11
> 4. Follower builds aux state from [0-10] and sets the fetch offset to 11
> (This step corresponds to step 3 in your email)
>
> At this stage, even if the leader has uploaded more data and the
> last-tiered offset has changed (say to 15), it will not matter
> because offset 11 should still be available on the leader and when the
> follower requests data with fetch offset 11, the leader
> will return with a valid partition data response which the follower can
> consume and proceed further. Even if the offset 11 is not
> available anymore, the follower will eventually be able to catch up with
> the leader by resetting its fetch offset until the offset
> is available on the leader's local log. Once it catches up, replication on
> the follower can proceed.
>
> Regards,
> Abhijeet.
>
>
>
> On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya 
> wrote:
>
> > Hi folks.
> >
> > I am late to the party but I have a question on the proposal.
> >
> > How are we preventing a situation such as the following:
> >
> > 1. Empty follower asks leader for 0
> > 2. Leader compares 0 with last-tiered-offset, and responds with 11
> (where10
> > is last-tiered-offset) and a OffsetMovedToTieredException
> > 3. Follower builds aux state from [0-10] and sets the fetch offset to 11
> > 4. But leader has already uploaded more data and now the new
> > last-tiered-offset is 15
> > 5. Go back to 2
> >
> > This could cause a cycle where the replica will be stuck trying to
> > reconcile with the leader.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Thank you all for your comments. As all the comments in the thread are
> > > addressed, I am starting a Vote thread for the KIP. Please have a look.
> > >
> > > Regards.
> > > Abhijeet.
> > >
> > >
> > >
> > > On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the update.
> > > >
> > > > I have no more comments.
> > > >
> > > > Luke
> > > >
> > > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the updated KIP. It looks good to me.
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Please find my comments inline.
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao  >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, Abhijeet,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > 1. I am wondering if we could achieve the same result by just
> > > > lowering
> > > > > > > local.retention.ms and local.retention.bytes. This also allows
> > the
> > > > > newly
> > > > > > > started follower to build up the local data before serving the
> > > >

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-23 Thread Abhijeet Kumar
Hi Divij,

Seems like there is some confusion about the new protocol for fetching from
tiered offset.
The scenario you are highlighting is one where:
Leader's Log Start Offset = 0
Last Tiered Offset = 10

Following is the sequence of events that will happen:

1. Follower requests offset 0 from the leader
2. Assuming offset 0 is not available locally (to arrive at your scenario),
Leader returns OffsetMovedToTieredStorageException
3. Follower fetches the earliest pending upload offset and receives 11
4. Follower builds aux state from [0-10] and sets the fetch offset to 11
(This step corresponds to step 3 in your email)

At this stage, even if the leader has uploaded more data and the
last-tiered offset has changed (say, to 15), it will not matter,
because offset 11 should still be available on the leader. When the
follower requests data with fetch offset 11, the leader will return a
valid partition data response which the follower can consume to proceed
further. Even if offset 11 is no longer available, the follower will
eventually be able to catch up with the leader by resetting its fetch
offset until the requested offset is available on the leader's local
log. Once it catches up, replication on the follower can proceed.
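
As a rough, non-authoritative sketch of the follower-side flow described
above (the LeaderClient interface and all method names below are placeholders
for the KIP's new leader API and the follower's state rebuild, not actual
Kafka classes):

// Hedged sketch of the proposed follower catch-up flow; names are placeholders.
interface LeaderClient {
    long fetchEarliestPendingUploadOffset(String partition); // new API in the KIP
    long fetchEarliestLocalOffset(String partition);         // today's earliest-local lookup
    FetchResult fetch(String partition, long offset);
}

record FetchResult(String error, long nextOffset) {}

class TieredFollowerSketch {
    static void catchUp(LeaderClient leader, String partition) {
        long fetchOffset = 0;                                  // step 1: empty follower
        while (true) {
            FetchResult r = leader.fetch(partition, fetchOffset);
            if ("OFFSET_MOVED_TO_TIERED_STORAGE".equals(r.error())) {
                // steps 3-4: learn where tiering ends and rebuild aux state up to it
                long pending = leader.fetchEarliestPendingUploadOffset(partition);
                buildAuxState(partition, pending - 1);         // e.g. state for [0, 10]
                fetchOffset = pending;                         // e.g. 11
            } else if ("OFFSET_OUT_OF_RANGE".equals(r.error())) {
                // the offset was already cleaned up locally on the leader: reset and retry
                fetchOffset = leader.fetchEarliestLocalOffset(partition);
            } else {
                fetchOffset = r.nextOffset();                  // normal replication continues
            }
        }
    }

    static void buildAuxState(String partition, long upToOffset) {
        // Placeholder for rebuilding leader-epoch and producer state from remote storage.
    }
}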

Regards,
Abhijeet.



On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya  wrote:

> Hi folks.
>
> I am late to the party but I have a question on the proposal.
>
> How are we preventing a situation such as the following:
>
> 1. Empty follower asks leader for 0
> 2. Leader compares 0 with last-tiered-offset, and responds with 11 (where10
> is last-tiered-offset) and a OffsetMovedToTieredException
> 3. Follower builds aux state from [0-10] and sets the fetch offset to 11
> 4. But leader has already uploaded more data and now the new
> last-tiered-offset is 15
> 5. Go back to 2
>
> This could cause a cycle where the replica will be stuck trying to
> reconcile with the leader.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar  >
> wrote:
>
> > Thank you all for your comments. As all the comments in the thread are
> > addressed, I am starting a Vote thread for the KIP. Please have a look.
> >
> > Regards.
> > Abhijeet.
> >
> >
> >
> > On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the update.
> > >
> > > I have no more comments.
> > >
> > > Luke
> > >
> > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the updated KIP. It looks good to me.
> > > >
> > > > Jun
> > > >
> > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Please find my comments inline.
> > > > >
> > > > >
> > > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> > > > wrote:
> > > > >
> > > > > > Hi, Abhijeet,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 1. I am wondering if we could achieve the same result by just
> > > lowering
> > > > > > local.retention.ms and local.retention.bytes. This also allows
> the
> > > > newly
> > > > > > started follower to build up the local data before serving the
> > > consumer
> > > > > > traffic.
> > > > > >
> > > > >
> > > > > I am not sure I fully followed this. Do you mean we could lower the
> > > > > local.retention (by size and time)
> > > > > so that there is little data on the leader's local storage so that
> > the
> > > > > follower can quickly catch up with the leader?
> > > > >
> > > > > In that case, we will need to set small local retention across
> > brokers
> > > in
> > > > > the cluster. It will have the undesired
> > > > > effect where there will be increased remote log fetches for serving
> > > > consume
> > > > > requests, and this can cause
> > > > > degradations. Also, this behaviour (of increased remote fetches)
> will
> > > > > happen on all brokers at all times, whereas in
> > > > > the KIP we are restricting the behavior only to the newly
> > bootstrapped
> > > > > brokers and only until the time it fully builds
> > > > > the local logs as per retention defined at the cluster level.
> > > > > (Deprioritization of the broker could help reduce the impact
> > > > >  even further)
> > > > >
> > > > >
> > > > > >
> > > > > > 2. Have you updated the KIP?
> > > > > >
> > > > >
> > > > > The KIP has been updated now.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 to Jun for adding the consumer fetching from a follower
> > scenario
> > > > > > > also to the existing section that talked about the drawback
> when
> > a
> > > > > > > node built with last-tiered-offset has become a leader. As
> > Abhijeet
> > > > > > > mentioned, we plan to have a follow-up KIP that will address
> > these
> > > by
> > > > > > > having a deprioritzation of these brokers.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-02 Thread Divij Vaidya
Following up on my previous comment:

An alternative approach could be to have an empty follower start
replication from last-tiered-offset (already available as part of
listOffsets) inclusive. On the leader, we change the logic (based on a
configurable threshold) on when we return OffsetMovedToTieredException vs.
when we fetch from remote and return data to the follower.

As an example, the solution works as follows:
1. Follower asks the leader for fetch offset Y.
2. Leader checks whether (last-tiered-offset - Y > Z), where Z is a configured
threshold. If true, we return OffsetMovedToTieredException and the follower
will ask again with fetch offset = last-tiered-offset. If false, the leader
will fetch offset Y from remote and return it to the follower (see the sketch
below).
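
A rough sketch of that leader-side check in step 2 (the class, the method and
the threshold Z are illustrative only, not an existing broker config or API):

// Hedged sketch of the leader-side decision described in step 2 above.
class TieredFetchDecisionSketch {
    private final long thresholdZ;  // the configured value Z

    TieredFetchDecisionSketch(long thresholdZ) {
        this.thresholdZ = thresholdZ;
    }

    // Decide how to answer a follower fetch at offset Y.
    String handleFollowerFetch(long fetchOffsetY, long lastTieredOffset) {
        if (lastTieredOffset - fetchOffsetY > thresholdZ) {
            // The follower is too far behind the tiered data: tell it to retry
            // from last-tiered-offset instead of serving old data from remote.
            return "OFFSET_MOVED_TO_TIERED_STORAGE";
        }
        // Close enough: read the requested range from remote storage and
        // return the records to the follower directly.
        return "SERVE_FROM_REMOTE starting at " + fetchOffsetY;
    }
}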

The advantages of this approach over the proposed solution are:
1. we won't be in a cyclic situation as mentioned in my previous email
2. it works with the existing protocol, which already returns the
last-tiered-offset, i.e. we won't have to make changes to the protocol to add
the new Earliest-Pending-Upload-Offset

The disadvantages of this approach over the proposed solution are:
1. on the leader, we may have to fetch some data from remote to respond to
the follower. The amount of this data can be controlled via the configured
value Z which can be set based on how aggressive the upload/archival
process is.

--
Divij Vaidya



On Tue, Jul 2, 2024 at 12:25 PM Divij Vaidya 
wrote:

> Hi folks.
>
> I am late to the party but I have a question on the proposal.
>
> How are we preventing a situation such as the following:
>
> 1. Empty follower asks leader for 0
> 2. Leader compares 0 with last-tiered-offset, and responds with 11
> (where 10 is the last-tiered-offset) and an OffsetMovedToTieredException
> 3. Follower builds aux state from [0-10] and sets the fetch offset to 11
> 4. But leader has already uploaded more data and now the new
> last-tiered-offset is 15
> 5. Go back to 2
>
> This could cause a cycle where the replica will be stuck trying to
> reconcile with the leader.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar 
> wrote:
>
>> Thank you all for your comments. As all the comments in the thread are
>> addressed, I am starting a Vote thread for the KIP. Please have a look.
>>
>> Regards.
>> Abhijeet.
>>
>>
>>
>> On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:
>>
>> > Hi, Abhijeet,
>> >
>> > Thanks for the update.
>> >
>> > I have no more comments.
>> >
>> > Luke
>> >
>> > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao 
>> wrote:
>> >
>> > > Hi, Abhijeet,
>> > >
>> > > Thanks for the updated KIP. It looks good to me.
>> > >
>> > > Jun
>> > >
>> > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
>> > > abhijeet.cse@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > Please find my comments inline.
>> > > >
>> > > >
>> > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
>> > > wrote:
>> > > >
>> > > > > Hi, Abhijeet,
>> > > > >
>> > > > > Thanks for the reply.
>> > > > >
>> > > > > 1. I am wondering if we could achieve the same result by just
>> > lowering
>> > > > > local.retention.ms and local.retention.bytes. This also allows
>> the
>> > > newly
>> > > > > started follower to build up the local data before serving the
>> > consumer
>> > > > > traffic.
>> > > > >
>> > > >
>> > > > I am not sure I fully followed this. Do you mean we could lower the
>> > > > local.retention (by size and time)
>> > > > so that there is little data on the leader's local storage so that
>> the
>> > > > follower can quickly catch up with the leader?
>> > > >
>> > > > In that case, we will need to set small local retention across
>> brokers
>> > in
>> > > > the cluster. It will have the undesired
>> > > > effect where there will be increased remote log fetches for serving
>> > > consume
>> > > > requests, and this can cause
>> > > > degradations. Also, this behaviour (of increased remote fetches)
>> will
>> > > > happen on all brokers at all times, whereas in
>> > > > the KIP we are restricting the behavior only to the newly
>> bootstrapped
>> > > > brokers and only until the time it fully builds
>> > > > the local logs as per retention defined at the cluster level.
>> > > > (Deprioritization of the broker could help reduce the impact
>> > > >  even further)
>> > > >
>> > > >
>> > > > >
>> > > > > 2. Have you updated the KIP?
>> > > > >
>> > > >
>> > > > The KIP has been updated now.
>> > > >
>> > > >
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
>> > > satish.dugg...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > +1 to Jun for adding the consumer fetching from a follower
>> scenario
>> > > > > > also to the existing section that talked about the drawback
>> when a
>> > > > > > node built with last-tiered-offset has become a leader. As
>> Abhijeet
>> > > > > > mentioned, we plan to have a follow-up KIP that will address
>> these
>> > by
>> > > > > > having a deprioritzation of 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-02 Thread Divij Vaidya
Hi folks.

I am late to the party but I have a question on the proposal.

How are we preventing a situation such as the following:

1. Empty follower asks leader for 0
2. Leader compares 0 with last-tiered-offset, and responds with 11 (where 10
is the last-tiered-offset) and an OffsetMovedToTieredException
3. Follower builds aux state from [0-10] and sets the fetch offset to 11
4. But leader has already uploaded more data and now the new
last-tiered-offset is 15
5. Go back to 2

This could cause a cycle where the replica will be stuck trying to
reconcile with the leader.

--
Divij Vaidya



On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar 
wrote:

> Thank you all for your comments. As all the comments in the thread are
> addressed, I am starting a Vote thread for the KIP. Please have a look.
>
> Regards.
> Abhijeet.
>
>
>
> On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the update.
> >
> > I have no more comments.
> >
> > Luke
> >
> > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the updated KIP. It looks good to me.
> > >
> > > Jun
> > >
> > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Please find my comments inline.
> > > >
> > > >
> > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 1. I am wondering if we could achieve the same result by just
> > lowering
> > > > > local.retention.ms and local.retention.bytes. This also allows the
> > > newly
> > > > > started follower to build up the local data before serving the
> > consumer
> > > > > traffic.
> > > > >
> > > >
> > > > I am not sure I fully followed this. Do you mean we could lower the
> > > > local.retention (by size and time)
> > > > so that there is little data on the leader's local storage so that
> the
> > > > follower can quickly catch up with the leader?
> > > >
> > > > In that case, we will need to set small local retention across
> brokers
> > in
> > > > the cluster. It will have the undesired
> > > > effect where there will be increased remote log fetches for serving
> > > consume
> > > > requests, and this can cause
> > > > degradations. Also, this behaviour (of increased remote fetches) will
> > > > happen on all brokers at all times, whereas in
> > > > the KIP we are restricting the behavior only to the newly
> bootstrapped
> > > > brokers and only until the time it fully builds
> > > > the local logs as per retention defined at the cluster level.
> > > > (Deprioritization of the broker could help reduce the impact
> > > >  even further)
> > > >
> > > >
> > > > >
> > > > > 2. Have you updated the KIP?
> > > > >
> > > >
> > > > The KIP has been updated now.
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 to Jun for adding the consumer fetching from a follower
> scenario
> > > > > > also to the existing section that talked about the drawback when
> a
> > > > > > node built with last-tiered-offset has become a leader. As
> Abhijeet
> > > > > > mentioned, we plan to have a follow-up KIP that will address
> these
> > by
> > > > > > having a deprioritzation of these brokers. The deprioritization
> of
> > > > > > those brokers can be removed once they catchup until the local
> log
> > > > > > retention.
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen 
> wrote:
> > > > > > >
> > > > > > > Hi Abhijeet,
> > > > > > >
> > > > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > > > >
> > > > > > > Questions:
> > > > > > > 1. We could also get the "pending-upload-offset" and epoch via
> > > remote
> > > > > log
> > > > > > > metadata, instead of adding a new API to fetch from the leader.
> > > Could
> > > > > you
> > > > > > > explain why you choose the later approach, instead of the
> former?
> > > > > > > 2.
> > > > > > > > We plan to have a follow-up KIP that will address both the
> > > > > > > deprioritization
> > > > > > > of these brokers from leadership, as well as
> > > > > > > for consumption (when fetching from followers is allowed).
> > > > > > >
> > > > > > > I agree with Jun that we might need to make it clear all
> possible
> > > > > > drawbacks
> > > > > > > that could have. So, could we add the drawbacks that Jun
> > mentioned
> > > > > about
> > > > > > > the performance issue when consumer fetch from follower?
> > > > > > >
> > > > > > > 3. Could we add "Rejected Alternatives" section to the end of
> the
> > > KIP
> > > > > to
> > > > > > > add some of them?
> > > > > > > Like the "ListOffsetRequest" approach VS
> > > > > "Earliest-Pending-Upload-Offset"
> > > > > > > approach, or getting the "Earliest-Pending-Upl

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Abhijeet Kumar
Thank you all for your comments. As all the comments in the thread are
addressed, I am starting a Vote thread for the KIP. Please have a look.

Regards.
Abhijeet.



On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:

> Hi, Abhijeet,
>
> Thanks for the update.
>
> I have no more comments.
>
> Luke
>
> On Thu, Apr 25, 2024 at 4:21 AM Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the updated KIP. It looks good to me.
> >
> > Jun
> >
> > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Please find my comments inline.
> > >
> > >
> > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 1. I am wondering if we could achieve the same result by just
> lowering
> > > > local.retention.ms and local.retention.bytes. This also allows the
> > newly
> > > > started follower to build up the local data before serving the
> consumer
> > > > traffic.
> > > >
> > >
> > > I am not sure I fully followed this. Do you mean we could lower the
> > > local.retention (by size and time)
> > > so that there is little data on the leader's local storage so that the
> > > follower can quickly catch up with the leader?
> > >
> > > In that case, we will need to set small local retention across brokers
> in
> > > the cluster. It will have the undesired
> > > effect where there will be increased remote log fetches for serving
> > consume
> > > requests, and this can cause
> > > degradations. Also, this behaviour (of increased remote fetches) will
> > > happen on all brokers at all times, whereas in
> > > the KIP we are restricting the behavior only to the newly bootstrapped
> > > brokers and only until the time it fully builds
> > > the local logs as per retention defined at the cluster level.
> > > (Deprioritization of the broker could help reduce the impact
> > >  even further)
> > >
> > >
> > > >
> > > > 2. Have you updated the KIP?
> > > >
> > >
> > > The KIP has been updated now.
> > >
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 to Jun for adding the consumer fetching from a follower scenario
> > > > > also to the existing section that talked about the drawback when a
> > > > > node built with last-tiered-offset has become a leader. As Abhijeet
> > > > > mentioned, we plan to have a follow-up KIP that will address these
> by
> > > > > having a deprioritzation of these brokers. The deprioritization of
> > > > > those brokers can be removed once they catchup until the local log
> > > > > retention.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > > > > >
> > > > > > Hi Abhijeet,
> > > > > >
> > > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > > >
> > > > > > Questions:
> > > > > > 1. We could also get the "pending-upload-offset" and epoch via
> > remote
> > > > log
> > > > > > metadata, instead of adding a new API to fetch from the leader.
> > Could
> > > > you
> > > > > > explain why you choose the later approach, instead of the former?
> > > > > > 2.
> > > > > > > We plan to have a follow-up KIP that will address both the
> > > > > > deprioritization
> > > > > > of these brokers from leadership, as well as
> > > > > > for consumption (when fetching from followers is allowed).
> > > > > >
> > > > > > I agree with Jun that we might need to make it clear all possible
> > > > > drawbacks
> > > > > > that could have. So, could we add the drawbacks that Jun
> mentioned
> > > > about
> > > > > > the performance issue when consumer fetch from follower?
> > > > > >
> > > > > > 3. Could we add "Rejected Alternatives" section to the end of the
> > KIP
> > > > to
> > > > > > add some of them?
> > > > > > Like the "ListOffsetRequest" approach VS
> > > > "Earliest-Pending-Upload-Offset"
> > > > > > approach, or getting the "Earliest-Pending-Upload-Offset" from
> > remote
> > > > log
> > > > > > metadata... etc.
> > > > > >
> > > > > > Thanks.
> > > > > > Luke
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > > > > abhijeet.cse@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Christo,
> > > > > > >
> > > > > > > Please find my comments inline.
> > > > > > >
> > > > > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> > > > christolo...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello Abhijeet and Jun,
> > > > > > > >
> > > > > > > > I have been mulling this KIP over a bit more in recent days!
> > > > > > > >
> > > > > > > > re: Jun
> > > > > > > >
> > > > > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new
> > timestamps
> > > -
> > > > in
> > > > > > > > retrospect it should have been fairly obvious. I would need
> to
> > go
> > > > an
> > > > > > > update
> > > > > > > > KIP-1005

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Luke Chen
Hi, Abhijeet,

Thanks for the update.

I have no more comments.

Luke

On Thu, Apr 25, 2024 at 4:21 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the updated KIP. It looks good to me.
>
> Jun
>
> On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Hi Jun,
> >
> > Please find my comments inline.
> >
> >
> > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the reply.
> > >
> > > 1. I am wondering if we could achieve the same result by just lowering
> > > local.retention.ms and local.retention.bytes. This also allows the
> newly
> > > started follower to build up the local data before serving the consumer
> > > traffic.
> > >
> >
> > I am not sure I fully followed this. Do you mean we could lower the
> > local.retention (by size and time)
> > so that there is little data on the leader's local storage so that the
> > follower can quickly catch up with the leader?
> >
> > In that case, we will need to set small local retention across brokers in
> > the cluster. It will have the undesired
> > effect where there will be increased remote log fetches for serving
> consume
> > requests, and this can cause
> > degradations. Also, this behaviour (of increased remote fetches) will
> > happen on all brokers at all times, whereas in
> > the KIP we are restricting the behavior only to the newly bootstrapped
> > brokers and only until the time it fully builds
> > the local logs as per retention defined at the cluster level.
> > (Deprioritization of the broker could help reduce the impact
> >  even further)
> >
> >
> > >
> > > 2. Have you updated the KIP?
> > >
> >
> > The KIP has been updated now.
> >
> >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > +1 to Jun for adding the consumer fetching from a follower scenario
> > > > also to the existing section that talked about the drawback when a
> > > > node built with last-tiered-offset has become a leader. As Abhijeet
> > > > mentioned, we plan to have a follow-up KIP that will address these by
> > > > having a deprioritzation of these brokers. The deprioritization of
> > > > those brokers can be removed once they catchup until the local log
> > > > retention.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > > > >
> > > > > Hi Abhijeet,
> > > > >
> > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > >
> > > > > Questions:
> > > > > 1. We could also get the "pending-upload-offset" and epoch via
> remote
> > > log
> > > > > metadata, instead of adding a new API to fetch from the leader.
> Could
> > > you
> > > > > explain why you choose the later approach, instead of the former?
> > > > > 2.
> > > > > > We plan to have a follow-up KIP that will address both the
> > > > > deprioritization
> > > > > of these brokers from leadership, as well as
> > > > > for consumption (when fetching from followers is allowed).
> > > > >
> > > > > I agree with Jun that we might need to make it clear all possible
> > > > drawbacks
> > > > > that could have. So, could we add the drawbacks that Jun mentioned
> > > about
> > > > > the performance issue when consumer fetch from follower?
> > > > >
> > > > > 3. Could we add "Rejected Alternatives" section to the end of the
> KIP
> > > to
> > > > > add some of them?
> > > > > Like the "ListOffsetRequest" approach VS
> > > "Earliest-Pending-Upload-Offset"
> > > > > approach, or getting the "Earliest-Pending-Upload-Offset" from
> remote
> > > log
> > > > > metadata... etc.
> > > > >
> > > > > Thanks.
> > > > > Luke
> > > > >
> > > > >
> > > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Christo,
> > > > > >
> > > > > > Please find my comments inline.
> > > > > >
> > > > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> > > christolo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Abhijeet and Jun,
> > > > > > >
> > > > > > > I have been mulling this KIP over a bit more in recent days!
> > > > > > >
> > > > > > > re: Jun
> > > > > > >
> > > > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new
> timestamps
> > -
> > > in
> > > > > > > retrospect it should have been fairly obvious. I would need to
> go
> > > an
> > > > > > update
> > > > > > > KIP-1005 myself then, thank you for giving the useful
> reference!
> > > > > > >
> > > > > > > 4. I think Abhijeet wants to rebuild state from
> > > latest-tiered-offset
> > > > and
> > > > > > > fetch from latest-tiered-offset + 1 only for new replicas (or
> > > > replicas
> > > > > > > which experienced a disk failure) to decrease the time a
> > partition
> > > > spends
> > > > > > > in under-replicated state. In other words, a follower which has
> > > just
> > > > > > fallen
> > > > > > > out of ISR, but has local data will conti

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-24 Thread Jun Rao
Hi, Abhijeet,

Thanks for the updated KIP. It looks good to me.

Jun

On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar 
wrote:

> Hi Jun,
>
> Please find my comments inline.
>
>
> On Thu, Apr 18, 2024 at 3:26 AM Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the reply.
> >
> > 1. I am wondering if we could achieve the same result by just lowering
> > local.retention.ms and local.retention.bytes. This also allows the newly
> > started follower to build up the local data before serving the consumer
> > traffic.
> >
>
> I am not sure I fully followed this. Do you mean we could lower the
> local.retention (by size and time)
> so that there is little data on the leader's local storage so that the
> follower can quickly catch up with the leader?
>
> In that case, we will need to set small local retention across brokers in
> the cluster. It will have the undesired
> effect where there will be increased remote log fetches for serving consume
> requests, and this can cause
> degradations. Also, this behaviour (of increased remote fetches) will
> happen on all brokers at all times, whereas in
> the KIP we are restricting the behavior only to the newly bootstrapped
> brokers and only until the time it fully builds
> the local logs as per retention defined at the cluster level.
> (Deprioritization of the broker could help reduce the impact
>  even further)
>
>
> >
> > 2. Have you updated the KIP?
> >
>
> The KIP has been updated now.
>
>
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana 
> > wrote:
> >
> > > +1 to Jun for adding the consumer fetching from a follower scenario
> > > also to the existing section that talked about the drawback when a
> > > node built with last-tiered-offset has become a leader. As Abhijeet
> > > mentioned, we plan to have a follow-up KIP that will address these by
> > > having a deprioritization of these brokers. The deprioritization of
> > > those brokers can be removed once they catch up until the local log
> > > retention.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > > >
> > > > Hi Abhijeet,
> > > >
> > > > Thanks for the KIP to improve the tiered storage feature!
> > > >
> > > > Questions:
> > > > 1. We could also get the "pending-upload-offset" and epoch via remote
> > log
> > > > metadata, instead of adding a new API to fetch from the leader. Could
> > you
> > > > explain why you choose the later approach, instead of the former?
> > > > 2.
> > > > > We plan to have a follow-up KIP that will address both the
> > > > deprioritization
> > > > of these brokers from leadership, as well as
> > > > for consumption (when fetching from followers is allowed).
> > > >
> > > > I agree with Jun that we might need to make it clear all possible
> > > drawbacks
> > > > that could have. So, could we add the drawbacks that Jun mentioned
> > about
> > > > the performance issue when consumer fetch from follower?
> > > >
> > > > 3. Could we add "Rejected Alternatives" section to the end of the KIP
> > to
> > > > add some of them?
> > > > Like the "ListOffsetRequest" approach VS
> > "Earliest-Pending-Upload-Offset"
> > > > approach, or getting the "Earliest-Pending-Upload-Offset" from remote
> > log
> > > > metadata... etc.
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > >
> > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Christo,
> > > > >
> > > > > Please find my comments inline.
> > > > >
> > > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> > christolo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Abhijeet and Jun,
> > > > > >
> > > > > > I have been mulling this KIP over a bit more in recent days!
> > > > > >
> > > > > > re: Jun
> > > > > >
> > > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps
> -
> > in
> > > > > > retrospect it should have been fairly obvious. I would need to go
> > an
> > > > > update
> > > > > > KIP-1005 myself then, thank you for giving the useful reference!
> > > > > >
> > > > > > 4. I think Abhijeet wants to rebuild state from
> > latest-tiered-offset
> > > and
> > > > > > fetch from latest-tiered-offset + 1 only for new replicas (or
> > > replicas
> > > > > > which experienced a disk failure) to decrease the time a
> partition
> > > spends
> > > > > > in under-replicated state. In other words, a follower which has
> > just
> > > > > fallen
> > > > > > out of ISR, but has local data will continue using today's Tiered
> > > Storage
> > > > > > replication protocol (i.e. fetching from earliest-local). I
> further
> > > > > believe
> > > > > > he has taken this approach so that local state of replicas which
> > have
> > > > > just
> > > > > > fallen out of ISR isn't forcefully wiped thus leading to
> situation
> > 1.
> > > > > > Abhijeet, have I understood (and summarised) what you are
> proposing
> > > > > > correctly?
> > > > > >
> > > > > > Yes, your understanding is co

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Jun,

Please find my comments inline.


On Thu, Apr 18, 2024 at 3:26 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the reply.
>
> 1. I am wondering if we could achieve the same result by just lowering
> local.retention.ms and local.retention.bytes. This also allows the newly
> started follower to build up the local data before serving the consumer
> traffic.
>

I am not sure I fully followed this. Do you mean we could lower the
local.retention (by size and time) so that there is little data in the
leader's local storage and the follower can quickly catch up with the leader?

In that case, we will need to set a small local retention across the brokers
in the cluster. It will have the undesired effect of increased remote log
fetches for serving consume requests, which can cause degradation. Also,
this behaviour (of increased remote fetches) will happen on all brokers at
all times, whereas in the KIP we restrict the behavior only to newly
bootstrapped brokers, and only until they have fully built up their local
logs as per the retention defined at the cluster level. (Deprioritization
of the broker could help reduce the impact even further.)
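
For reference, lowering local.retention.ms and local.retention.bytes for a
single topic would look roughly like the sketch below (standard Admin API;
the bootstrap server, topic name and values are placeholders). The point
above is that achieving the same effect as the KIP would require applying
this across the cluster, with the drawbacks just described:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class LowerLocalRetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-tiered-topic");
            // Keep only ~1 hour / ~1 GiB of data locally; older data is served from remote.
            Collection<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("local.retention.ms", "3600000"),
                                  AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("local.retention.bytes", "1073741824"),
                                  AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}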


>
> 2. Have you updated the KIP?
>

The KIP has been updated now.


>
> Thanks,
>
> Jun
>
> On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana 
> wrote:
>
> > +1 to Jun for adding the consumer fetching from a follower scenario
> > also to the existing section that talked about the drawback when a
> > node built with last-tiered-offset has become a leader. As Abhijeet
> > mentioned, we plan to have a follow-up KIP that will address these by
> > having a deprioritization of these brokers. The deprioritization of
> > those brokers can be removed once they catch up until the local log
> > retention.
> >
> > Thanks,
> > Satish.
> >
> > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > >
> > > Hi Abhijeet,
> > >
> > > Thanks for the KIP to improve the tiered storage feature!
> > >
> > > Questions:
> > > 1. We could also get the "pending-upload-offset" and epoch via remote
> log
> > > metadata, instead of adding a new API to fetch from the leader. Could
> you
> > > explain why you chose the latter approach, instead of the former?
> > > 2.
> > > > We plan to have a follow-up KIP that will address both the
> > > deprioritization
> > > of these brokers from leadership, as well as
> > > for consumption (when fetching from followers is allowed).
> > >
> > > I agree with Jun that we might need to make it clear all possible
> > drawbacks
> > > that could have. So, could we add the drawbacks that Jun mentioned
> about
> > > the performance issue when consumer fetch from follower?
> > >
> > > 3. Could we add "Rejected Alternatives" section to the end of the KIP
> to
> > > add some of them?
> > > Like the "ListOffsetRequest" approach VS
> "Earliest-Pending-Upload-Offset"
> > > approach, or getting the "Earliest-Pending-Upload-Offset" from remote
> log
> > > metadata... etc.
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi Christo,
> > > >
> > > > Please find my comments inline.
> > > >
> > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Abhijeet and Jun,
> > > > >
> > > > > I have been mulling this KIP over a bit more in recent days!
> > > > >
> > > > > re: Jun
> > > > >
> > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps -
> in
> > > > > retrospect it should have been fairly obvious. I would need to go
> an
> > > > update
> > > > > KIP-1005 myself then, thank you for giving the useful reference!
> > > > >
> > > > > 4. I think Abhijeet wants to rebuild state from
> latest-tiered-offset
> > and
> > > > > fetch from latest-tiered-offset + 1 only for new replicas (or
> > replicas
> > > > > which experienced a disk failure) to decrease the time a partition
> > spends
> > > > > in under-replicated state. In other words, a follower which has
> just
> > > > fallen
> > > > > out of ISR, but has local data will continue using today's Tiered
> > Storage
> > > > > replication protocol (i.e. fetching from earliest-local). I further
> > > > believe
> > > > > he has taken this approach so that local state of replicas which
> have
> > > > just
> > > > > fallen out of ISR isn't forcefully wiped thus leading to situation
> 1.
> > > > > Abhijeet, have I understood (and summarised) what you are proposing
> > > > > correctly?
> > > > >
> > > > > Yes, your understanding is correct. We want to limit the behavior
> > changes
> > > > only to new replicas.
> > > >
> > > >
> > > > > 5. I think in today's Tiered Storage we know the leader's
> > > > log-start-offset
> > > > > from the FetchResponse and we can learn its local-log-start-offset
> > from
> > > > the
> > > > > ListOffsets by asking for earliest-local timestamp (-4). But
> granted,
> > > > this
> > > > > ought to be added as an add

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Luke,

Thanks for your comments. Please find my responses inline.

On Tue, Apr 9, 2024 at 2:08 PM Luke Chen  wrote:

> Hi Abhijeet,
>
> Thanks for the KIP to improve the tiered storage feature!
>
> Questions:
> 1. We could also get the "pending-upload-offset" and epoch via remote log
> metadata, instead of adding a new API to fetch from the leader. Could you
> explain why you chose the latter approach, instead of the former?
>

The remote log metadata could be tracking overlapping log segments. The
maximum offset
across the log segments it may be tracking, may not be the
last-tiered-offset because of truncations
during unclean leader election. Remote Log metadata alone is not sufficient
to identify last-tiered-offset or
pending-upload-offset.

Only the leader knows the correct lineage of offsets that is required to
identify the "pending-upload-offset".
That is the reason we chose to add a new API to fetch this information from
the leader.
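
To make the lineage point concrete, here is a small illustrative sketch (the
types and the lineage map are made up for the example; this is not the actual
remote log metadata model). The naive maximum over all remote segments can
overshoot after an unclean leader election, while the leader, which knows its
epoch lineage, does not count the truncated segment:

import java.util.*;

final class RemoteSegment {
    final int leaderEpoch;   // epoch under which the segment was uploaded
    final long baseOffset;
    final long endOffset;
    RemoteSegment(int leaderEpoch, long baseOffset, long endOffset) {
        this.leaderEpoch = leaderEpoch;
        this.baseOffset = baseOffset;
        this.endOffset = endOffset;
    }
}

public class LastTieredOffsetSketch {
    // Highest tiered end offset over all segments -- may include a segment from a
    // divergent lineage that was truncated by an unclean leader election.
    static long naiveMax(List<RemoteSegment> segments) {
        return segments.stream().mapToLong(s -> s.endOffset).max().orElse(-1L);
    }

    // Highest tiered end offset restricted to the leader's epoch lineage
    // (epoch -> last offset of that epoch on the current leader).
    static long lineageAwareMax(List<RemoteSegment> segments, Map<Integer, Long> lineage) {
        return segments.stream()
                .filter(s -> lineage.containsKey(s.leaderEpoch)
                        && s.endOffset <= lineage.get(s.leaderEpoch))
                .mapToLong(s -> s.endOffset)
                .max().orElse(-1L);
    }

    public static void main(String[] args) {
        List<RemoteSegment> segments = List.of(
                new RemoteSegment(3, 0, 99),
                new RemoteSegment(4, 100, 180),  // uploaded by the old leader, later truncated
                new RemoteSegment(5, 100, 150)); // current lineage after unclean election
        Map<Integer, Long> lineage = Map.of(3, 99L, 5, 150L); // epoch 4 is not on the lineage
        System.out.println(naiveMax(segments));                 // 180 -- overshoots
        System.out.println(lineageAwareMax(segments, lineage)); // 150 -- what the leader knows
    }
}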


> 2.
> > We plan to have a follow-up KIP that will address both the
> deprioritization
> of these brokers from leadership, as well as
> for consumption (when fetching from followers is allowed).
>
> I agree with Jun that we might need to make it clear all possible drawbacks
> that could have. So, could we add the drawbacks that Jun mentioned about
> the performance issue when consumer fetch from follower?
>
>
Updated the KIP to call this out.


> 3. Could we add "Rejected Alternatives" section to the end of the KIP to
> add some of them?
> Like the "ListOffsetRequest" approach VS "Earliest-Pending-Upload-Offset"
> approach, or getting the "Earliest-Pending-Upload-Offset" from remote log
> metadata... etc.
>
Added the section on Rejected Alternatives.


> Thanks.
> Luke
>
>
> On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar 
> wrote:
>
> > Hi Christo,
> >
> > Please find my comments inline.
> >
> > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> > wrote:
> >
> > > Hello Abhijeet and Jun,
> > >
> > > I have been mulling this KIP over a bit more in recent days!
> > >
> > > re: Jun
> > >
> > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > > retrospect it should have been fairly obvious. I would need to go an
> > update
> > > KIP-1005 myself then, thank you for giving the useful reference!
> > >
> > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset
> and
> > > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > > which experienced a disk failure) to decrease the time a partition
> spends
> > > in under-replicated state. In other words, a follower which has just
> > fallen
> > > out of ISR, but has local data will continue using today's Tiered
> Storage
> > > replication protocol (i.e. fetching from earliest-local). I further
> > believe
> > > he has taken this approach so that local state of replicas which have
> > just
> > > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > > Abhijeet, have I understood (and summarised) what you are proposing
> > > correctly?
> > >
> > > Yes, your understanding is correct. We want to limit the behavior
> changes
> > only to new replicas.
> >
> >
> > > 5. I think in today's Tiered Storage we know the leader's
> > log-start-offset
> > > from the FetchResponse and we can learn its local-log-start-offset from
> > the
> > > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> > this
> > > ought to be added as an additional API call in the KIP.
> > >
> > >
> > Yes, I clarified this in my reply to Jun. I will add this missing detail
> in
> > the KIP.
> >
> >
> > > re: Abhijeet
> > >
> > > 101. I am still a bit confused as to why you want to include a new
> offset
> > > (i.e. pending-upload-offset) when you yourself mention that you could
> use
> > > an already existing offset (i.e. last-tiered-offset + 1). In essence,
> you
> > > end your Motivation with "In this KIP, we will focus only on the
> follower
> > > fetch protocol using the *last-tiered-offset*" and then in the
> following
> > > sections you talk about pending-upload-offset. I understand this might
> be
> > > classified as an implementation detail, but if you introduce a new
> offset
> > > (i.e. pending-upload-offset) you have to make a change to the
> ListOffsets
> > > API (i.e. introduce -6) and thus document it in this KIP as such.
> > However,
> > > the last-tiered-offset ought to already be exposed as part of KIP-1005
> > > (under implementation). Am I misunderstanding something here?
> > >
> >
> > I have tried to clarify this in my reply to Jun.
> >
> > > The follower needs to build the local data starting from the offset
> > > Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> > > corresponding leader-epoch.
> > > There are two ways to do this:
> > >1. We add support in ListOffsetRequest to be able to fetch this
> offset
> > > (and leader epoch) from the leader.
> > >2. Or, fetch the tiered-offset (which is already supported). From
> this
> >

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-17 Thread Jun Rao
Hi, Abhijeet,

Thanks for the reply.

1. I am wondering if we could achieve the same result by just lowering
local.retention.ms and local.retention.bytes. This also allows the newly
started follower to build up the local data before serving the consumer
traffic.
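
As an illustration of those knobs, such an override could be applied per topic
with the Java Admin client along these lines (the topic name and values are
arbitrary examples, not recommendations):

import java.util.*;
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;

public class LocalRetentionOverride {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "tiered-topic");
            Collection<AlterConfigOp> ops = List.of(
                    // keep roughly 1 hour / 1 GiB of data on local disk; older data is
                    // served from remote storage
                    new AlterConfigOp(new ConfigEntry("local.retention.ms", "3600000"),
                            AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("local.retention.bytes", "1073741824"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}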

2. Have you updated the KIP?

Thanks,

Jun

On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana 
wrote:

> +1 to Jun for adding the consumer fetching from a follower scenario
> also to the existing section that talked about the drawback when a
> node built with last-tiered-offset has become a leader. As Abhijeet
> mentioned, we plan to have a follow-up KIP that will address these by
> having a deprioritzation of these brokers. The deprioritization of
> those brokers can be removed once they catchup until the local log
> retention.
>
> Thanks,
> Satish.
>
> On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> >
> > Hi Abhijeet,
> >
> > Thanks for the KIP to improve the tiered storage feature!
> >
> > Questions:
> > 1. We could also get the "pending-upload-offset" and epoch via remote log
> > metadata, instead of adding a new API to fetch from the leader. Could you
> > explain why you choose the later approach, instead of the former?
> > 2.
> > > We plan to have a follow-up KIP that will address both the
> > deprioritization
> > of these brokers from leadership, as well as
> > for consumption (when fetching from followers is allowed).
> >
> > I agree with Jun that we might need to make it clear all possible
> drawbacks
> > that could have. So, could we add the drawbacks that Jun mentioned about
> > the performance issue when consumer fetch from follower?
> >
> > 3. Could we add "Rejected Alternatives" section to the end of the KIP to
> > add some of them?
> > Like the "ListOffsetRequest" approach VS "Earliest-Pending-Upload-Offset"
> > approach, or getting the "Earliest-Pending-Upload-Offset" from remote log
> > metadata... etc.
> >
> > Thanks.
> > Luke
> >
> >
> > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Christo,
> > >
> > > Please find my comments inline.
> > >
> > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> > > wrote:
> > >
> > > > Hello Abhijeet and Jun,
> > > >
> > > > I have been mulling this KIP over a bit more in recent days!
> > > >
> > > > re: Jun
> > > >
> > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > > > retrospect it should have been fairly obvious. I would need to go an
> > > update
> > > > KIP-1005 myself then, thank you for giving the useful reference!
> > > >
> > > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset
> and
> > > > fetch from latest-tiered-offset + 1 only for new replicas (or
> replicas
> > > > which experienced a disk failure) to decrease the time a partition
> spends
> > > > in under-replicated state. In other words, a follower which has just
> > > fallen
> > > > out of ISR, but has local data will continue using today's Tiered
> Storage
> > > > replication protocol (i.e. fetching from earliest-local). I further
> > > believe
> > > > he has taken this approach so that local state of replicas which have
> > > just
> > > > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > > > Abhijeet, have I understood (and summarised) what you are proposing
> > > > correctly?
> > > >
> > > > Yes, your understanding is correct. We want to limit the behavior
> changes
> > > only to new replicas.
> > >
> > >
> > > > 5. I think in today's Tiered Storage we know the leader's
> > > log-start-offset
> > > > from the FetchResponse and we can learn its local-log-start-offset
> from
> > > the
> > > > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> > > this
> > > > ought to be added as an additional API call in the KIP.
> > > >
> > > >
> > > Yes, I clarified this in my reply to Jun. I will add this missing
> detail in
> > > the KIP.
> > >
> > >
> > > > re: Abhijeet
> > > >
> > > > 101. I am still a bit confused as to why you want to include a new
> offset
> > > > (i.e. pending-upload-offset) when you yourself mention that you
> could use
> > > > an already existing offset (i.e. last-tiered-offset + 1). In
> essence, you
> > > > end your Motivation with "In this KIP, we will focus only on the
> follower
> > > > fetch protocol using the *last-tiered-offset*" and then in the
> following
> > > > sections you talk about pending-upload-offset. I understand this
> might be
> > > > classified as an implementation detail, but if you introduce a new
> offset
> > > > (i.e. pending-upload-offset) you have to make a change to the
> ListOffsets
> > > > API (i.e. introduce -6) and thus document it in this KIP as such.
> > > However,
> > > > the last-tiered-offset ought to already be exposed as part of
> KIP-1005
> > > > (under implementation). Am I misunderstanding something here?
> > > >
> > >
> > > I have tried to clarify this in my reply to Jun.
> > >
> > > > The follower needs to build the local da

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Satish Duggana
+1 to Jun for adding the consumer fetching from a follower scenario
also to the existing section that talked about the drawback when a
node built with last-tiered-offset has become a leader. As Abhijeet
mentioned, we plan to have a follow-up KIP that will address these by
having a deprioritization of these brokers. The deprioritization of
those brokers can be removed once they catch up to the local log
retention.

Thanks,
Satish.

On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
>
> Hi Abhijeet,
>
> Thanks for the KIP to improve the tiered storage feature!
>
> Questions:
> 1. We could also get the "pending-upload-offset" and epoch via remote log
> metadata, instead of adding a new API to fetch from the leader. Could you
> explain why you choose the later approach, instead of the former?
> 2.
> > We plan to have a follow-up KIP that will address both the
> deprioritization
> of these brokers from leadership, as well as
> for consumption (when fetching from followers is allowed).
>
> I agree with Jun that we might need to make it clear all possible drawbacks
> that could have. So, could we add the drawbacks that Jun mentioned about
> the performance issue when consumer fetch from follower?
>
> 3. Could we add "Rejected Alternatives" section to the end of the KIP to
> add some of them?
> Like the "ListOffsetRequest" approach VS "Earliest-Pending-Upload-Offset"
> approach, or getting the "Earliest-Pending-Upload-Offset" from remote log
> metadata... etc.
>
> Thanks.
> Luke
>
>
> On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar 
> wrote:
>
> > Hi Christo,
> >
> > Please find my comments inline.
> >
> > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> > wrote:
> >
> > > Hello Abhijeet and Jun,
> > >
> > > I have been mulling this KIP over a bit more in recent days!
> > >
> > > re: Jun
> > >
> > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > > retrospect it should have been fairly obvious. I would need to go an
> > update
> > > KIP-1005 myself then, thank you for giving the useful reference!
> > >
> > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> > > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > > which experienced a disk failure) to decrease the time a partition spends
> > > in under-replicated state. In other words, a follower which has just
> > fallen
> > > out of ISR, but has local data will continue using today's Tiered Storage
> > > replication protocol (i.e. fetching from earliest-local). I further
> > believe
> > > he has taken this approach so that local state of replicas which have
> > just
> > > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > > Abhijeet, have I understood (and summarised) what you are proposing
> > > correctly?
> > >
> > > Yes, your understanding is correct. We want to limit the behavior changes
> > only to new replicas.
> >
> >
> > > 5. I think in today's Tiered Storage we know the leader's
> > log-start-offset
> > > from the FetchResponse and we can learn its local-log-start-offset from
> > the
> > > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> > this
> > > ought to be added as an additional API call in the KIP.
> > >
> > >
> > Yes, I clarified this in my reply to Jun. I will add this missing detail in
> > the KIP.
> >
> >
> > > re: Abhijeet
> > >
> > > 101. I am still a bit confused as to why you want to include a new offset
> > > (i.e. pending-upload-offset) when you yourself mention that you could use
> > > an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> > > end your Motivation with "In this KIP, we will focus only on the follower
> > > fetch protocol using the *last-tiered-offset*" and then in the following
> > > sections you talk about pending-upload-offset. I understand this might be
> > > classified as an implementation detail, but if you introduce a new offset
> > > (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> > > API (i.e. introduce -6) and thus document it in this KIP as such.
> > However,
> > > the last-tiered-offset ought to already be exposed as part of KIP-1005
> > > (under implementation). Am I misunderstanding something here?
> > >
> >
> > I have tried to clarify this in my reply to Jun.
> >
> > > The follower needs to build the local data starting from the offset
> > > Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> > > corresponding leader-epoch.
> > > There are two ways to do this:
> > >1. We add support in ListOffsetRequest to be able to fetch this offset
> > > (and leader epoch) from the leader.
> > >2. Or, fetch the tiered-offset (which is already supported). From this
> > > offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1
> > to
> > > the tiered-offset.
> > >   However, we still need the leader epoch for offset, since there is
> > > no guarantee that the leader epoch for Earliest-Pending-Upload-Offset
> > will
> 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-09 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP to improve the tiered storage feature!

Questions:
1. We could also get the "pending-upload-offset" and epoch via remote log
metadata, instead of adding a new API to fetch from the leader. Could you
explain why you chose the latter approach instead of the former?
2.
> We plan to have a follow-up KIP that will address both the
deprioritization
of these brokers from leadership, as well as
for consumption (when fetching from followers is allowed).

I agree with Jun that we might need to make clear all the possible drawbacks
this could have. So, could we add the drawbacks that Jun mentioned about
the performance issue when consumers fetch from followers?

3. Could we add a "Rejected Alternatives" section to the end of the KIP to
list some of them?
Like the "ListOffsetRequest" approach VS "Earliest-Pending-Upload-Offset"
approach, or getting the "Earliest-Pending-Upload-Offset" from remote log
metadata... etc.

Thanks.
Luke


On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar 
wrote:

> Hi Christo,
>
> Please find my comments inline.
>
> On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> wrote:
>
> > Hello Abhijeet and Jun,
> >
> > I have been mulling this KIP over a bit more in recent days!
> >
> > re: Jun
> >
> > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > retrospect it should have been fairly obvious. I would need to go an
> update
> > KIP-1005 myself then, thank you for giving the useful reference!
> >
> > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > which experienced a disk failure) to decrease the time a partition spends
> > in under-replicated state. In other words, a follower which has just
> fallen
> > out of ISR, but has local data will continue using today's Tiered Storage
> > replication protocol (i.e. fetching from earliest-local). I further
> believe
> > he has taken this approach so that local state of replicas which have
> just
> > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > Abhijeet, have I understood (and summarised) what you are proposing
> > correctly?
> >
> > Yes, your understanding is correct. We want to limit the behavior changes
> only to new replicas.
>
>
> > 5. I think in today's Tiered Storage we know the leader's
> log-start-offset
> > from the FetchResponse and we can learn its local-log-start-offset from
> the
> > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> this
> > ought to be added as an additional API call in the KIP.
> >
> >
> Yes, I clarified this in my reply to Jun. I will add this missing detail in
> the KIP.
>
>
> > re: Abhijeet
> >
> > 101. I am still a bit confused as to why you want to include a new offset
> > (i.e. pending-upload-offset) when you yourself mention that you could use
> > an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> > end your Motivation with "In this KIP, we will focus only on the follower
> > fetch protocol using the *last-tiered-offset*" and then in the following
> > sections you talk about pending-upload-offset. I understand this might be
> > classified as an implementation detail, but if you introduce a new offset
> > (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> > API (i.e. introduce -6) and thus document it in this KIP as such.
> However,
> > the last-tiered-offset ought to already be exposed as part of KIP-1005
> > (under implementation). Am I misunderstanding something here?
> >
>
> I have tried to clarify this in my reply to Jun.
>
> > The follower needs to build the local data starting from the offset
> > Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> > corresponding leader-epoch.
> > There are two ways to do this:
> >1. We add support in ListOffsetRequest to be able to fetch this offset
> > (and leader epoch) from the leader.
> >2. Or, fetch the tiered-offset (which is already supported). From this
> > offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1
> to
> > the tiered-offset.
> >   However, we still need the leader epoch for offset, since there is
> > no guarantee that the leader epoch for Earliest-Pending-Upload-Offset
> will
> > be the same as the
> >   leader epoch for tiered-offset. We may need another API call to the
> > leader for this.
> > I prefer the first approach. The only problem with the first approach is
> > that it introduces one more offset. The second approach avoids this
> problem
> > but is a little complicated.
>
>
>
> >
> > Best,
> > Christo
> >
> > On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the KIP. Left a few comments.
> > >
> > > 1. "A drawback of using the last-tiered-offset is that this new
> follower
> > > would possess only a limited number of locally stored segments. Should
> it
> > > ascend to the role of leader, there is a risk of n

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
Hi Christo,

Please find my comments inline.

On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
wrote:

> Hello Abhijeet and Jun,
>
> I have been mulling this KIP over a bit more in recent days!
>
> re: Jun
>
> I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> retrospect it should have been fairly obvious. I would need to go an update
> KIP-1005 myself then, thank you for giving the useful reference!
>
> 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> which experienced a disk failure) to decrease the time a partition spends
> in under-replicated state. In other words, a follower which has just fallen
> out of ISR, but has local data will continue using today's Tiered Storage
> replication protocol (i.e. fetching from earliest-local). I further believe
> he has taken this approach so that local state of replicas which have just
> fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> Abhijeet, have I understood (and summarised) what you are proposing
> correctly?
>
Yes, your understanding is correct. We want to limit the behavior changes
only to new replicas.


> 5. I think in today's Tiered Storage we know the leader's log-start-offset
> from the FetchResponse and we can learn its local-log-start-offset from the
> ListOffsets by asking for earliest-local timestamp (-4). But granted, this
> ought to be added as an additional API call in the KIP.
>
>
Yes, I clarified this in my reply to Jun. I will add this missing detail in
the KIP.


> re: Abhijeet
>
> 101. I am still a bit confused as to why you want to include a new offset
> (i.e. pending-upload-offset) when you yourself mention that you could use
> an already existing offset (i.e. last-tiered-offset + 1). In essence, you
> end your Motivation with "In this KIP, we will focus only on the follower
> fetch protocol using the *last-tiered-offset*" and then in the following
> sections you talk about pending-upload-offset. I understand this might be
> classified as an implementation detail, but if you introduce a new offset
> (i.e. pending-upload-offset) you have to make a change to the ListOffsets
> API (i.e. introduce -6) and thus document it in this KIP as such. However,
> the last-tiered-offset ought to already be exposed as part of KIP-1005
> (under implementation). Am I misunderstanding something here?
>

I have tried to clarify this in my reply to Jun.

> The follower needs to build the local data starting from the offset
> Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> corresponding leader-epoch.
> There are two ways to do this:
>1. We add support in ListOffsetRequest to be able to fetch this offset
> (and leader epoch) from the leader.
>2. Or, fetch the tiered-offset (which is already supported). From this
> offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1 to
> the tiered-offset.
>   However, we still need the leader epoch for offset, since there is
> no guarantee that the leader epoch for Earliest-Pending-Upload-Offset will
> be the same as the
>   leader epoch for tiered-offset. We may need another API call to the
> leader for this.
> I prefer the first approach. The only problem with the first approach is
> that it introduces one more offset. The second approach avoids this problem
> but is a little complicated.



>
> Best,
> Christo
>
> On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the KIP. Left a few comments.
> >
> > 1. "A drawback of using the last-tiered-offset is that this new follower
> > would possess only a limited number of locally stored segments. Should it
> > ascend to the role of leader, there is a risk of needing to fetch these
> > segments from the remote storage, potentially impacting broker
> > performance."
> > Since we support consumers fetching from followers, this is a potential
> > issue on the follower side too. In theory, it's possible for a segment to
> > be tiered immediately after rolling. In that case, there could be very
> > little data after last-tiered-offset. It would be useful to think through
> > how to address this issue.
> >
> > 2. ListOffsetsRequest:
> > 2.1 Typically, we need to bump up the version of the request if we add a
> > new value for timestamp. See
> >
> >
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> > .
> > 2.2 Since this changes the inter broker request protocol, it would be
> > useful to have a section on upgrade (e.g. new IBP/metadata.version).
> >
> > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch
> the
> > last-tiered-offset from the leader, and make a separate leader call to
> > fetch leader epoch for the following offset."
> > Why do we need to make a separate call for the leader epoch?
> > ListOffsetsResponse include both the offset and the corresponding epo

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
Hi Jun,

Thank you for taking the time to review the KIP. Please find my comments
inline.

On Fri, Apr 5, 2024 at 12:09 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>

We plan to have a follow-up KIP that will address the deprioritization of
these brokers both for leadership and for consumption (when fetching from
followers is allowed).


>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
>

Yes, let me update the KIP to include this change. We will need a new
timestamp corresponding to Earliest-Pending-Upload-Offset.
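
For illustration, the new sentinel would sit alongside the existing ListOffsets
timestamps roughly as below. The constant names are made up, and the -6 value is
the one floated in this thread for the new timestamp; treat both as placeholders
until the KIP is updated:

public final class ListOffsetsTimestamps {
    public static final long LATEST_TIMESTAMP = -1L;                  // end of the log
    public static final long EARLIEST_TIMESTAMP = -2L;                // log-start-offset
    public static final long EARLIEST_LOCAL_TIMESTAMP = -4L;          // local-log-start-offset (KIP-1005)
    public static final long EARLIEST_PENDING_UPLOAD_TIMESTAMP = -6L; // proposed in this KIP

    private ListOffsetsTimestamps() {}
}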


> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
Makes sense. I will update the KIP to capture this.


> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse include both the offset and the corresponding epoch.
>
I understand there is some confusion here. Let me try to explain this.

The follower needs to build the local data starting from the offset
Earliest-Pending-Upload-Offset. Hence it needs the offset and the
corresponding leader-epoch.
There are two ways to do this:
   1. We add support in ListOffsetRequest to be able to fetch this offset
(and leader epoch) from the leader.
   2. Or, fetch the tiered-offset (which is already supported). From this
offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1 to
the tiered-offset.
  However, we still need the leader epoch for that offset, since there is no
guarantee that the leader epoch for Earliest-Pending-Upload-Offset will be
the same as the leader epoch for the tiered-offset. We may need another API
call to the leader for this.

I prefer the first approach. The only problem with the first approach is
that it introduces one more offset. The second approach avoids this problem
but is a little complicated.
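
A rough sketch of the two options (the LeaderClient interface is a hypothetical
stand-in for ListOffsets round trips to the leader, and the sentinel values are
placeholders, not final):

record OffsetAndEpoch(long offset, int leaderEpoch) {}

interface LeaderClient {
    OffsetAndEpoch listOffsets(long sentinelTimestamp); // one round trip, returns offset + epoch
    int leaderEpochForOffset(long offset);              // an extra round trip
}

final class FollowerStartOffset {
    static final long EARLIEST_PENDING_UPLOAD = -6L; // proposed sentinel (placeholder)
    static final long LATEST_TIERED = -5L;           // assumed KIP-1005 sentinel (placeholder)

    // Option 1: a single call; the response already carries both offset and leader epoch.
    static OffsetAndEpoch option1(LeaderClient leader) {
        return leader.listOffsets(EARLIEST_PENDING_UPLOAD);
    }

    // Option 2: derive the offset as last-tiered-offset + 1, but the epoch of that offset
    // may differ from the epoch of the tiered offset, so a second call is needed.
    static OffsetAndEpoch option2(LeaderClient leader) {
        OffsetAndEpoch tiered = leader.listOffsets(LATEST_TIERED);
        long start = tiered.offset() + 1;
        return new OffsetAndEpoch(start, leader.leaderEpochForOffset(start));
    }
}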


> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
>
We want to limit this new behavior to new replicas only. Replicas that fall
out of ISR are excluded from this behavior change and will continue with the
existing behavior.


> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>

Missed this detail. The follower will need to call the leader API to fetch
the EarliestLocal offset for this.
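
Putting points 4 and 5 together, the follower-side check could look roughly like
the sketch below (names are illustrative and the fallback is a simplification of
today's protocol, not the actual replica fetcher code):

final class FollowerStartDecision {
    /**
     * Decide where a follower should start building its local log.
     *
     * @param followerLogEndOffset      the follower's current log end offset (0 for an empty replica)
     * @param useTieredOffsetEnabled    whether the new behaviour from this KIP is enabled
     * @param leaderLogStartOffset      leader's log-start-offset (already known from FetchResponse)
     * @param leaderLocalLogStartOffset leader's local-log-start-offset (from ListOffsets with the
     *                                  earliest-local timestamp)
     * @param earliestPendingUpload     leader's earliest-pending-upload-offset
     */
    static long fetchStartOffset(long followerLogEndOffset,
                                 boolean useTieredOffsetEnabled,
                                 long leaderLogStartOffset,
                                 long leaderLocalLogStartOffset,
                                 long earliestPendingUpload) {
        boolean followerIsEmpty = followerLogEndOffset == 0L;
        // If log-start-offset == local-log-start-offset, nothing has been tiered yet,
        // so there is no point in skipping ahead.
        boolean leaderHasTieredData = leaderLogStartOffset < leaderLocalLogStartOffset;
        if (followerIsEmpty && useTieredOffsetEnabled && leaderHasTieredData) {
            return earliestPendingUpload;     // new behaviour: skip the already-tiered range
        }
        return leaderLocalLogStartOffset;     // otherwise fall back to today's behaviour (earliest-local)
    }
}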


> Jun
>
> On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Christo,
> >
> > Thanks for reviewing the KIP.
> >
> > The follower needs the earliest-pending-upload-offset (and the
> > corresponding leader epoch) from the leader.
> > This is the first offset the follower will have locally.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > First of all, thank you very much for the proposal, you have explained
> > the
> > > problem you want solved very well - I think a faster bootstrap of an
> > empty
> > > replica is definitely an improvement!
> > >
> > > For my understanding, which concrete offset do you want the leader to
> > give
> > > back to a follower - earliest-pending-upload-offset or the
> > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> ought
> > to
> > > already be exposing that offset as part of the ListOffsets API, no?
> > >
> > > Best,
> > > Christo
> > >
> > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have created KIP-1023 to introduce follower fetch from tiered
> offset.
> > > > This feature will be helpful in significantly reducing Kafka
> > > > rebalance/rebuild times when the cluster is enabled with tiered
> > storage.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-05 Thread Christo Lolov
Hello Abhijeet and Jun,

I have been mulling this KIP over a bit more in recent days!

re: Jun

I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
retrospect it should have been fairly obvious. I would need to go and update
KIP-1005 myself then, thank you for giving the useful reference!

4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
fetch from latest-tiered-offset + 1 only for new replicas (or replicas
which experienced a disk failure) to decrease the time a partition spends
in under-replicated state. In other words, a follower which has just fallen
out of ISR, but has local data will continue using today's Tiered Storage
replication protocol (i.e. fetching from earliest-local). I further believe
he has taken this approach so that local state of replicas which have just
fallen out of ISR isn't forcefully wiped thus leading to situation 1.
Abhijeet, have I understood (and summarised) what you are proposing
correctly?

5. I think in today's Tiered Storage we know the leader's log-start-offset
from the FetchResponse and we can learn its local-log-start-offset from the
ListOffsets by asking for earliest-local timestamp (-4). But granted, this
ought to be added as an additional API call in the KIP.

re: Abhijeet

101. I am still a bit confused as to why you want to include a new offset
(i.e. pending-upload-offset) when you yourself mention that you could use
an already existing offset (i.e. last-tiered-offset + 1). In essence, you
end your Motivation with "In this KIP, we will focus only on the follower
fetch protocol using the *last-tiered-offset*" and then in the following
sections you talk about pending-upload-offset. I understand this might be
classified as an implementation detail, but if you introduce a new offset
(i.e. pending-upload-offset) you have to make a change to the ListOffsets
API (i.e. introduce -6) and thus document it in this KIP as such. However,
the last-tiered-offset ought to already be exposed as part of KIP-1005
(under implementation). Am I misunderstanding something here?

Best,
Christo

On Thu, 4 Apr 2024 at 19:37, Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse include both the offset and the corresponding epoch.
>
> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>
> Jun
>
> On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Christo,
> >
> > Thanks for reviewing the KIP.
> >
> > The follower needs the earliest-pending-upload-offset (and the
> > corresponding leader epoch) from the leader.
> > This is the first offset the follower will have locally.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > First of all, thank you very much for the proposal, you have explained
> > the
> > > problem you want solved very well - I think a faster bootstrap of an
> > empty
> > > replica is definitely an improvement!
> > >
> > > For my understanding, which concrete offset do you want the leader to
> > give
> > > back to a follower - earliest-pending-upload-offset or the
> > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> ought
> > to
> > > already be exposing that offset as part of the ListOffsets API, no?
> > >
> > > Best,
> > > Christo
> > >
> > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <
> abhijeet.cse@gma

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-04 Thread Jun Rao
Hi, Abhijeet,

Thanks for the KIP. Left a few comments.

1. "A drawback of using the last-tiered-offset is that this new follower
would possess only a limited number of locally stored segments. Should it
ascend to the role of leader, there is a risk of needing to fetch these
segments from the remote storage, potentially impacting broker performance."
Since we support consumers fetching from followers, this is a potential
issue on the follower side too. In theory, it's possible for a segment to
be tiered immediately after rolling. In that case, there could be very
little data after last-tiered-offset. It would be useful to think through
how to address this issue.

2. ListOffsetsRequest:
2.1 Typically, we need to bump up the version of the request if we add a
new value for timestamp. See
https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
.
2.2 Since this changes the inter broker request protocol, it would be
useful to have a section on upgrade (e.g. new IBP/metadata.version).

3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
last-tiered-offset from the leader, and make a separate leader call to
fetch leader epoch for the following offset."
Why do we need to make a separate call for the leader epoch?
ListOffsetsResponse includes both the offset and the corresponding epoch.

4. "Check if the follower replica is empty and if the feature to use
last-tiered-offset is enabled."
Why do we need to check if the follower replica is empty?

5. It can be confirmed by checking if the leader's Log-Start-Offset is the
same as the Leader's Local-Log-Start-Offset.
How does the follower know Local-Log-Start-Offset?

Jun

On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar 
wrote:

> Hi Christo,
>
> Thanks for reviewing the KIP.
>
> The follower needs the earliest-pending-upload-offset (and the
> corresponding leader epoch) from the leader.
> This is the first offset the follower will have locally.
>
> Regards,
> Abhijeet.
>
>
>
> On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > First of all, thank you very much for the proposal, you have explained
> the
> > problem you want solved very well - I think a faster bootstrap of an
> empty
> > replica is definitely an improvement!
> >
> > For my understanding, which concrete offset do you want the leader to
> give
> > back to a follower - earliest-pending-upload-offset or the
> > latest-tiered-offset? If it is the second, then I believe KIP-1005 ought
> to
> > already be exposing that offset as part of the ListOffsets API, no?
> >
> > Best,
> > Christo
> >
> > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar  >
> > wrote:
> >
> > > Hi All,
> > >
> > > I have created KIP-1023 to introduce follower fetch from tiered offset.
> > > This feature will be helpful in significantly reducing Kafka
> > > rebalance/rebuild times when the cluster is enabled with tiered
> storage.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > >
> > > Feedback and suggestions are welcome.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-30 Thread Abhijeet Kumar
Hi Christo,

Thanks for reviewing the KIP.

The follower needs the earliest-pending-upload-offset (and the
corresponding leader epoch) from the leader.
This is the first offset the follower will have locally.

Regards,
Abhijeet.



On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
wrote:

> Heya!
>
> First of all, thank you very much for the proposal, you have explained the
> problem you want solved very well - I think a faster bootstrap of an empty
> replica is definitely an improvement!
>
> For my understanding, which concrete offset do you want the leader to give
> back to a follower - earliest-pending-upload-offset or the
> latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to
> already be exposing that offset as part of the ListOffsets API, no?
>
> Best,
> Christo
>
> On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar 
> wrote:
>
> > Hi All,
> >
> > I have created KIP-1023 to introduce follower fetch from tiered offset.
> > This feature will be helpful in significantly reducing Kafka
> > rebalance/rebuild times when the cluster is enabled with tiered storage.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> >
> > Feedback and suggestions are welcome.
> >
> > Regards,
> > Abhijeet.
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-29 Thread Christo Lolov
Heya!

First of all, thank you very much for the proposal, you have explained the
problem you want solved very well - I think a faster bootstrap of an empty
replica is definitely an improvement!

For my understanding, which concrete offset do you want the leader to give
back to a follower - earliest-pending-upload-offset or the
latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to
already be exposing that offset as part of the ListOffsets API, no?

Best,
Christo

On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar 
wrote:

> Hi All,
>
> I have created KIP-1023 to introduce follower fetch from tiered offset.
> This feature will be helpful in significantly reducing Kafka
> rebalance/rebuild times when the cluster is enabled with tiered storage.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
>
> Feedback and suggestions are welcome.
>
> Regards,
> Abhijeet.
>