Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Thank you all for your comments. As all the comments in the thread have been addressed, I am starting a Vote thread for the KIP. Please have a look.

Regards,
Abhijeet.
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi, Abhijeet,

Thanks for the update. I have no more comments.

Luke
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi, Abhijeet,

Thanks for the updated KIP. It looks good to me.

Jun
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Jun,

Please find my comments inline.

On Thu, Apr 18, 2024 at 3:26 AM Jun Rao wrote:
> 1. I am wondering if we could achieve the same result by just lowering
> local.retention.ms and local.retention.bytes. This also allows the newly
> started follower to build up the local data before serving the consumer
> traffic.

I am not sure I fully followed this. Do you mean we could lower the local
retention (by size and time) so that there is little data on the leader's
local storage, allowing the follower to quickly catch up with the leader?

In that case, we would need to set a small local retention across all brokers
in the cluster. That has the undesired effect of increasing remote log fetches
for serving consume requests, which can cause degradation. Moreover, this
behaviour (of increased remote fetches) would occur on all brokers at all
times, whereas the KIP restricts it to newly bootstrapped brokers, and only
until they fully build their local logs as per the retention defined at the
cluster level. (Deprioritization of those brokers could reduce the impact
even further.)

> 2. Have you updated the KIP?

The KIP has been updated now.
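As a toy illustration of the tradeoff Abhijeet describes above (a sketch with made-up numbers, not Kafka code): with a small cluster-wide local retention, any consumer lagging past the local log must be served from remote storage on every broker, whereas under the KIP only a newly bootstrapped broker is temporarily in that position.

```python
# Toy model of local vs. remote serving. All numbers are hypothetical.

GiB = 1024 ** 3

def serves_from_local(local_retention_bytes: int, consumer_lag_bytes: int) -> bool:
    """A consumer can be served from local disk only while its read
    position still falls within the locally retained bytes."""
    return consumer_lag_bytes <= local_retention_bytes

lag = 2 * GiB  # a consumer lagging 2 GiB behind the log end offset

# (a) Small local retention everywhere: the lagging consumer triggers
# remote fetches on whichever broker serves it.
print(serves_from_local(1 * GiB, lag))   # False -> remote fetch

# (b) KIP-1023: established brokers keep the full configured local
# retention, so the same consumer is served from local disk.
print(serves_from_local(10 * GiB, lag))  # True -> local fetch
```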
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Luke,

Thanks for your comments. Please find my responses inline.

On Tue, Apr 9, 2024 at 2:08 PM Luke Chen wrote:
> 1. We could also get the "pending-upload-offset" and epoch via remote log
> metadata, instead of adding a new API to fetch from the leader. Could you
> explain why you chose the latter approach instead of the former?

The remote log metadata could be tracking overlapping log segments. The
maximum offset across the segments it tracks may not be the
last-tiered-offset, because of truncations during unclean leader election.
Remote log metadata alone is therefore not sufficient to identify the
last-tiered-offset or the pending-upload-offset. Only the leader knows the
correct lineage of offsets required to identify the "pending-upload-offset".
That is why we chose to add a new API to fetch this information from the
leader.

> 2. I agree with Jun that we might need to make clear all the possible
> drawbacks. So, could we add the drawbacks that Jun mentioned about the
> performance issue when consumers fetch from a follower?

Updated the KIP to call this out.

> 3. Could we add a "Rejected Alternatives" section to the end of the KIP,
> e.g. the "ListOffsetRequest" approach vs. the
> "Earliest-Pending-Upload-Offset" approach, or getting the
> "Earliest-Pending-Upload-Offset" from remote log metadata, etc.?

Added the section on Rejected Alternatives.
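Abhijeet's point about overlapping segments can be made concrete with a small sketch (illustrative data, not Kafka code): after an unclean leader election, remote log metadata can still track segments from an abandoned lineage, so the naive maximum offset overshoots, and only the leader's epoch lineage gives the right answer.

```python
# Illustrative sketch: (epoch, start_offset, end_offset) for segments
# tracked in remote log metadata. The values are hypothetical.
remote_segments = [
    (0, 0, 99),
    (1, 100, 199),   # uploaded by a former leader, later truncated away
    (2, 100, 149),   # the current lineage re-wrote offsets 100-149
]

# The current leader's epoch lineage (epoch -> start offset). Epoch 1 is
# absent: its offsets were discarded during unclean leader election.
leader_epoch_chain = {0: 0, 2: 100}

# Naive answer from metadata alone: includes the abandoned segment.
naive_last_tiered = max(end for _epoch, _start, end in remote_segments)

# Lineage-aware answer: only segments whose epoch is in the leader's chain.
lineage_last_tiered = max(end for epoch, _start, end in remote_segments
                          if epoch in leader_epoch_chain)

print(naive_last_tiered)    # 199 -- wrong
print(lineage_last_tiered)  # 149 -- the true last-tiered-offset
```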
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi, Abhijeet,

Thanks for the reply.

1. I am wondering if we could achieve the same result by just lowering
local.retention.ms and local.retention.bytes. This also allows the newly
started follower to build up the local data before serving the consumer
traffic.

2. Have you updated the KIP?

Thanks,

Jun
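For context, the two settings in Jun's first question are the existing tiered-storage retention configs. A per-topic override might look like the following (the values here are illustrative, not recommendations):

```properties
# Per-topic tiered storage settings (illustrative values)
remote.storage.enable=true
# Keep at most ~1 GiB / 1 hour of data on local disk; older segments are
# served from remote storage once uploaded
local.retention.bytes=1073741824
local.retention.ms=3600000
```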
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
+1 to Jun for adding the consumer-fetching-from-a-follower scenario to the existing section that talks about the drawback when a node built with last-tiered-offset has become a leader. As Abhijeet mentioned, we plan to have a follow-up KIP that will address these by deprioritizing such brokers. The deprioritization of those brokers can be removed once they catch up to the local log retention.

Thanks,
Satish.

On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar wrote:
> Hi Christo,
>
> Please find my comments inline.
>
> On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov wrote:
> > Hello Abhijeet and Jun,
> >
> > I have been mulling this KIP over a bit more in recent days!
> >
> > re: Jun
> >
> > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > retrospect it should have been fairly obvious. I would need to go and
> > update KIP-1005 myself then, thank you for giving the useful reference!
> >
> > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > which experienced a disk failure) to decrease the time a partition
> > spends in an under-replicated state. In other words, a follower which
> > has just fallen out of ISR, but has local data, will continue using
> > today's Tiered Storage replication protocol (i.e. fetching from
> > earliest-local). I further believe he has taken this approach so that
> > the local state of replicas which have just fallen out of ISR isn't
> > forcefully wiped, thus leading to situation 1. Abhijeet, have I
> > understood (and summarised) what you are proposing correctly?
>
> Yes, your understanding is correct. We want to limit the behavior changes
> only to new replicas.
>
> > 5. I think in today's Tiered Storage we know the leader's
> > log-start-offset from the FetchResponse, and we can learn its
> > local-log-start-offset from ListOffsets by asking for the
> > earliest-local timestamp (-4). But granted, this ought to be added as
> > an additional API call in the KIP.
>
> Yes, I clarified this in my reply to Jun. I will add this missing detail
> in the KIP.
>
> > re: Abhijeet
> >
> > 101. I am still a bit confused as to why you want to include a new
> > offset (i.e. pending-upload-offset) when you yourself mention that you
> > could use an already existing offset (i.e. last-tiered-offset + 1). In
> > essence, you end your Motivation with "In this KIP, we will focus only
> > on the follower fetch protocol using the *last-tiered-offset*" and then
> > in the following sections you talk about pending-upload-offset. I
> > understand this might be classified as an implementation detail, but if
> > you introduce a new offset (i.e. pending-upload-offset) you have to
> > make a change to the ListOffsets API (i.e. introduce -6) and thus
> > document it in this KIP as such. However, the last-tiered-offset ought
> > to already be exposed as part of KIP-1005 (under implementation). Am I
> > misunderstanding something here?
>
> I have tried to clarify this in my reply to Jun.
>
> The follower needs to build the local data starting from the offset
> Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> corresponding leader epoch. There are two ways to do this:
>    1. We add support in ListOffsetRequest to be able to fetch this offset
>    (and leader epoch) from the leader.
>    2. Or, fetch the tiered-offset (which is already supported). From this
>    offset, we can get the Earliest-Pending-Upload-Offset. We can just add
>    1 to the tiered-offset. However, we still need the leader epoch for
>    the offset, since there is no guarantee that the leader epoch for
>    Earliest-Pending-Upload-Offset will
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Abhijeet, Thanks for the KIP to improve the tiered storage feature! Questions: 1. We could also get the "pending-upload-offset" and epoch via remote log metadata, instead of adding a new API to fetch from the leader. Could you explain why you chose the latter approach, instead of the former? 2. > We plan to have a follow-up KIP that will address both the deprioritization of these brokers from leadership, as well as for consumption (when fetching from followers is allowed). I agree with Jun that we might need to make clear all the possible drawbacks this could have. So, could we add the drawbacks Jun mentioned about the performance issue when consumers fetch from followers? 3. Could we add a "Rejected Alternatives" section to the end of the KIP with some of them? Like the "ListOffsetRequest" approach vs. the "Earliest-Pending-Upload-Offset" approach, or getting the "Earliest-Pending-Upload-Offset" from remote log metadata, etc. Thanks. Luke On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar wrote: > Hi Christo, > > Please find my comments inline. > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov > wrote: > > > Hello Abhijeet and Jun, > > > > I have been mulling this KIP over a bit more in recent days! > > > > re: Jun > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in > > retrospect it should have been fairly obvious. I would need to go an > update > > KIP-1005 myself then, thank you for giving the useful reference! > > > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and > > fetch from latest-tiered-offset + 1 only for new replicas (or replicas > > which experienced a disk failure) to decrease the time a partition spends > > in under-replicated state. In other words, a follower which has just > fallen > > out of ISR, but has local data will continue using today's Tiered Storage > > replication protocol (i.e. fetching from earliest-local). 
I further > believe > > he has taken this approach so that local state of replicas which have > just > > fallen out of ISR isn't forcefully wiped thus leading to situation 1. > > Abhijeet, have I understood (and summarised) what you are proposing > > correctly? > > > > Yes, your understanding is correct. We want to limit the behavior changes > only to new replicas. > > > > 5. I think in today's Tiered Storage we know the leader's > log-start-offset > > from the FetchResponse and we can learn its local-log-start-offset from > the > > ListOffsets by asking for earliest-local timestamp (-4). But granted, > this > > ought to be added as an additional API call in the KIP. > > > > > Yes, I clarified this in my reply to Jun. I will add this missing detail in > the KIP. > > > > re: Abhijeet > > > > 101. I am still a bit confused as to why you want to include a new offset > > (i.e. pending-upload-offset) when you yourself mention that you could use > > an already existing offset (i.e. last-tiered-offset + 1). In essence, you > > end your Motivation with "In this KIP, we will focus only on the follower > > fetch protocol using the *last-tiered-offset*" and then in the following > > sections you talk about pending-upload-offset. I understand this might be > > classified as an implementation detail, but if you introduce a new offset > > (i.e. pending-upload-offset) you have to make a change to the ListOffsets > > API (i.e. introduce -6) and thus document it in this KIP as such. > However, > > the last-tiered-offset ought to already be exposed as part of KIP-1005 > > (under implementation). Am I misunderstanding something here? > > > > I have tried to clarify this in my reply to Jun. > > > The follower needs to build the local data starting from the offset > > Earliest-Pending-Upload-Offset. Hence it needs the offset and the > > corresponding leader-epoch. > > There are two ways to do this: > >1. 
We add support in ListOffsetRequest to be able to fetch this offset > > (and leader epoch) from the leader. > >2. Or, fetch the tiered-offset (which is already supported). From this > > offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1 > to > > the tiered-offset. > > However, we still need the leader epoch for offset, since there is > > no guarantee that the leader epoch for Earliest-Pending-Upload-Offset > will > > be the same as the > > leader epoch for tiered-offset. We may need another API call to the > > leader for this. > > I prefer the first approach. The only problem with the first approach is > > that it introduces one more offset. The second approach avoids this > problem > > but is a little complicated. > > > > > > > Best, > > Christo > > > > On Thu, 4 Apr 2024 at 19:37, Jun Rao wrote: > > > > > Hi, Abhijeet, > > > > > > Thanks for the KIP. Left a few comments. > > > > > > 1. "A drawback of using the last-tiered-offset is that this new > follower > > > would possess only a limited number of locally stored segments. Should > it > > > ascend to the role of leader, there is a risk of
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Christo, Please find my comments inline. On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov wrote: > Hello Abhijeet and Jun, > > I have been mulling this KIP over a bit more in recent days! > > re: Jun > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in > retrospect it should have been fairly obvious. I would need to go an update > KIP-1005 myself then, thank you for giving the useful reference! > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and > fetch from latest-tiered-offset + 1 only for new replicas (or replicas > which experienced a disk failure) to decrease the time a partition spends > in under-replicated state. In other words, a follower which has just fallen > out of ISR, but has local data will continue using today's Tiered Storage > replication protocol (i.e. fetching from earliest-local). I further believe > he has taken this approach so that local state of replicas which have just > fallen out of ISR isn't forcefully wiped thus leading to situation 1. > Abhijeet, have I understood (and summarised) what you are proposing > correctly? > > Yes, your understanding is correct. We want to limit the behavior changes only to new replicas. > 5. I think in today's Tiered Storage we know the leader's log-start-offset > from the FetchResponse and we can learn its local-log-start-offset from the > ListOffsets by asking for earliest-local timestamp (-4). But granted, this > ought to be added as an additional API call in the KIP. > > Yes, I clarified this in my reply to Jun. I will add this missing detail in the KIP. > re: Abhijeet > > 101. I am still a bit confused as to why you want to include a new offset > (i.e. pending-upload-offset) when you yourself mention that you could use > an already existing offset (i.e. last-tiered-offset + 1). 
In essence, you > end your Motivation with "In this KIP, we will focus only on the follower > fetch protocol using the *last-tiered-offset*" and then in the following > sections you talk about pending-upload-offset. I understand this might be > classified as an implementation detail, but if you introduce a new offset > (i.e. pending-upload-offset) you have to make a change to the ListOffsets > API (i.e. introduce -6) and thus document it in this KIP as such. However, > the last-tiered-offset ought to already be exposed as part of KIP-1005 > (under implementation). Am I misunderstanding something here? > I have tried to clarify this in my reply to Jun. > The follower needs to build the local data starting from the offset > Earliest-Pending-Upload-Offset. Hence it needs the offset and the > corresponding leader-epoch. > There are two ways to do this: >1. We add support in ListOffsetRequest to be able to fetch this offset > (and leader epoch) from the leader. >2. Or, fetch the tiered-offset (which is already supported). From this > offset, we can get the Earliest-Pending-Upload-Offset. We can just add 1 to > the tiered-offset. > However, we still need the leader epoch for offset, since there is > no guarantee that the leader epoch for Earliest-Pending-Upload-Offset will > be the same as the > leader epoch for tiered-offset. We may need another API call to the > leader for this. > I prefer the first approach. The only problem with the first approach is > that it introduces one more offset. The second approach avoids this problem > but is a little complicated. > > Best, > Christo > > On Thu, 4 Apr 2024 at 19:37, Jun Rao wrote: > > > Hi, Abhijeet, > > > > Thanks for the KIP. Left a few comments. > > > > 1. "A drawback of using the last-tiered-offset is that this new follower > > would possess only a limited number of locally stored segments. 
Should it > > ascend to the role of leader, there is a risk of needing to fetch these > > segments from the remote storage, potentially impacting broker > > performance." > > Since we support consumers fetching from followers, this is a potential > > issue on the follower side too. In theory, it's possible for a segment to > > be tiered immediately after rolling. In that case, there could be very > > little data after last-tiered-offset. It would be useful to think through > > how to address this issue. > > > > 2. ListOffsetsRequest: > > 2.1 Typically, we need to bump up the version of the request if we add a > > new value for timestamp. See > > > > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 > > . > > 2.2 Since this changes the inter broker request protocol, it would be > > useful to have a section on upgrade (e.g. new IBP/metadata.version). > > > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch > the > > last-tiered-offset from the leader, and make a separate leader call to > > fetch leader epoch for the following offset." > > Why do we need to make a separate call for the leader epoch? > > ListOffsetsResponse include both the offset and the corresponding
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Jun, Thank you for taking the time to review the KIP. Please find my comments inline. On Fri, Apr 5, 2024 at 12:09 AM Jun Rao wrote: > Hi, Abhijeet, > > Thanks for the KIP. Left a few comments. > > 1. "A drawback of using the last-tiered-offset is that this new follower > would possess only a limited number of locally stored segments. Should it > ascend to the role of leader, there is a risk of needing to fetch these > segments from the remote storage, potentially impacting broker > performance." > Since we support consumers fetching from followers, this is a potential > issue on the follower side too. In theory, it's possible for a segment to > be tiered immediately after rolling. In that case, there could be very > little data after last-tiered-offset. It would be useful to think through > how to address this issue. > We plan to have a follow-up KIP that will address both the deprioritization of these brokers from leadership, as well as for consumption (when fetching from followers is allowed). > > 2. ListOffsetsRequest: > 2.1 Typically, we need to bump up the version of the request if we add a > new value for timestamp. See > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 > . > Yes, let me update the KIP to include this change. We will need a new timestamp corresponding to Earliest-Pending-Upload-Offset. > 2.2 Since this changes the inter broker request protocol, it would be > useful to have a section on upgrade (e.g. new IBP/metadata.version). > > Makes sense. I will update the KIP to capture this. > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the > last-tiered-offset from the leader, and make a separate leader call to > fetch leader epoch for the following offset." > Why do we need to make a separate call for the leader epoch? > ListOffsetsResponse include both the offset and the corresponding epoch. > > I understand there is some confusion here. 
Let me try to explain this. The follower needs to build the local data starting from the offset Earliest-Pending-Upload-Offset. Hence it needs the offset and the corresponding leader-epoch. There are two ways to do this: 1. We add support in ListOffsetRequest to be able to fetch this offset (and leader epoch) from the leader. 2. Or, fetch the tiered-offset (which is already supported). From this offset, we can get the Earliest-Pending-Upload-Offset by just adding 1 to the tiered-offset. However, we still need the leader epoch for this offset, since there is no guarantee that the leader epoch for Earliest-Pending-Upload-Offset will be the same as the leader epoch for tiered-offset. We may need another API call to the leader for this. I prefer the first approach. The only problem with the first approach is that it introduces one more offset. The second approach avoids this problem but is a little complicated. > 4. "Check if the follower replica is empty and if the feature to use > last-tiered-offset is enabled." > Why do we need to check if the follower replica is empty? > > We want to limit this new behavior only to new replicas. Replicas that fall out of ISR are excluded from this behavior change. Those will continue with the existing behavior. > 5. It can be confirmed by checking if the leader's Log-Start-Offset is the > same as the Leader's Local-Log-Start-Offset. > How does the follower know Local-Log-Start-Offset? > I missed this detail. The follower will need to call the leader API to fetch the EarliestLocal offset for this. > Jun > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar > > wrote: > > > Hi Christo, > > > > Thanks for reviewing the KIP. > > > > The follower needs the earliest-pending-upload-offset (and the > > corresponding leader epoch) from the leader. > > This is the first offset the follower will have locally. > > > > Regards, > > Abhijeet. > > > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov > > wrote: > > > Heya! 
> > > > > > First of all, thank you very much for the proposal, you have explained > > the > > > problem you want solved very well - I think a faster bootstrap of an > > empty > > > replica is definitely an improvement! > > > > > > For my understanding, which concrete offset do you want the leader to > > give > > > back to a follower - earliest-pending-upload-offset or the > > > latest-tiered-offset? If it is the second, then I believe KIP-1005 > ought > > to > > > already be exposing that offset as part of the ListOffsets API, no? > > > > > > Best, > > > Christo > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar < > abhijeet.cse@gmail.com > > > > > > wrote: > > > > > > > Hi All, > > > > > > > > I have created KIP-1023 to introduce follower fetch from tiered > offset. > > > > This feature will be helpful in significantly reducing Kafka > > > > rebalance/rebuild times when the cluster is enabled with tiered > > storage. > > > > > > > > > > > > > > > > > >
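The epoch subtlety in the second approach above can be made concrete with a small simulation. This is not Kafka code: the epoch chain, the offsets, and the function names below are toy stand-ins chosen purely for illustration.

```python
# Toy model of the leader epoch chain: sorted (start_offset, epoch) entries.
# Epoch 2 starts at offset 50, epoch 3 starts at offset 120.
LEADER_EPOCH_CHAIN = [(0, 1), (50, 2), (120, 3)]
LAST_TIERED_OFFSET = 119  # last offset already copied to remote storage

def epoch_for(offset):
    """Return the leader epoch covering `offset` in the toy chain."""
    epoch = None
    for start, e in LEADER_EPOCH_CHAIN:
        if offset >= start:
            epoch = e
    return epoch

def approach_1():
    """One ListOffsets-style call returning the offset *and* its epoch."""
    offset = LAST_TIERED_OFFSET + 1
    return offset, epoch_for(offset)

def approach_2():
    """Fetch last-tiered-offset (already supported) and add 1 locally.
    The epoch that came with last-tiered-offset can be stale for offset + 1,
    so a second leader lookup would be needed."""
    offset = LAST_TIERED_OFFSET + 1
    stale_epoch = epoch_for(LAST_TIERED_OFFSET)   # epoch paired with tiered-offset
    correct_epoch = epoch_for(offset)             # would require another leader call
    return offset, stale_epoch, correct_epoch

print(approach_1())  # (120, 3)
print(approach_2())  # (120, 2, 3)
```

The point: the epoch paired with last-tiered-offset (here 2) is not necessarily the epoch of last-tiered-offset + 1 (here 3), which is why the second approach needs an extra leader call while the first returns both values in a single response.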
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hello Abhijeet and Jun, I have been mulling this KIP over a bit more in recent days! re: Jun I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in retrospect it should have been fairly obvious. I would need to go and update KIP-1005 myself then, thank you for giving the useful reference! 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and fetch from latest-tiered-offset + 1 only for new replicas (or replicas which experienced a disk failure) to decrease the time a partition spends in under-replicated state. In other words, a follower which has just fallen out of ISR, but has local data, will continue using today's Tiered Storage replication protocol (i.e. fetching from earliest-local). I further believe he has taken this approach so that local state of replicas which have just fallen out of ISR isn't forcefully wiped, thus leading to situation 1. Abhijeet, have I understood (and summarised) what you are proposing correctly? 5. I think in today's Tiered Storage we know the leader's log-start-offset from the FetchResponse and we can learn its local-log-start-offset from the ListOffsets by asking for earliest-local timestamp (-4). But granted, this ought to be added as an additional API call in the KIP. re: Abhijeet 101. I am still a bit confused as to why you want to include a new offset (i.e. pending-upload-offset) when you yourself mention that you could use an already existing offset (i.e. last-tiered-offset + 1). In essence, you end your Motivation with "In this KIP, we will focus only on the follower fetch protocol using the *last-tiered-offset*" and then in the following sections you talk about pending-upload-offset. I understand this might be classified as an implementation detail, but if you introduce a new offset (i.e. pending-upload-offset) you have to make a change to the ListOffsets API (i.e. introduce -6) and thus document it in this KIP as such. 
However, the last-tiered-offset ought to already be exposed as part of KIP-1005 (under implementation). Am I misunderstanding something here? Best, Christo On Thu, 4 Apr 2024 at 19:37, Jun Rao wrote: > Hi, Abhijeet, > > Thanks for the KIP. Left a few comments. > > 1. "A drawback of using the last-tiered-offset is that this new follower > would possess only a limited number of locally stored segments. Should it > ascend to the role of leader, there is a risk of needing to fetch these > segments from the remote storage, potentially impacting broker > performance." > Since we support consumers fetching from followers, this is a potential > issue on the follower side too. In theory, it's possible for a segment to > be tiered immediately after rolling. In that case, there could be very > little data after last-tiered-offset. It would be useful to think through > how to address this issue. > > 2. ListOffsetsRequest: > 2.1 Typically, we need to bump up the version of the request if we add a > new value for timestamp. See > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 > . > 2.2 Since this changes the inter broker request protocol, it would be > useful to have a section on upgrade (e.g. new IBP/metadata.version). > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the > last-tiered-offset from the leader, and make a separate leader call to > fetch leader epoch for the following offset." > Why do we need to make a separate call for the leader epoch? > ListOffsetsResponse include both the offset and the corresponding epoch. > > 4. "Check if the follower replica is empty and if the feature to use > last-tiered-offset is enabled." > Why do we need to check if the follower replica is empty? > > 5. It can be confirmed by checking if the leader's Log-Start-Offset is the > same as the Leader's Local-Log-Start-Offset. > How does the follower know Local-Log-Start-Offset? 
> > Jun > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar > > wrote: > > > Hi Christo, > > > > Thanks for reviewing the KIP. > > > > The follower needs the earliest-pending-upload-offset (and the > > corresponding leader epoch) from the leader. > > This is the first offset the follower will have locally. > > > > Regards, > > Abhijeet. > > > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov > > wrote: > > > > > Heya! > > > > > > First of all, thank you very much for the proposal, you have explained > > the > > > problem you want solved very well - I think a faster bootstrap of an > > empty > > > replica is definitely an improvement! > > > > > > For my understanding, which concrete offset do you want the leader to > > give > > > back to a follower - earliest-pending-upload-offset or the > > > latest-tiered-offset? If it is the second, then I believe KIP-1005 > ought > > to > > > already be exposing that offset as part of the ListOffsets API, no? > > > > > > Best, > > > Christo > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar < >
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi, Abhijeet, Thanks for the KIP. Left a few comments. 1. "A drawback of using the last-tiered-offset is that this new follower would possess only a limited number of locally stored segments. Should it ascend to the role of leader, there is a risk of needing to fetch these segments from the remote storage, potentially impacting broker performance." Since we support consumers fetching from followers, this is a potential issue on the follower side too. In theory, it's possible for a segment to be tiered immediately after rolling. In that case, there could be very little data after last-tiered-offset. It would be useful to think through how to address this issue. 2. ListOffsetsRequest: 2.1 Typically, we need to bump up the version of the request if we add a new value for timestamp. See https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 . 2.2 Since this changes the inter-broker request protocol, it would be useful to have a section on upgrade (e.g. new IBP/metadata.version). 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the last-tiered-offset from the leader, and make a separate leader call to fetch leader epoch for the following offset." Why do we need to make a separate call for the leader epoch? ListOffsetsResponse includes both the offset and the corresponding epoch. 4. "Check if the follower replica is empty and if the feature to use last-tiered-offset is enabled." Why do we need to check if the follower replica is empty? 5. "It can be confirmed by checking if the leader's Log-Start-Offset is the same as the Leader's Local-Log-Start-Offset." How does the follower know Local-Log-Start-Offset? Jun On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar wrote: > Hi Christo, > > Thanks for reviewing the KIP. > > The follower needs the earliest-pending-upload-offset (and the > corresponding leader epoch) from the leader. > This is the first offset the follower will have locally. 
> > Regards, > Abhijeet. > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov > wrote: > > > Heya! > > > > First of all, thank you very much for the proposal, you have explained > the > > problem you want solved very well - I think a faster bootstrap of an > empty > > replica is definitely an improvement! > > > > For my understanding, which concrete offset do you want the leader to > give > > back to a follower - earliest-pending-upload-offset or the > > latest-tiered-offset? If it is the second, then I believe KIP-1005 ought > to > > already be exposing that offset as part of the ListOffsets API, no? > > > > Best, > > Christo > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar > > > wrote: > > > > > Hi All, > > > > > > I have created KIP-1023 to introduce follower fetch from tiered offset. > > > This feature will be helpful in significantly reducing Kafka > > > rebalance/rebuild times when the cluster is enabled with tiered > storage. > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset > > > > > > Feedback and suggestions are welcome. > > > > > > Regards, > > > Abhijeet. > > > > > >
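Jun's questions 4 and 5 above concern when the new behaviour applies. A minimal sketch of that bootstrap decision, assuming hypothetical names throughout; the -4 (earliest-local) sentinel exists today, and -6 is the new sentinel value floated in this thread, not a finalized constant.

```python
# Hypothetical sketch (not Kafka code) of the follower's bootstrap decision.
EARLIEST_LOCAL_TIMESTAMP = -4            # existing ListOffsets sentinel
EARLIEST_PENDING_UPLOAD_TIMESTAMP = -6   # new sentinel proposed in this thread

def fetch_start_offset(replica_is_empty, feature_enabled, list_offsets):
    """Pick the offset the follower starts replicating from.

    Only a brand-new (empty) replica with the feature enabled takes the
    new path; a replica that merely fell out of ISR keeps today's
    behaviour of fetching from earliest-local.
    """
    if replica_is_empty and feature_enabled:
        return list_offsets(EARLIEST_PENDING_UPLOAD_TIMESTAMP)
    return list_offsets(EARLIEST_LOCAL_TIMESTAMP)

def toy_list_offsets(timestamp):
    """Toy leader: local log starts at 100; offsets below 150 are tiered."""
    return {EARLIEST_LOCAL_TIMESTAMP: 100,
            EARLIEST_PENDING_UPLOAD_TIMESTAMP: 150}[timestamp]

print(fetch_start_offset(True, True, toy_list_offsets))    # 150: new empty replica
print(fetch_start_offset(False, True, toy_list_offsets))   # 100: existing replica
```

The empty-replica check is what keeps the behaviour change scoped to newly bootstrapped brokers, per Abhijeet's answer to question 4.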
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi Christo, Thanks for reviewing the KIP. The follower needs the earliest-pending-upload-offset (and the corresponding leader epoch) from the leader. This is the first offset the follower will have locally. Regards, Abhijeet. On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov wrote: > Heya! > > First of all, thank you very much for the proposal, you have explained the > problem you want solved very well - I think a faster bootstrap of an empty > replica is definitely an improvement! > > For my understanding, which concrete offset do you want the leader to give > back to a follower - earliest-pending-upload-offset or the > latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to > already be exposing that offset as part of the ListOffsets API, no? > > Best, > Christo > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar > wrote: > > > Hi All, > > > > I have created KIP-1023 to introduce follower fetch from tiered offset. > > This feature will be helpful in significantly reducing Kafka > > rebalance/rebuild times when the cluster is enabled with tiered storage. > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset > > > > Feedback and suggestions are welcome. > > > > Regards, > > Abhijeet. > > >
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Heya! First of all, thank you very much for the proposal, you have explained the problem you want solved very well - I think a faster bootstrap of an empty replica is definitely an improvement! For my understanding, which concrete offset do you want the leader to give back to a follower - earliest-pending-upload-offset or the latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to already be exposing that offset as part of the ListOffsets API, no? Best, Christo On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar wrote: > Hi All, > > I have created KIP-1023 to introduce follower fetch from tiered offset. > This feature will be helpful in significantly reducing Kafka > rebalance/rebuild times when the cluster is enabled with tiered storage. > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset > > Feedback and suggestions are welcome. > > Regards, > Abhijeet. >
[DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi All, I have created KIP-1023 to introduce follower fetch from tiered offset. This feature will be helpful in significantly reducing Kafka rebalance/rebuild times when the cluster is enabled with tiered storage. https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset Feedback and suggestions are welcome. Regards, Abhijeet.