[jira] [Resolved] (KAFKA-15661) KIP-951: Server side and protocol changes

2023-11-27 Thread Crispin Bernier (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Crispin Bernier resolved KAFKA-15661.
-
Resolution: Resolved

> KIP-951: Server side and protocol changes
> -
>
> Key: KAFKA-15661
> URL: https://issues.apache.org/jira/browse/KAFKA-15661
> Project: Kafka
>  Issue Type: Task
>  Components: protocol
>        Reporter: Crispin Bernier
>    Assignee: Crispin Bernier
>Priority: Major
> Fix For: 3.7.0
>
>
> Server side and protocol changes for implementing KIP-951, passing back the 
> new leader to the client on NOT_LEADER_OR_FOLLOWER errors for fetch and 
> produce requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15661) KIP-951: Server side and protocol changes

2023-10-20 Thread Crispin Bernier (Jira)
Crispin Bernier created KAFKA-15661:
---

 Summary: KIP-951: Server side and protocol changes
 Key: KAFKA-15661
 URL: https://issues.apache.org/jira/browse/KAFKA-15661
 Project: Kafka
  Issue Type: Task
  Components: protocol
Reporter: Crispin Bernier


Server side and protocol changes for implementing KIP-951, passing back the new 
leader to the client on NOT_LEADER_OR_FOLLOWER errors for fetch and produce 
requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-09-20 Thread Crispin Bernier
Hi,

I've updated the KIP with benchmark results focusing more directly on the
redirect case, please review and +1 in the voting thread if convinced.

Thank you,
Crispin

On Wed, Jul 26, 2023 at 11:13 PM Luke Chen  wrote:

> Thanks for adding the benchmark results, Crispin!
> IMO, 2~5% performance improvement is small, but given the change is small,
> cost is also small (only append endpoint info when NOT_LEADER_OR_FOLLOWER..
> etc error), I think it is worth doing it.
>
> Thank you.
> Luke
>
> On Wed, Jul 26, 2023 at 12:33 AM Ismael Juma  wrote:
>
> > Thanks Crispin!
> >
> > Ismael
> >
> > On Mon, Jul 24, 2023 at 8:16 PM Crispin Bernier
> >  wrote:
> >
> > > I updated the wiki to include both results along with their average.
> > >
> > > Thank you,
> > > Crispin
> > >
> > > On Mon, Jul 24, 2023 at 10:58 AM Ismael Juma 
> wrote:
> > >
> > > > Hi Crispin,
> > > >
> > > > One additional question, the wiki says "The results are averaged
> over 2
> > > > runs.". Can you please provide some measure of variance in the
> > > > distribution, i.e. were both results similar to each other for both
> > > cases?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Jul 21, 2023 at 11:31 AM Ismael Juma 
> > wrote:
> > > >
> > > > > Thanks for the update Crispin - very helpful to have actual
> > performance
> > > > > data. 2-5% for the default configuration is a bit on the low side
> for
> > > > this
> > > > > kind of proposal.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Jul 20, 2023 at 11:33 PM Crispin Bernier
> > > > >  wrote:
> > > > >
> > > > >> Benchmark numbers have been posted on the KIP, please review.
> > > > >>
> > > > >> On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
> > > > >> > Jun
> > > > >> >
> > > > >> > Thanks for the feedback.
> > > > >> >
> > > > >> > Numbers to follow.
> > > > >> >
> > > > >> > If we don't plan to
> > > > >> > > bump up the FetchResponse version, we could just remove the
> > > > reference
> > > > >> to
> > > > >> > > version 16.
> > > > >> >
> > > > >> > Fixed.
> > > > >> >
> > > > >> > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao
> >  > > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hi, Mayank,
> > > > >> > >
> > > > >> > > Thanks for the KIP. I agree with others that it would be
> useful
> > to
> > > > >> see the
> > > > >> > > performance results. Otherwise, just a minor comment. If we
> > don't
> > > > >> plan to
> > > > >> > > bump up the FetchResponse version, we could just remove the
> > > > reference
> > > > >> to
> > > > >> > > version 16.
> > > > >> > >
> > > > >> > > Jun
> > > > >> > >
> > > > >> > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
> > > > >> > > mayanks.nar...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Luke
> > > > >> > > >
> > > > >> > > > Thanks for the interest in the KIP.
> > > > >> > > >
> > > > >> > > > But what if the consumer was fetching from the follower?
> > > > >> > > >
> > > > >> > > > We already include `PreferredReadReplica` in the fetch
> > response.
> > > > >> > > > > Should we put the node info of PreferredReadReplica under
> > this
> > > > >> case,
> > > > >> > > > > instead of the leader's info?
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > > PreferredReadReplica is the decided on the leader. Looking
> at
> > > the
> > > > >> Java
> > > > >> > > > client code, AbstractFetch::selectReadRe

Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-24 Thread Crispin Bernier
I updated the wiki to include both results along with their average.

Thank you,
Crispin

On Mon, Jul 24, 2023 at 10:58 AM Ismael Juma  wrote:

> Hi Crispin,
>
> One additional question, the wiki says "The results are averaged over 2
> runs.". Can you please provide some measure of variance in the
> distribution, i.e. were both results similar to each other for both cases?
>
> Ismael
>
> On Fri, Jul 21, 2023 at 11:31 AM Ismael Juma  wrote:
>
> > Thanks for the update Crispin - very helpful to have actual performance
> > data. 2-5% for the default configuration is a bit on the low side for
> this
> > kind of proposal.
> >
> > Ismael
> >
> > On Thu, Jul 20, 2023 at 11:33 PM Crispin Bernier
> >  wrote:
> >
> >> Benchmark numbers have been posted on the KIP, please review.
> >>
> >> On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
> >> > Jun
> >> >
> >> > Thanks for the feedback.
> >> >
> >> > Numbers to follow.
> >> >
> >> > If we don't plan to
> >> > > bump up the FetchResponse version, we could just remove the
> reference
> >> to
> >> > > version 16.
> >> >
> >> > Fixed.
> >> >
> >> > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao 
> >> wrote:
> >> >
> >> > > Hi, Mayank,
> >> > >
> >> > > Thanks for the KIP. I agree with others that it would be useful to
> >> see the
> >> > > performance results. Otherwise, just a minor comment. If we don't
> >> plan to
> >> > > bump up the FetchResponse version, we could just remove the
> reference
> >> to
> >> > > version 16.
> >> > >
> >> > > Jun
> >> > >
> >> > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
> >> > > mayanks.nar...@gmail.com> wrote:
> >> > >
> >> > > > Luke
> >> > > >
> >> > > > Thanks for the interest in the KIP.
> >> > > >
> >> > > > But what if the consumer was fetching from the follower?
> >> > > >
> >> > > > We already include `PreferredReadReplica` in the fetch response.
> >> > > > > Should we put the node info of PreferredReadReplica under this
> >> case,
> >> > > > > instead of the leader's info?
> >> > > > >
> >> > > >
> >> > > > PreferredReadReplica is the decided on the leader. Looking at the
> >> Java
> >> > > > client code, AbstractFetch::selectReadReplica, first fetch request
> >> goes
> >> > > to
> >> > > > Leader of the partition -> Sends back PreferredReadReplica -> Next
> >> fetch
> >> > > > uses PreferredReadReplica. So as long as leader is available,
> >> > > > PreferredReadReplica would be found in subsequent fetches.
> >> > > >
> >> > > > Also, under this case, should we include the leader's info in the
> >> > > response?
> >> > > >
> >> > > >
> >> > > > In this case, I think the follower would fail the fetch if it
> knows
> >> a
> >> > > > different leader. If the follower knows a newer leader, it would
> >> return
> >> > > new
> >> > > > leader information in the response, for the client to act on.
> >> > > >
> >> > > >
> >> > > > Will we include the leader/node info in the response when having
> >> > > > > `UNKNOWN_LEADER_EPOCH` error?
> >> > > >
> >> > > >
> >> > > > My understanding is UNKNOWN_LEADER_EPOCH when a request from a
> >> client
> >> > > has a
> >> > > > newer epoch than the broker. So the client is already up to date
> on
> >> new
> >> > > > leader information, it's the broker that has the catching up to
> do.
> >> I
> >> > > think
> >> > > > there might be some optimisations to make sure the broker
> refreshes
> >> its
> >> > > > metadata quickly, so it can quickly recover to handle requests
> that
> >> > > > previously returned UNKNOWN_LEADER_EPOCH. But this work is outside
> >> the
> >> > > > scope of this KIP, as for now this KIP

RE: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-20 Thread Crispin Bernier
Benchmark numbers have been posted on the KIP, please review.

On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
> Jun
> 
> Thanks for the feedback.
> 
> Numbers to follow.
> 
> If we don't plan to
> > bump up the FetchResponse version, we could just remove the reference to
> > version 16.
> 
> Fixed.
> 
> On Thu, Jul 20, 2023 at 1:28 AM Jun Rao  wrote:
> 
> > Hi, Mayank,
> >
> > Thanks for the KIP. I agree with others that it would be useful to see the
> > performance results. Otherwise, just a minor comment. If we don't plan to
> > bump up the FetchResponse version, we could just remove the reference to
> > version 16.
> >
> > Jun
> >
> > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
> > mayanks.nar...@gmail.com> wrote:
> >
> > > Luke
> > >
> > > Thanks for the interest in the KIP.
> > >
> > > But what if the consumer was fetching from the follower?
> > >
> > > We already include `PreferredReadReplica` in the fetch response.
> > > > Should we put the node info of PreferredReadReplica under this case,
> > > > instead of the leader's info?
> > > >
> > >
> > > PreferredReadReplica is the decided on the leader. Looking at the Java
> > > client code, AbstractFetch::selectReadReplica, first fetch request goes
> > to
> > > Leader of the partition -> Sends back PreferredReadReplica -> Next fetch
> > > uses PreferredReadReplica. So as long as leader is available,
> > > PreferredReadReplica would be found in subsequent fetches.
> > >
> > > Also, under this case, should we include the leader's info in the
> > response?
> > >
> > >
> > > In this case, I think the follower would fail the fetch if it knows a
> > > different leader. If the follower knows a newer leader, it would return
> > new
> > > leader information in the response, for the client to act on.
> > >
> > >
> > > Will we include the leader/node info in the response when having
> > > > `UNKNOWN_LEADER_EPOCH` error?
> > >
> > >
> > > My understanding is UNKNOWN_LEADER_EPOCH when a request from a client
> > has a
> > > newer epoch than the broker. So the client is already up to date on new
> > > leader information, it's the broker that has the catching up to do. I
> > think
> > > there might be some optimisations to make sure the broker refreshes its
> > > metadata quickly, so it can quickly recover to handle requests that
> > > previously returned UNKNOWN_LEADER_EPOCH. But this work is outside the
> > > scope of this KIP, as for now this KIP focusses on client-side
> > > optimisations.
> > >
> > > Mayank
> > >
> > > On Tue, Jul 18, 2023 at 8:51 AM Luke Chen  wrote:
> > >
> > > > Hi Mayank,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > Some questions:
> > > > 1. I can see most of the cases we only care about consumer fetch from
> > the
> > > > leader.
> > > > But what if the consumer was fetching from the follower?
> > > > We already include `PreferredReadReplica` in the fetch response.
> > > > Should we put the node info of PreferredReadReplica under this case,
> > > > instead of the leader's info?
> > > > Also, under this case, should we include the leader's info in the
> > > response?
> > > >
> > > > 2. Will we include the leader/node info in the response when having
> > > > `UNKNOWN_LEADER_EPOCH` error?
> > > > I think it's fine we ignore the `UNKNOWN_LEADER_EPOCH` error since when
> > > > this happens, the node might have some error which should refresh the
> > > > metadata. On the other hand, it might also be good if we can heal the
> > > node
> > > > soon to do produce/consume works.
> > > >
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Tue, Jul 18, 2023 at 2:00 AM Philip Nee 
> > wrote:
> > > >
> > > > > Hey Mayank:
> > > > >
> > > > > For #1: I think fetch and produce behave a bit differently on
> > metadata.
> > > > > Maybe it is worth highlighting the changes for each client in detail.
> > > In
> > > > > producer did you mean by the metadata timeout before sending out
> > > produce
> > > > > requests? For consumer: I think for fetches it requires user to retry
> > > if
> > > > > the position does not exist on the leader. I don't have the detail on
> > > top
> > > > > of my head, but I think we should lay out these behavioral changes.
> > > > >
> > > > > For #3: Thanks for the clarification.
> > > > >
> > > > > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula <
> > > > > mayanks.nar...@gmail.com> wrote:
> > > > >
> > > > > > Philip
> > > > > >
> > > > > > 1. Good call out about "poll" behaviour, my understanding is the
> > > same.
> > > > I
> > > > > am
> > > > > > assuming it's about the motivation of the KIP. There with async, my
> > > > > > intention was to convey that the client doesn't wait for the
> > > > > > metadata-refresh before a subsequent retry of the produce or fetch
> > > > > request
> > > > > > that failed due to stale metadata(i.e. going to an old leader). The
> > > > only
> > > > > > wait client has is the configured retry-delay.
> > > > > >
> > > > > > 2. Yes, in theory other APIs could 

Requesting permission to contribute to Apache Kafka

2023-07-20 Thread Crispin Bernier
Hi,
I'm following this wiki to request permission to contribute to Apache Kafka
https://cwiki.apache.org/confluence/display/kafka/kafka+improvement+proposals#KafkaImprovementProposals-GettingStarted
My wiki ID and jira ID are both: crispinbernier
Can I get permission please?

Thank you,
Crispin


[jira] [Created] (KAFKA-13688) Incorrect metrics in KafkaController for replicasToDeleteCount and ineligibleReplicasToDeleteCount

2022-02-22 Thread Crispin Bernier (Jira)
Crispin Bernier created KAFKA-13688:
---

 Summary: Incorrect metrics in KafkaController for 
replicasToDeleteCount and ineligibleReplicasToDeleteCount
 Key: KAFKA-13688
 URL: https://issues.apache.org/jira/browse/KAFKA-13688
 Project: Kafka
  Issue Type: Bug
  Components: controller
Reporter: Crispin Bernier


{code:java}
replicasToDeleteCount = if (!isActive) 0 else 
controllerContext.topicsToBeDeleted.map { topic =>  // For each enqueued 
topic, count the number of replicas that are not yet deleted  
controllerContext.replicasForTopic(topic).count { replica =>
controllerContext.replicaState(replica) != ReplicaDeletionSuccessful  }
}.sum

ineligibleReplicasToDeleteCount = if (!isActive) 0 else 
controllerContext.topicsToBeDeleted.map { topic =>  // For each enqueued 
topic, count the number of replicas that are ineligible  
controllerContext.replicasForTopic(topic).count { replica =>
controllerContext.replicaState(replica) == ReplicaDeletionIneligible  }
}.sum {code}
Duplicate replica counts will get ignored in the total sum when the code above 
converts to a set.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)