Re: First time patch submitter advice

2020-06-15 Thread Luke Chen
Hi Michael,
The failing unit tests are already being handled here:
https://issues.apache.org/jira/browse/KAFKA-10155
https://issues.apache.org/jira/browse/KAFKA-10147

So you can probably ignore those test failures and mention the issue numbers in your PR.
Thanks.

Luke

On Mon, Jun 15, 2020 at 3:23 PM Michael Carter <
michael.car...@instaclustr.com> wrote:

> Thanks for the response Gwen, that clarifies things for me.
>
> Regarding the unit test (ReassignPartitionsUnitTest.testModifyBrokerThrottles),
> it appears to fail quite reliably on trunk as well (at least on my machine).
> It looks to me like a new override to
> MockAdminClient.describeConfigs(Collection resources)
> (MockAdminClient.java line 369) introduced in commit
> 48b56e533b3ff22ae0e2cf7fcc649e7df19f2b06 changed the behaviour of this
> method that the unit test relied on.
> I’ve just now put a patch into my branch to make that test pass by calling
> a slightly different version of describeConfigs (that avoids the overridden
> behaviour). It’s probably arguable whether that constitutes a fix or not
> though.
>
> Cheers,
> Michael
>
> > On 15 Jun 2020, at 3:41 pm, Gwen Shapira  wrote:
> >
> > Hi,
> >
> > 1. Unfortunately, you need to get a committer to approve running the
> > tests. I just gave the green light on your PR.
> > 2. You can hope that committers will see your PR, but sometimes things get
> > lost. If you know someone who is familiar with that area of the code, it is
> > a good idea to ping them.
> > 3. We do have some flaky tests. You can see that Jenkins will run 3
> > parallel builds; if some of them pass and the committer confirms that the
> > failures are not related to your code, we are OK to merge. Obviously, if
> > you end up tracking them down and fixing them, everyone will be very
> > grateful.
> >
> > Hope this helps,
> >
> > Gwen
> >
> > On Sun, Jun 14, 2020 at 5:52 PM Michael Carter <
> > michael.car...@instaclustr.com> wrote:
> >
> >> Hi all,
> >>
> >> I’ve submitted a patch for the first time
> >> (https://github.com/apache/kafka/pull/8844), and I have a couple of
> >> questions that I’m hoping someone can help me answer.
> >>
> >> I’m a little unclear about what happens after the patch has been
> >> submitted. The coding guidelines say Jenkins will run tests automatically,
> >> but I don’t see any results anywhere. Have I misunderstood what should
> >> happen, or do I just not know where to look?
> >> Should I be attempting to find reviewers for the change myself, or is that
> >> done independently of the patch submitter?
> >>
> >> Also, in resolving a couple of conflicts that have arisen after the patch
> >> was first submitted, I noticed that there are now failing unit tests that
> >> have nothing to do with my change. Is there a convention on how to deal
> >> with these? Should it be something that I try to fix on my branch?
> >>
> >> Any thoughts are appreciated.
> >>
> >> Thanks,
> >> Michael
> >
> >
> >
> > --
> > Gwen Shapira
> > Engineering Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
>
>
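The failure Michael describes, where a newly added override of one describeConfigs overload changes what a test sees while a sibling overload still reaches the old code path, is a general Java pattern. The minimal sketch below assumes a single-argument convenience overload implemented as an interface default method that delegates to a two-argument method; ConfigClient, BaseMockClient and OverriddenMockClient are hypothetical stand-ins, not Kafka's Admin or MockAdminClient.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an admin-style client; NOT Kafka's Admin interface.
interface ConfigClient {
    // Convenience overload: by default it simply delegates to the two-argument version.
    default Map<String, String> describe(Collection<String> resources) {
        return describe(resources, false);
    }

    Map<String, String> describe(Collection<String> resources, boolean includeSynonyms);
}

// The original mock: all behaviour lives in the two-argument method.
class BaseMockClient implements ConfigClient {
    @Override
    public Map<String, String> describe(Collection<String> resources, boolean includeSynonyms) {
        return tag(resources, "original-behaviour");
    }

    static Map<String, String> tag(Collection<String> resources, String value) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String resource : resources) {
            result.put(resource, value);
        }
        return result;
    }
}

// A later change overrides only the single-argument overload.
class OverriddenMockClient extends BaseMockClient {
    @Override
    public Map<String, String> describe(Collection<String> resources) {
        return tag(resources, "new-behaviour");
    }
}
```

With this setup, describe(List.of("broker-0")) on an OverriddenMockClient returns "new-behaviour", while describe(List.of("broker-0"), false) still returns "original-behaviour". That is why switching overloads can make such a test pass without changing the overridden behaviour itself.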


Re: New Website Layout

2020-08-05 Thread Luke Chen
When I open the Streams docs, the page always shows:
*You're viewing documentation for an older version of Kafka - check out our
current documentation here.*



On Wed, Aug 5, 2020 at 6:44 PM Ben Stopford  wrote:

> Thanks for the PR and feedback, Mickael. Appreciated.
>
> On Wed, 5 Aug 2020 at 10:49, Mickael Maison 
> wrote:
>
> > Thank you, it looks great!
> >
> > I found a couple of small issues:
> > - It's not rendering correctly with http.
> > - It's printing "called" to the console. I opened a PR to remove the
> > console.log() call: https://github.com/apache/kafka-site/pull/278
> >
> > On Wed, Aug 5, 2020 at 9:45 AM Ben Stopford  wrote:
> > >
> > > The new website layout has gone live as you may have seen. There are a
> > > couple of rendering issues in the streams developer guide that we're
> > > getting addressed. If anyone spots anything else, please reply to this
> > > thread.
> > >
> > > Thanks
> > >
> > > Ben
> > >
> > > On Fri, 26 Jun 2020 at 11:48, Ben Stopford  wrote:
> > >
> > > > Hey folks
> > > >
> > > > We've made some updates to the website's look and feel. There is a
> > > > staged version at the link below.
> > > >
> > > > https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
> > > > username: kafka
> > > > password: streaming
> > > >
> > > > Comments welcomed.
> > > >
> > > > Ben
> > > >
> > > >
> >
>
>
> --
>
> Ben Stopford
>
> Lead Technologist, Office of the CTO
>
> 
>


Re: New Website Layout

2020-08-11 Thread Luke Chen
Hi Tom, Ben,
PR is ready to address the issue you saw:
https://github.com/apache/kafka-site/pull/292

Thanks.

On Wed, Aug 12, 2020 at 1:09 AM Tom Bentley  wrote:

> Hi Ben,
>
> Thanks for fixing that. Another problem I've just noticed is a couple of
> garbled headings. E.g. scroll down from
> https://kafka.apache.org/documentation.html#design_compactionbasics and the
> "What guarantees does log compaction provide?" section is rendering as
>
> $1 class="anchor-heading">$8$9$10
> <https://kafka.apache.org/documentation.html#$4>
>
> with the . Similar thing in
> https://kafka.apache.org/documentation.html#design_quotas. The source HTML
> looks OK to me.
>
> Kind regards,
>
> Tom
>
> On Mon, Aug 10, 2020 at 2:15 PM Ben Stopford  wrote:
>
> > Good spot. Thanks.
> >
> > On Thu, 6 Aug 2020 at 18:59, Ben Weintraub  wrote:
> >
> > > Plus one to Tom's request - the ability to easily generate links to
> > > specific config options is extremely valuable.
> > >
> > > On Thu, Aug 6, 2020 at 10:09 AM Tom Bentley 
> wrote:
> > >
> > > > Hi Ben,
> > > >
> > > > The documentation for the configs (broker, producer etc) used to function
> > > > as links as well as anchors, which made the URL fragments more
> > > > discoverable, because you could click on the link and then copy+paste the
> > > > browser URL:
> > > >
> > > > 
> > > >> > > href="#batch.size">batch.size
> > > > 
> > > >
> > > > What seems to have happened with the new layout is that the <a> tags are
> > > > empty and no longer enclose the config name,
> > > >
> > > > 
> > > >   
> > > >   batch.size
> > > > 
> > > >
> > > > meaning you can't click on the link to copy and paste the URL. Could the
> > > > old behaviour be restored?
> > > >
> > > > Thanks,
> > > >
> > > > Tom
> > > >
> > >
> >
> >
>
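Tom's request is for each config name to act as both an anchor target and a self-link, so that clicking it puts the fragment in the browser URL. A minimal Java sketch of generating such markup follows; ConfigAnchors and linkedHeading are hypothetical names (this is not the generator Kafka's documentation actually uses), and the sketch assumes config names contain only characters that are safe in both an id and a URL fragment, such as batch.size.

```java
// Hypothetical helper for illustration; not the generator used for Kafka's documentation.
final class ConfigAnchors {
    private ConfigAnchors() {}

    // Minimal escaping for text and attribute values; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // The config name is the anchor target (id) and also a self-link (href),
    // so clicking it updates the browser URL with the fragment.
    static String linkedHeading(String configName) {
        String name = escape(configName);
        return "<h4><a id=\"" + name + "\" href=\"#" + name + "\">" + name + "</a></h4>";
    }
}
```

For batch.size this yields a single <a> element carrying both the id and the href, with the config name as its clickable text.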


Re: [DISCUSS] KIP-930: Tiered Storage Metrics

2023-07-25 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP!
I don't have a strong preference on the renaming.
But if the current names could confuse people, it's good to make them clear.

Thank you.
Luke

On Tue, Jul 25, 2023 at 2:53 PM Abhijeet Kumar 
wrote:

> Hi Kamal,
>
> As we discussed offline, I will rename this KIP so that it only captures
> the aspect of renaming the previously added metrics to remove ambiguity.
> I will create another KIP for RemoteIndexCache metrics and other relevant
> tiered storage metrics.
>
> On Tue, Jul 25, 2023 at 12:03 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP!
> >
> > We are changing the metric names from what was proposed in KIP-405 and
> > adding new metrics for RemoteIndexCache.
> > In the KIP, it's not clear whether we are renaming the aggregate broker
> > level metrics for remote copy/fetch/failed-copy/failed-fetch.
> >
> > Are these metrics enough to monitor all the aspects of tiered storage?
> >
> > (eg)
> > 1. Metrics to see the Tier Lag Status by number of pending
> > segments/records.
> > 2. Similar to the log-start-offset and log-end-offset metrics, should we
> > expose local-log-start-offset and highest-offset-uploaded-to-remote-storage
> > as metrics?
> >
> > Thanks,
> > Kamal
> >
> > On Mon, Jul 24, 2023 at 2:08 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Hi All,
> > >
> > > I created KIP-930 for adding RemoteIndexCache stats and also to rename some
> > > tiered storage metrics added as part of KIP-405
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics>
> > > to remove ambiguity.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> > >
> > > Feedback and suggestions are welcome.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> >
>
>
> --
> Abhijeet.
>
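Kamal's first two suggestions come down to simple offset arithmetic. The sketch below shows one way such tier-lag values could be derived, assuming that highestUploadedOffset is inclusive (with -1 meaning nothing has been uploaded yet), that only rolled segments are ever copied to remote storage, and that offset gaps approximate record counts (compaction and control records make this inexact). TierLag and its methods are hypothetical; they are not the metrics defined in KIP-930 or KIP-405.

```java
import java.util.List;

// Hypothetical helper; names and semantics are assumptions, not KIP-930/KIP-405 metrics.
final class TierLag {
    private TierLag() {}

    /** Approximate number of offsets not yet copied to remote storage. */
    static long pendingOffsets(long logEndOffset, long highestUploadedOffset) {
        return Math.max(0L, logEndOffset - (highestUploadedOffset + 1));
    }

    /**
     * Number of rolled (non-active) segments not yet fully uploaded.
     * rolledSegmentBaseOffsets must be sorted ascending; the active segment,
     * starting at activeSegmentBaseOffset, is never uploaded.
     */
    static int pendingSegments(List<Long> rolledSegmentBaseOffsets,
                               long activeSegmentBaseOffset,
                               long highestUploadedOffset) {
        int pending = 0;
        for (int i = 0; i < rolledSegmentBaseOffsets.size(); i++) {
            // A segment ends just before the next segment (or the active segment) begins.
            long nextBase = (i + 1 < rolledSegmentBaseOffsets.size())
                    ? rolledSegmentBaseOffsets.get(i + 1)
                    : activeSegmentBaseOffset;
            long lastOffsetInSegment = nextBase - 1;
            if (lastOffsetInSegment > highestUploadedOffset) {
                pending++;
            }
        }
        return pending;
    }
}
```

For example, with rolled segments starting at offsets 0, 100 and 200, an active segment at 300, a log end offset of 350 and offsets up to 99 uploaded, this reports 250 pending offsets and 2 pending segments.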


Re: KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum

2023-07-25 Thread Luke Chen
Hi Colin,

Some more comments:
1. In the KIP, we mention "controller heartbeats", but they are not
explained anywhere.
I think "controller heartbeats" means "controller registration"; is that
correct?
If not, please explain more about it in the KIP.

2. Following this question:
> Which endpoint will the inactive controllers use to send the
> ControllerRegistrationRequest?
> A: They will use the endpoint in controller.quorum.voters.
If the registration request will include controller.quorum.voters, why
bother sending this information to the active controller again?
The active controller should already have all the controller.quorum.voters
when it starts up.
Is there a purpose to that design? For validation?

3. If a controller node is not part of `controller.quorum.voters`, when it
sends a ControllerRegistrationRequest, how will the active controller respond?

4. Nice and clear compatibility matrix!

Thank you.
Luke
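For the third question, the membership check itself is straightforward once controller.quorum.voters is parsed. That setting is a comma-separated list of {id}@{host}:{port} entries (for example 1@localhost:9093,2@localhost:9094). The hypothetical QuorumVoters helper below is a minimal sketch under that format, not Kafka's own parser; it performs only basic validation, and it does not try to answer what the active controller should send back to a non-voter, which is the open question here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "{id}@{host}:{port}" format of controller.quorum.voters;
// not Kafka's own implementation.
final class QuorumVoters {
    private QuorumVoters() {}

    static Map<Integer, String> parse(String voters) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String entry : voters.split(",")) {
            String trimmed = entry.trim();
            int at = trimmed.indexOf('@');
            int colon = trimmed.lastIndexOf(':');
            if (at <= 0 || colon <= at + 1 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Invalid voter entry: " + trimmed);
            }
            int id = Integer.parseInt(trimmed.substring(0, at));
            Integer.parseInt(trimmed.substring(colon + 1)); // reject non-numeric ports
            if (result.put(id, trimmed.substring(at + 1)) != null) {
                throw new IllegalArgumentException("Duplicate voter id: " + id);
            }
        }
        return Collections.unmodifiableMap(result);
    }

    // Is the node sending a registration request one of the configured voters?
    static boolean isVoter(Map<Integer, String> voters, int nodeId) {
        return voters.containsKey(nodeId);
    }
}
```

Integer.parseInt throws NumberFormatException, a subclass of IllegalArgumentException, so every malformed entry surfaces as the same exception type.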

On Sat, Jul 22, 2023 at 3:38 AM Colin McCabe  wrote:

> On Fri, Jul 21, 2023, at 09:43, José Armando García Sancio wrote:
> > Thanks for the KIP Colin. Apologies if some of these points have
> > already been made. I have not followed the discussion closely:
> >
> > 1. Re: Periodically, each controller will check that the controller
> > registration for its ID is as expected
> >
> > Does this need to be periodic? Can't the controller schedule this RPC,
> > retry etc, when it finds that the incarnation ID doesn't match its
> > own?
> >
>
> Hi José,
>
> Thanks for the reviews.
>
> David had the same question. I agree that it should be event-driven rather
> than periodic (except for retries, etc.)
>
> >
> > 2. Did you consider including the active controller's epoch in the
> > ControllerRegistrationRequest?
> >
> > This would allow the active controller to reject registration from
> > controllers that are not part of the active quorum and don't know the
> > latest controller epoch. This should mitigate some of the concerns you
> > raised in bullet point 1.
> >
>
> Good idea. I will add the active controller epoch to the registration
> request.
>
> >
> > 3. Which endpoint will the inactive controllers use to send the
> > ControllerRegistrationRequest?
> >
> > Will it use the first endpoint described in the cluster metadata
> > controller registration record? Or would it use the endpoint described
> > in the server configuration at controller.quorum.voters?
> >
>
> They will use the endpoint in controller.quorum.voters. In general, the
> endpoints from the registration are only used for responding to
> DESCRIBE_CLUSTER. Since, after all, we may not even have the registration
> endpoints when we start up.
>
> >
> > 4. Re: Raft integration in the rejected alternatives
> >
> > Yes, the KRaft layer needs to solve a similar problem, endpoint
> > discovery, to support dynamic controller membership change. As you
> > point out, the requirements are different and the set of information
> > that needs to be tracked is different. I think it is okay to use a
> > different solution for each of these problems.
>
> Yeah that was my feeling too. Thanks for taking a look.
>
> regards,
> Colin
>
> >
> > Thanks!
> > --
> > -José
>
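José's second suggestion, which Colin agreed to, is that the registration request carry the active controller epoch known to the sender. A minimal sketch of how a receiver might use that field follows (Java 16+ for records). RegistrationEpochCheck, its Request record and its Result values are hypothetical, and rejecting mismatches in both directions is an assumption about the design rather than what KIP-919 specifies.

```java
// Hypothetical sketch; field and result names are not KIP-919's request schema.
final class RegistrationEpochCheck {
    enum Result { ACCEPTED, SENDER_EPOCH_TOO_OLD, SENDER_EPOCH_TOO_NEW }

    record Request(int controllerId, int activeControllerEpoch) {}

    private RegistrationEpochCheck() {}

    /**
     * currentEpoch is the epoch of the controller handling the request.
     * A lower epoch in the request means the sender has not caught up with the latest
     * epoch; a higher one suggests the receiver is no longer the active controller.
     * Both cases are rejected in this sketch.
     */
    static Result validate(Request request, int currentEpoch) {
        if (request.activeControllerEpoch() < currentEpoch) {
            return Result.SENDER_EPOCH_TOO_OLD;
        }
        if (request.activeControllerEpoch() > currentEpoch) {
            return Result.SENDER_EPOCH_TOO_NEW;
        }
        return Result.ACCEPTED;
    }
}
```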


Re: [VOTE] KIP-930: Tiered Storage Metrics

2023-07-25 Thread Luke Chen
+1 (binding) from me.

Thanks.
Luke

On Tue, Jul 25, 2023 at 7:51 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> +1 (non-binding)
>
> Thanks, Abhijeet!
>
>
> On Tue, 25 Jul 2023 at 14:22, Abhijeet Kumar 
> wrote:
>
> > Please find the updated link to the KIP:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Rename+ambiguous+Tiered+Storage+Metrics
> >
> > Updated the KIP as per our conversation on the discussion thread.
> >
> > On Tue, Jul 25, 2023 at 11:29 AM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to start the vote for KIP-930 Tiered Storage Metrics.
> > >
> > > The KIP is here:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> > >
> > > Regards
> > > Abhijeet.
> > >
> > >
> >
> > --
> > Abhijeet.
> >
>


Re: [VOTE] KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration

2023-07-26 Thread Luke Chen
+1 (binding) from me.

Thanks for the KIP!

Luke

On Tue, Jul 25, 2023 at 1:24 AM Colin McCabe  wrote:

> Hi all,
>
> I'd like to start the vote for KIP-919: Allow AdminClient to Talk Directly
> with the KRaft Controller Quorum and add Controller Registration.
>
> The KIP is here: https://cwiki.apache.org/confluence/x/Owo0Dw
>
> Thanks to everyone who reviewed the proposal.
>
> best,
> Colin
>


Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

2023-07-26 Thread Luke Chen
Thanks for adding the benchmark results, Crispin!
IMO, 2~5% performance improvement is small, but given the change is small,
cost is also small (only append endpoint info when NOT_LEADER_OR_FOLLOWER..
etc error), I think it is worth doing it.

Thank you.
Luke
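The optimisation weighed here is that a NOT_LEADER_OR_FOLLOWER-style error can carry the new leader's endpoint, letting the client update its cached leader without waiting for a full metadata refresh. The sketch below illustrates that client-side idea under stated assumptions: LeaderCache, LeaderInfo and onNotLeaderError are hypothetical names, not the Kafka client's Metadata class or the KIP-951 wire format, and a hint is only applied when its leader epoch is newer than the cached one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical client-side cache; not the Kafka client's Metadata class or KIP-951's wire format.
final class LeaderCache {
    record LeaderInfo(int leaderId, int leaderEpoch, String host, int port) {}

    private final Map<String, LeaderInfo> leaders = new HashMap<>();
    private boolean fullRefreshRequested = false;

    void update(String topicPartition, LeaderInfo info) {
        leaders.put(topicPartition, info);
    }

    Optional<LeaderInfo> leaderFor(String topicPartition) {
        return Optional.ofNullable(leaders.get(topicPartition));
    }

    boolean fullRefreshRequested() {
        return fullRefreshRequested;
    }

    /**
     * Called when a produce or fetch fails with a NOT_LEADER_OR_FOLLOWER-style error.
     * If the response carried a leader hint with a newer epoch, apply it directly;
     * otherwise fall back to requesting a full metadata refresh.
     */
    void onNotLeaderError(String topicPartition, Optional<LeaderInfo> hint) {
        LeaderInfo current = leaders.get(topicPartition);
        boolean newer = hint.isPresent()
                && (current == null || hint.get().leaderEpoch() > current.leaderEpoch());
        if (newer) {
            leaders.put(topicPartition, hint.get());
        } else {
            fullRefreshRequested = true;
        }
    }
}
```

Ignoring hints whose epoch is not newer keeps a delayed or stale response from overwriting fresher leader information.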

On Wed, Jul 26, 2023 at 12:33 AM Ismael Juma  wrote:

> Thanks Crispin!
>
> Ismael
>
> On Mon, Jul 24, 2023 at 8:16 PM Crispin Bernier
>  wrote:
>
> > I updated the wiki to include both results along with their average.
> >
> > Thank you,
> > Crispin
> >
> > On Mon, Jul 24, 2023 at 10:58 AM Ismael Juma  wrote:
> >
> > > Hi Crispin,
> > >
> > > One additional question, the wiki says "The results are averaged over 2
> > > runs.". Can you please provide some measure of variance in the
> > > distribution, i.e. were both results similar to each other for both cases?
> > >
> > > Ismael
> > >
> > > On Fri, Jul 21, 2023 at 11:31 AM Ismael Juma 
> wrote:
> > >
> > > > Thanks for the update Crispin - very helpful to have actual performance
> > > > data. 2-5% for the default configuration is a bit on the low side for this
> > > > kind of proposal.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Jul 20, 2023 at 11:33 PM Crispin Bernier
> > > >  wrote:
> > > >
> > > >> Benchmark numbers have been posted on the KIP, please review.
> > > >>
> > > >> On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
> > > >> > Jun
> > > >> >
> > > >> > Thanks for the feedback.
> > > >> >
> > > >> > Numbers to follow.
> > > >> >
> > > >> > > If we don't plan to bump up the FetchResponse version, we could just
> > > >> > > remove the reference to version 16.
> > > >> >
> > > >> > Fixed.
> > > >> >
> > > >> > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao wrote:
> > > >> >
> > > >> > > Hi, Mayank,
> > > >> > >
> > > >> > > Thanks for the KIP. I agree with others that it would be useful to
> > > >> > > see the performance results. Otherwise, just a minor comment. If we
> > > >> > > don't plan to bump up the FetchResponse version, we could just remove
> > > >> > > the reference to version 16.
> > > >> > >
> > > >> > > Jun
> > > >> > >
> > > >> > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
> > > >> > > mayanks.nar...@gmail.com> wrote:
> > > >> > >
> > > >> > > > Luke
> > > >> > > >
> > > >> > > > Thanks for the interest in the KIP.
> > > >> > > >
> > > >> > > > > But what if the consumer was fetching from the follower?
> > > >> > > > >
> > > >> > > > > We already include `PreferredReadReplica` in the fetch response.
> > > >> > > > > Should we put the node info of PreferredReadReplica under this
> > > >> > > > > case, instead of the leader's info?
> > > >> > > >
> > > >> > > > PreferredReadReplica is decided on the leader. Looking at the Java
> > > >> > > > client code, AbstractFetch::selectReadReplica, the first fetch
> > > >> > > > request goes to the leader of the partition -> the leader sends back
> > > >> > > > PreferredReadReplica -> the next fetch uses PreferredReadReplica. So
> > > >> > > > as long as the leader is available, PreferredReadReplica would be
> > > >> > > > found in subsequent fetches.
> > > >> > > >
> > > >> > > > Also, under this case, should we include the leader's info in the
> > > >> > > > response?
> > > >> > > >
> > > >> > > >
> > > >
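The fetch flow described in this thread (first fetch to the leader, the leader returns PreferredReadReplica, later fetches go to that replica) can be sketched as a small state machine. ReadReplicaSelector is hypothetical and is not AbstractFetch::selectReadReplica; in particular, the fixed time-to-live for the preferred replica and the reset on a fetch error are assumptions made to keep the example self-contained.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the flow described above; not AbstractFetch::selectReadReplica.
final class ReadReplicaSelector {
    private final long preferredReplicaTtlMs;
    private OptionalInt preferredReplica = OptionalInt.empty();
    private long preferredReplicaExpiryMs = 0L;

    ReadReplicaSelector(long preferredReplicaTtlMs) {
        this.preferredReplicaTtlMs = preferredReplicaTtlMs;
    }

    /** Node for the next fetch: the preferred replica if known and fresh, else the leader. */
    int selectReadReplica(int leaderId, long nowMs) {
        if (preferredReplica.isPresent() && nowMs < preferredReplicaExpiryMs) {
            return preferredReplica.getAsInt();
        }
        preferredReplica = OptionalInt.empty(); // expired or never set: go back to the leader
        return leaderId;
    }

    /** Records the PreferredReadReplica returned by the leader in a fetch response. */
    void onFetchResponse(OptionalInt preferredReadReplica, long nowMs) {
        if (preferredReadReplica.isPresent()) {
            preferredReplica = preferredReadReplica;
            preferredReplicaExpiryMs = nowMs + preferredReplicaTtlMs;
        }
    }

    /** On a fetch error from the follower, forget it so the next fetch goes to the leader. */
    void onFetchError() {
        preferredReplica = OptionalInt.empty();
    }
}
```

Falling back to the leader whenever the preferred replica is unknown matches the point made above: as long as the leader is reachable, the client can always rediscover a preferred replica on its next fetch.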

Re: Request permission to contribute

2023-07-30 Thread Luke Chen
Hi Qichao,

Your accounts are all set up.

Thanks.
Luke

On Mon, Jul 31, 2023 at 1:19 PM Qichao Chu  wrote:

> Hi Bruno,
>
> Is it ok for you to set me up for contribution too? I would like to create
> a KIP.
> My account ID in Confluence and JIRA is ex172000 and my email is
> ex172...@gmail.com.
>
> Thank you in advance!
>
> Best,
> Qichao Chu
> Software Engineer | Data - Kafka
>
>
> On Wed, Jul 26, 2023 at 12:41 AM Bruno Cadonna  wrote:
>
> > Hi Taras,
> >
> > I set you up for the Apache Kafka wiki. Let me know if you are still
> > missing any permissions.
> >
> > Thank you for your interest in Apache Kafka!
> >
> > Best,
> > Bruno
> >
> > On 7/25/23 9:49 AM, Taras Ledkov wrote:
> > > Hi Guozhang,
> > >
> > > Thanks for your attention.
> > >
> > > I'm a contributor to another Apache project (Ignite).
> > > Currently I don't have permission to create a page in the `Apache Kafka`
> > > space on the wiki (Confluence) [1].
> > >
> > > [1] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
>


Re: Request permission to contribute

2023-08-03 Thread Luke Chen
Hi Adrian,

Your account is all set.

Thanks.
Luke

On Thu, Aug 3, 2023 at 4:25 PM Adrian Preston  wrote:

> Hello,
> Please could my JIRA account (prestona) be given the permissions required
> to contribute to Kafka?
> Thanks,
> Adrian
>
>


Re: [DISCUSS] Apache Kafka 3.4.1 release

2023-08-09 Thread Luke Chen
Hi José,

Thanks for the reminder.
Yes, I did miss that.
Already updated and pushed.

Thanks.
Luke

On Wed, Aug 9, 2023 at 8:08 AM José Armando García Sancio
 wrote:

> Hey Luke,
>
> Thanks for working on the release for 3.4.1. I was working on some
> cherry picks and I noticed that branch 3.4 doesn't contain the
> commit/tag for 3.4.1. I think we are supposed to merge the tag back to
> the 3.4 branch. E.g.:
>
> > Merge the last version change / rc tag into the release branch and bump
> > the version to 0.10.0.1-SNAPSHOT
> >
> > git checkout 0.10.0
> > git merge 0.10.0.0-rc6
>
> from: https://cwiki.apache.org/confluence/display/KAFKA/Release+Process
>
> Did we forget to do that part?
>
> Thanks!
> --
> -José
>


Re: Requesting permissions to contribute to Apache Kafka

2023-08-22 Thread Luke Chen
Hi Animesh,

Your accounts are all set.

Thanks.
Luke

On Tue, Aug 22, 2023 at 9:25 PM Animesh Kumar  wrote:

> Hi Team,
> Please provide access to contribute to Apache Kafka
> JIRA id -- akanimesh7
> Wiki Id -- akanimesh7
> --
> Animesh Kumar
> 8120004556
>


Re: [REVIEW REQUEST] Move ReassignPartitionsCommandArgsTest to java

2023-08-23 Thread Luke Chen
Hi,

Sorry, we're mostly working on features for v3.6.0, which is expected
to be released in the coming weeks.
I'll review your PR after the release. (Please ping me then if I forget!)

Also, it would be good if developers in the community could help with PR
reviews when they have time.
That would help a lot.
Besides, PR review is also a kind of contribution, not just code
commits.

Thanks.
Luke



On Tue, Aug 22, 2023 at 7:15 PM Николай Ижиков  wrote:

> Hello.
>
> Please join the review; it's a simple one.
> We have a few steps left to completely rewrite ReassignPartitionsCommand in
> Java.
>
> On 17 Aug 2023, at 17:16, Николай Ижиков wrote:
> >
> > Hello.
> >
> > I’m working on [1].
> > The goal of the ticket is to rewrite `ReassignPartitionsCommand` in Java.
> >
> > The PR that moves the whole command is pretty big, so it makes sense to
> > split it.
> > I prepared a PR [2] that moves a single test
> > (ReassignPartitionsCommandArgsTest) to Java.
> >
> > It is relatively small and simple (it touches only 3 files):
> >
> > To review - https://github.com/apache/kafka/pull/14217
> > Big PR  - https://github.com/apache/kafka/pull/13247
> >
> > Please, review.
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-14595
> > [2] https://github.com/apache/kafka/pull/14217
>
>


Re: FYI - CI failures due to Apache Infra (Issue with creating launcher for agent)

2023-08-28 Thread Luke Chen
Thanks for the info, Divij!

Luke

On Mon, Aug 28, 2023 at 6:01 PM Divij Vaidya 
wrote:

> Hey folks
>
> During your CI runs, you may notice that some test pipelines fail to
> start with messages such as:
>
> "ERROR: Issue with creating launcher for agent builds38. The agent is
> being disconnected"
> "Remote call on builds38 failed"
>
> This occurs due to bad hosts in the Apache infrastructure CI. We have
> an ongoing ticket here -
>
> https://issues.apache.org/jira/browse/INFRA-24927?focusedCommentId=17759528&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17759528
>
> I will keep an eye on the ticket and reply to this thread when it is
> fixed. Meanwhile, the workaround is to restart the tests.
>
> Cheers!
>
> --
> Divij Vaidya
>


Kafka jenkins is unavailable now

2023-08-29 Thread Luke Chen
Hi all,

I just found that Kafka Jenkins is currently unavailable.
Filed an INFRA issue: https://issues.apache.org/jira/browse/INFRA-24940
FYI.

Thanks.
Luke


Re: Apache Kafka 3.6.0 release

2023-09-13 Thread Luke Chen
Hi Satish,

Since this PR (https://github.com/apache/kafka/pull/14366) only changes the
docs, I've backported it to the 3.6 branch. FYI.

Thanks.
Luke

On Thu, Sep 14, 2023 at 12:15 AM Justine Olshan
 wrote:

> Hey Satish -- yes, you are correct. KAFKA-15459 only affects 3.6.
> PR should be finalized soon.
>
> Thanks,
> Justine
>
> On Wed, Sep 13, 2023 at 1:41 AM Federico Valeri 
> wrote:
>
> > Hi Satish, this is a small documentation fix about ZK to KRaft
> > migration, that we would like to backport to 3.5 and 3.6 branches. Are
> > you ok with that?
> >
> > https://github.com/apache/kafka/pull/14366
> >
> > On Wed, Sep 13, 2023 at 3:13 AM Satish Duggana  >
> > wrote:
> > >
> > > Thanks David for the quick resolution.
> > >
> > > ~Satish.
> > >
> > > On Tue, 12 Sept 2023 at 22:51, David Arthur
> > >  wrote:
> > > >
> > > > Satish,
> > > >
> > > > KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4)
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > > On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma 
> > wrote:
> > > >
> > > > > Justine,
> > > > >
> > > > > Probably best to have the conversation in the JIRA ticket vs the release
> > > > > thread. Generally, we want to only include low risk bug fixes that are
> > > > > fully compatible in patch releases.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Thanks Satish. I understand.
> > > > > > Just curious, is this something that could be added to 3.6.1? It would be
> > > > > > nice to say that hanging transactions are fully covered in a 3.6 release.
> > > > > > I'm not as familiar with the rules around minor releases, but adding it
> > > > > > there would give more time to ensure stability.
> > > > > >
> > > > > > Thanks,
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana <
> > satish.dugg...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > > We can leave this change out of 3.6 now, as it is not a blocker or
> > > > > > > regression and it involves changes to the API implementation. Let us
> > > > > > > plan to add the gap in the release notes as you mentioned.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Satish.
> > > > > > >
> > > > > > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hey Satish,
> > > > > > > >
> > > > > > > > We just discovered a gap in KIP-890 part 1. We currently don't verify
> > > > > > > > on txn offset commits, so it is still possible to have hanging
> > > > > > > > transactions on the consumer offsets partitions.
> > > > > > > > I've opened a jira to wire the verification in that request.
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15449
> > > > > > > >
> > > > > > > > This also isn't a regression, but it would be nice to have part 1
> > > > > > > > fully complete. I have opened a PR with the fix:
> > > > > > > > https://github.com/apache/kafka/pull/14370.
> > > > > > > >
> > > > > > > > I understand if there are concerns about last minute changes to this
> > > > > > > > API and we can hold off if that makes the most sense.
> > > > > > > > If we take that route, I think we should still keep verification for
> > > > > > > > the data partitions since it still provides full protection there and
> > > > > > > > improves the transactions experience. We will need to call out the gap
> > > > > > > > in the release notes for consumer offsets partitions.
> > > > > > > >
> > > > > > > > Let me know what you think.
> > > > > > > > Justine
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Sep 11, 2023 at 12:29 PM David Arthur
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > Another (small) ZK migration issue was identified. This one isn't a
> > > > > > > > > regression (it has existed since 3.4), but I think it's reasonable to
> > > > > > > > > include. It's a small configuration check that could potentially save
> > > > > > > > > end users from some headaches down the line.
> > > > > > > > >
> > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15450
> > > > > > > > > https://github.com/apache/kafka/pull/14367
> > > > > > > > >
> > > > > > > > > I think we can get this one committed to trunk today.
> > > > > > > > >
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma <
> > m...@ismaeljuma.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Satish,
> > > > > > > > > >
> > > > > > > > > > That sounds great. I think we should aim to only allow blockers
> > > > > > > > > > (regressions, impactful security issues, etc.) on the 3.6 branch
> > > > > > > > > > until 3.6.0 is out.
> > > > 

Re: Help to the community with review

2023-09-18 Thread Luke Chen
Hi nizhikov,

Thank you for your consideration and help!
Yes, we really need community developers to help review PRs and share
the load.
One thing to note: PR review also counts as a contribution!
Please do so if you have spare cycles. :)

BTW, I'll review your PRs this week.

Thank you.
Luke

On Tue, Sep 19, 2023 at 1:31 AM Николай Ижиков  wrote:

> Hello.
>
> In my experience, it’s hard to get Kafka patches reviewed in a timely
> manner.
> I think the reason is that committers have different priorities and a lack
> of free time.
>
> So I'm offering to help review patches if anyone in the community needs it.
>
> Please tag me on GitHub (@nizhikov) if you have a valuable, relatively
> small patch that is stuck in the review phase.
> I will review them one by one, on a regular basis.


Re: Help to the community with review

2023-09-19 Thread Luke Chen
Hi Taras,

Thank you very much for your offer.
Really appreciate it!

For KIP reviews, it would also be great if the community could help when
available.
I'll also try my best to have a look when I have time.

Thank you for your contribution!
Luke

On Tue, Sep 19, 2023 at 3:08 PM Taras Ledkov  wrote:

> Hi,
>
> I'm not a committer to Kafka, but I'd like to offer help with reviewing
> patches too.
> Please tag me on GitHub (@tledkov).
>
> Unfortunately, no one offers help with KIP reviews...
>


Re: [ANNOUNCE] New committer: Lucas Brutschy

2023-09-21 Thread Luke Chen
Congratulations, Lucas!

Luke

On Fri, Sep 22, 2023 at 6:53 AM Tom Bentley  wrote:

> Congratulations!
>
> On Fri, 22 Sept 2023 at 09:11, Sophie Blee-Goldman  >
> wrote:
>
> > Congrats Lucas!
> >
>


Re: [ANNOUNCE] New committer: Yash Mayya

2023-09-21 Thread Luke Chen
Congratulations, Yash!

Luke

On Fri, Sep 22, 2023 at 7:16 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) <
hgerald...@bloomberg.net> wrote:

> Congrats! Well deserved
>
> From: dev@kafka.apache.org At: 09/21/23 17:05:01 UTC-4:00 To:
> dev@kafka.apache.org
> Cc:  r...@confluent.io.invalid
> Subject: Re: [ANNOUNCE] New committer: Yash Mayya
>
> Congratulations, Yash!
>
> On Thu 21. Sep 2023 at 21.57, Randall Hauch  wrote:
>
> > Congratulations, Yash!
> >
> > On Thu, Sep 21, 2023 at 12:31 PM Satish Duggana <
> satish.dugg...@gmail.com>
> > wrote:
> >
> > > Congratulations Yash!!
> > >
> > > On Thu, 21 Sept 2023 at 22:57, Viktor Somogyi-Vass
> > >  wrote:
> > > >
> > > > Congrats Yash!
> > > >
> > > > On Thu, Sep 21, 2023 at 7:04 PM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Congrats Yash!
> > > > >
> > > > > ———
> > > > > Josep Prat
> > > > >
> > > > > Aiven Deutschland GmbH
> > > > >
> > > > > Alexanderufer 3-7, 10117 Berlin
> > > > >
> > > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > > >
> > > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > > >
> > > > > m: +491715557497
> > > > >
> > > > > w: aiven.io
> > > > >
> > > > > e: josep.p...@aiven.io
> > > > >
> > > > > On Thu, Sep 21, 2023, 18:55 Raymond Ng 
> > > wrote:
> > > > >
> > > > > > Congrats Yash! Well-deserved!
> > > > > >
> > > > > > /Ray
> > > > > >
> > > > > > On Thu, Sep 21, 2023 at 9:40 AM Kamal Chandraprakash <
> > > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > > >
> > > > > > > Congratulations Yash!
> > > > > > >
> > > > > > > On Thu, Sep 21, 2023, 22:03 Bill Bejeck 
> > > wrote:
> > > > > > >
> > > > > > > > Congrats Yash!
> > > > > > > >
> > > > > > > > On Thu, Sep 21, 2023 at 12:26 PM Divij Vaidya <
> > > > > divijvaidy...@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations Yash!
> > > > > > > > >
> > > > > > > > > Divij Vaidya
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Sep 21, 2023 at 6:18 PM Sagar <
> > > sagarmeansoc...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Congrats Yash !
> > > > > > > > > > On Thu, 21 Sep 2023 at 9:38 PM, Ashwin wrote:
> > > > > > > > > >
> > > > > > > > > > > Awesome ! Congratulations Yash !!
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 21, 2023 at 9:25 PM Edoardo Comar <
> > > > > > > edoardli...@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congratulations Yash
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 21 Sept 2023 at 16:28, Bruno Cadonna <
> > > > > > cado...@apache.org
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > The PMC of Apache Kafka is pleased to announce a
> new
> > > Kafka
> > > > > > > > > committer
> > > > > > > > > > > > > Yash Mayya.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yash's major contributions are around Connect.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yash authored the following KIPs:
> > > > > > > > > > > > >
> > > > > > > > > > > > > KIP-793: Allow sink connectors to be used with
> > > > > topic-mutating
> > > > > > > > SMTs
> > > > > > > > > > > > > KIP-882: Kafka Connect REST API configuration
> > > validation
> > > > > > > timeout
> > > > > > > > > > > > > improvements
> > > > > > > > > > > > > KIP-970: Deprecate and remove Connect's redundant
> > task
> > > > > > > > > configurations
> > > > > > > > > > > > > endpoint
> > > > > > > > > > > > > KIP-980: Allow creating connectors in a stopped
> state
> > > > > > > > > > > > >
> > > > > > > > > > > > > Overall, Yash is known for insightful and friendly
> > > input to
> > > > > > > > > discussions
> > > > > > > > > > > > > and his high quality contributions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Congratulations, Yash!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Bruno (on behalf of the Apache Kafka PMC)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
>
>


Re: question

2023-09-22 Thread Luke Chen
Hi 殿杰,

In short, we don't support it yet.
But you're welcome to submit a PR to close the gap.
You can check this ticket for more information:
https://issues.apache.org/jira/browse/KAFKA-7025

Thanks.
Luke

On Sat, Sep 23, 2023 at 2:14 AM shidian...@mxnavi.com 
wrote:

>
> Hello,
>
> I'm working on Kafka development, and I have a question: does Kafka
> support Android clients?
>
>
>
>
> 石殿杰
> Technology Center
> Email: shidian...@mxnavi.com
> Phone: 18341724011
>


[ANNOUNCE] New Kafka PMC Member: Justine Olshan

2023-09-22 Thread Luke Chen
Hi, Everyone,

Justine Olshan has been a Kafka committer since Dec. 2022. She has been
very active and instrumental in the community since becoming a committer.
It's my pleasure to announce that Justine is now a member of the Kafka PMC.

Congratulations Justine!

Luke
on behalf of the Apache Kafka PMC


Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-23 Thread Luke Chen
Hi Satish,

I found that the current KRaft implementation has a "split brain" issue when
a network partition happens, which causes the controller to return
inconsistent metadata.
I filed KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489> for
this issue, and a PR <https://github.com/apache/kafka/pull/14428> is ready
for review.

Even though this is not a regression (the issue has existed since the first
release of the KRaft feature), I think it is important since KRaft has been
announced as production ready.
I'm not sure what other people think.

Thank you.
Luke

On Thu, Sep 21, 2023 at 6:33 PM Josep Prat 
wrote:

> Hi Satish,
>
> I ran the following validation steps:
> - Built from source with Java 11 and Scala 2.13
> - Verified signatures and hashes of the generated artifacts
> - Navigated through the Javadoc, including links to JDK classes
> - Ran the unit tests
> - Ran the integration tests
> - Ran the quickstart in KRaft and ZooKeeper mode
>
>
> I +1 this release (non-binding)
>
> Thanks for your efforts!
>
> On Thu, Sep 21, 2023 at 2:59 AM Satish Duggana 
> wrote:
>
> > Thanks Greg for verifying the release, including the earlier
> > blocker (KAFKA-15473) verification.
> >
> > ~Satish.
> >
> > On Wed, 20 Sept 2023 at 22:30, Greg Harris wrote:
> >
> > > Hi all,
> > >
> > > I verified the functionality of KIP-898 and the recent fix for
> > > KAFKA-15473 with the following steps:
> > >
> > > 1. I started a 3.5.1 broker, and a 3.5.1 worker with most (>400)
> > > publicly available plugins installed
> > > 2. I captured the output of /connector-plugins
> > > 3. I upgraded the worker to 3.6.0-rc1
> > > 4. I captured the output of /connector-plugins with various settings
> > > of plugin.discovery
> > > 5. I ran the migration script to add manifests to my plugins
> > > 6. I captured the output of /connector-plugins with various settings
> > > of plugin.discovery
> > > 7. I downgraded the worker to 3.5.1
> > > 8. I diffed the output of /connector-plugins across the different
> > > cases and observed the expected changes.
> > > a. When plugins are migrated for 3.6.0, all modes produce identical
> > > results.
> > > b. When plugins are not migrated for 3.6.0, only_scan and
> > > hybrid_warn produce identical results, hybrid_fail crashes, and
> > > service_load is missing plugins
> > > c. When upgrading from 3.5.1 I see that plugins with invalid
> > > constructors are hidden, AK plugins now have versions, multi-interface
> > > plugins now show each interface type, and plugins using AppInfoParser
> > > change versions.
> > > d. The startup logs now include descriptive errors for invalid
> > > plugins that otherwise would have been thrown at runtime
> > > e. The fix for KAFKA-15473 prevents duplicates
> > > f. The output for 3.5.1 after downgrading is identical to before.
> > >
> > > +1 (non-binding)
> > >
> > > Thanks Satish for running the release!
> > >
> > > On Wed, Sep 20, 2023 at 8:36 AM Divij Vaidya 
> wrote:
> > > >
> > > > Hey Satish
> > > >
> > > > My comments about documentation gaps from the RC0 vote thread [1]
> > > > have still not been addressed (such as missing metric documentation,
> > > > formatting problems, etc.). Could you please explain why we shouldn't
> > > > consider them blockers to making RC1 the final release?
> > > >
> > > > [1] https://lists.apache.org/thread/cokoxzd0jtgjtrlxoq7kkzmvpm75381t
> > > >
> > > > On Wed, Sep 20, 2023 at 4:53 PM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the second candidate for the release of Apache Kafka 3.6.0.
> > > Some of the major features include:
> > > > >
> > > > > * KIP-405: Kafka Tiered Storage
> > > > > * KIP-868: KRaft Metadata Transactions
> > > > > * KIP-875: First-class offsets support in Kafka Connect
> > > > > * KIP-898: Modernize Connect plugin discovery
> > > > > * KIP-938: Add more metrics for measuring KRaft performance
> > > > > * KIP-902: Upgrade Zookeeper to 3.8.1
> > > > > * KIP-917: Additional custom metadata for remote log segment
> > > > >
> > > > > Release notes for the 3.6.0 release:
> > > > >
> https://home.apache.org/~satishd/kafka-3.6.0-rc1/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote by Saturday, September 23, 8am
> PT
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > https://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc1/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > > >
> > > > > * Javadoc:
> > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc1/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 3.6 branch) is the 3.6.0 tag:
> > > > > https://github.com/apache/kafka/releases/

Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-23 Thread Luke Chen
Hi Satish,

I verified the release by:
1. Running the quickstart in KRaft mode with the Scala 2.12 artifact
2. Making sure the checksums are correct
3. Browsing the release notes, documentation, javadocs, and protocols.

I filed KAFKA-15491 <https://issues.apache.org/jira/browse/KAFKA-15491> for
a log output improvement found while testing a stream application.
It won't be a blocker for v3.6.0.

For KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>, I'm
fine if we decide to fix it in v3.6.1/v3.7.0.

+1 (binding) from me.

Thank you.
Luke

On Sun, Sep 24, 2023 at 3:38 AM Ismael Juma  wrote:

> Given that this is not a regression and there have been no reports for over
> a year, I think it's ok for this to land in 3.6.1.
>
> Ismael
>
> On Sat, Sep 23, 2023 at 9:32 AM Satish Duggana 
> wrote:
>
> > Thanks Luke for reporting the KRaft issue [1].
> >
> > I am not sure whether it is a release blocker for 3.6.0. We need input
> > from other KRaft experts to finalize the decision. Even if we adopt a
> > fix, don't we need to let it bake for some time before it is pushed to
> > production, to avoid any regressions, since this change is in the
> > critical path?
> >
> > 1. https://issues.apache.org/jira/browse/KAFKA-15489
> >
> > Thanks,
> > Satish.
> >
> > On Sat, 23 Sept 2023 at 03:08, Luke Chen  wrote:
> > >
> > > Hi Satish,
> > >
> > > I found that the current KRaft implementation has a "split brain" issue
> > > when a network partition happens, which causes the controller to return
> > > inconsistent metadata.
> > > I filed KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>
> > > for this issue, and a PR <https://github.com/apache/kafka/pull/14428> is
> > > ready for review.
> > >
> > > Even though this is not a regression (the issue has existed since the
> > > first release of the KRaft feature), I think it is important since KRaft
> > > has been announced as production ready.
> > > I'm not sure what other people think.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Thu, Sep 21, 2023 at 6:33 PM Josep Prat wrote:
> > >
> > > > Hi Satish,
> > > >
> > > > I ran the following validation steps:
> > > > - Built from source with Java 11 and Scala 2.13
> > > > - Verified signatures and hashes of the generated artifacts
> > > > - Navigated through the Javadoc, including links to JDK classes
> > > > - Ran the unit tests
> > > > - Ran the integration tests
> > > > - Ran the quickstart in KRaft and ZooKeeper mode
> > > >
> > > >
> > > > I +1 this release (non-binding)
> > > >
> > > > Thanks for your efforts!
> > > >
> > > > On Thu, Sep 21, 2023 at 2:59 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Greg for verifying the release including the earlier
> > > > > blocker (KAFKA-15473) verification.
> > > > >
> > > > > ~Satish.
> > > > >
> > > > > On Wed, 20 Sept 2023 at 22:30, Greg Harris wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I verified the functionality of KIP-898 and the recent fix for
> > > > > > KAFKA-15473 with the following steps:
> > > > > >
> > > > > > 1. I started a 3.5.1 broker, and a 3.5.1 worker with most (>400)
> > > > > > publicly available plugins installed
> > > > > > 2. I captured the output of /connector-plugins
> > > > > > 3. I upgraded the worker to 3.6.0-rc1
> > > > > > 4. I captured the output of /connector-plugins with various
> > settings
> > > > > > of plugin.discovery
> > > > > > 5. I ran the migration script to add manifests to my plugins
> > > > > > 6. I captured the output of /connector-plugins with various
> > settings
> > > > > > of plugin.discovery
> > > > > > 7. I downgraded the worker to 3.5.1
> > > > > > 8. I diffed the output of /connector-plugins across the different
> > > > > > cases and observed the expected changes.
> > > > > > a. When plugins are migrated for 3.6.0, all modes produce
> > identical
> > >

Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-25 Thread Luke Chen
Hi Satish,

Snappy-java has published a new vulnerability advisory
<https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv>
that can cause an OOM error on the server.
Kafka is also impacted by this vulnerability since it is similar to CVE-2023-34455
<https://nvd.nist.gov/vuln/detail/CVE-2023-34455>.
We should bump the snappy-java version to mitigate this vulnerability.
I've created a PR <https://github.com/apache/kafka/pull/14434> to run the CI
build.

Thanks.
Luke


On Mon, Sep 25, 2023 at 2:38 PM Satish Duggana 
wrote:

> Thanks to everyone who voted for this release.
>
> We have 2 +1 PMC votes and 3 +1 non-binding votes. We are past the
> deadline. Please try RC1 and send your vote to this email thread.
>
> Thanks,
> Satish.
>
>
> On Sun, 24 Sept 2023 at 13:23, Justine Olshan
>  wrote:
> >
> > Hi Satish,
> >
> > I've done the following:
> > - Verified signature
> > - Built from Java 17/Scala 2.13 and Java 8/Scala 2.11
> > - Ran unit + integration tests
> > - Ran a shorter Trogdor transactional-produce-bench on a single broker
> > cluster (KRaft and ZK) to verify transactional workloads worked
> reasonably
> >
> > A minor thing (we can discuss it elsewhere, and it is non-blocking for the
> > release): if ZK has been deprecated since 3.5, we should move the KRaft
> > setup up in the quickstart guide here
> > <https://kafka.apache.org/quickstart>.
> >
> > +1 (binding) from me.
> >
> > Justine
> >
> > On Sun, Sep 24, 2023 at 7:09 AM Federico Valeri 
> > wrote:
> >
> > > Hi Satish, I did the following to verify the release:
> > >
> > > - Verified signature and checksum
> > > - Built from source with Java 17 and Scala 2.13
> > > - Ran all unit and integration tests
> > > - Spot checked release notes and documentation
> > > - Ran a custom client using staging artifacts on a 3-node cluster
> > > - Tested tiered storage with one of the available RSM implementations
> > >
> > > +1 (non binding)
> > >
> > > Thanks
> > > Fede
> > >
> > >
> > > On Sun, Sep 24, 2023 at 8:49 AM Luke Chen  wrote:
> > > >
> > > > Hi Satish,
> > > >
> > > > I verified the release by:
> > > > 1. Running the quickstart in KRaft mode with the Scala 2.12 artifact
> > > > 2. Making sure the checksums are correct
> > > > 3. Browsing the release notes, documentation, javadocs, and protocols.
> > > >
> > > > I filed KAFKA-15491 <https://issues.apache.org/jira/browse/KAFKA-15491>
> > > > for a log output improvement found while testing a stream application.
> > > > It won't be a blocker for v3.6.0.
> > > >
> > > > For KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>,
> > > > I'm fine if we decide to fix it in v3.6.1/v3.7.0.
> > > >
> > > > +1 (binding) from me.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Sun, Sep 24, 2023 at 3:38 AM Ismael Juma 
> wrote:
> > > >
> > > > > Given that this is not a regression and there have been no reports
> for
> > > over
> > > > > a year, I think it's ok for this to land in 3.6.1.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Sat, Sep 23, 2023 at 9:32 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Luke for reporting the KRaft issue [1].
> > > > > >
> > > > > > I am not sure whether it is a release blocker for 3.6.0. We need
> > > > > > input from other KRaft experts to finalize the decision. Even if
> > > > > > we adopt a fix, don't we need to let it bake for some time before
> > > > > > it is pushed to production, to avoid any regressions, since this
> > > > > > change is in the critical path?
> > > > > >
> > > > > > 1. https://issues.apache.org/jira/browse/KAFKA-15489
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Sat, 23 Sept 2023 at 03:08, Luke Chen 
> wrote:
> > > > > > >
> > > > > > > Hi Satish,
> > > > > > >
> > > > > > > I found the current KRaft implementat

Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-25 Thread Luke Chen
Hi Divij,

Regarding the system tests, I'm helping Satish with them since our team
has an internal Jenkins pipeline for running them.
Here are the results:
https://drive.google.com/drive/folders/1S2XYd79f6_AeWj9f9qEkliRg7JtL04AC?usp=sharing

I'm mainly focusing on the failed tests.
For kraft_upgrade_test, I've fixed it in this PR
<https://github.com/apache/kafka/pull/14424>. After the fix, the rerun
results look good.
For quota_test, the failure is a known issue in our environment. After a
rerun, all tests passed.

Thanks.
Luke



On Mon, Sep 25, 2023 at 4:09 PM Divij Vaidya 
wrote:

> Correction: posted the wrong JIRA in my previous email. Instead of
> https://issues.apache.org/jira/browse/KAFKA-15001, please consider
> this https://issues.apache.org/jira/browse/KAFKA-15487
>
> --
> Divij Vaidya
>
> On Mon, Sep 25, 2023 at 10:04 AM Divij Vaidya 
> wrote:
> >
> > Hi Satish
> >
> > 1. I agree with Luke. It's a "high" severity vulnerability and we
> > should create another RC with the upgraded Snappy version. If we
> > create another RC, we should also fix a different CVE reported in
> > https://issues.apache.org/jira/browse/KAFKA-15001
> >
> > 2. I was hoping you could post the results of system tests before I
> > vote on this. I am particularly interested in looking at
> > producer/consumer performance results since we have quite a few
> > changes in this release. What is the plan on the system tests?
> >
> > --
> > Divij Vaidya
> >
> > On Mon, Sep 25, 2023 at 9:10 AM Luke Chen  wrote:
> > >
> > > Hi Satish,
> > >
> > > Snappy-java has published a new vulnerability advisory
> > > <https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv>
> > > that can cause an OOM error on the server.
> > > Kafka is also impacted by this vulnerability since it is similar to
> > > CVE-2023-34455 <https://nvd.nist.gov/vuln/detail/CVE-2023-34455>.
> > > We should bump the snappy-java version to mitigate this vulnerability.
> > > I've created a PR <https://github.com/apache/kafka/pull/14434> to run
> > > the CI build.
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > On Mon, Sep 25, 2023 at 2:38 PM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks to everyone who voted for this release.
> > > >
> > > > We have 2 +1 PMC votes and 3 +1 non-binding votes. We are past the
> > > > deadline. Please try RC1 and send your vote to this email thread.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > >
> > > > On Sun, 24 Sept 2023 at 13:23, Justine Olshan
> > > >  wrote:
> > > > >
> > > > > Hi Satish,
> > > > >
> > > > > I've done the following:
> > > > > - Verified signature
> > > > > - Built from Java 17/Scala 2.13 and Java 8/Scala 2.11
> > > > > - Ran unit + integration tests
> > > > > - Ran a shorter Trogdor transactional-produce-bench on a single
> broker
> > > > > cluster (KRaft and ZK) to verify transactional workloads worked
> > > > reasonably
> > > > >
> > > > > A minor thing (we can discuss it elsewhere, and it is non-blocking for
> > > > > the release): if ZK has been deprecated since 3.5, we should move the
> > > > > KRaft setup up in the quickstart guide here
> > > > > <https://kafka.apache.org/quickstart>.
> > > > >
> > > > > +1 (binding) from me.
> > > > >
> > > > > Justine
> > > > >
> > > > > On Sun, Sep 24, 2023 at 7:09 AM Federico Valeri <
> fedeval...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Satish, I did the following to verify the release:
> > > > > >
> > > > > > - Verified signature and checksum
> > > > > > - Built from source with Java 17 and Scala 2.13
> > > > > > - Ran all unit and integration tests
> > > > > > - Spot checked release notes and documentation
> > > > > > - Ran a custom client using staging artifacts on a 3-node
> cluster
> > > > > > - Tested tiered storage with one of the available RSM
> implementations
> > > > > >
> > > > > > +1 (non binding)
> > > > >

Re: [kafka-clients] [VOTE] 3.6.0 RC1

2023-09-25 Thread Luke Chen
Hi José,

Sounds good to me.
Let's continue the discussion in the JIRA/PR and target v3.6.1/v3.7.0.

Thanks.
Luke

On Tue, Sep 26, 2023 at 1:35 AM José Armando García Sancio
 wrote:

> On Sat, Sep 23, 2023 at 3:08 AM Luke Chen  wrote:
> >
> > Hi Satish,
> >
> > I found that the current KRaft implementation has a "split brain" issue
> > when a network partition happens, which causes the controller to return
> > inconsistent metadata.
> > I filed KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>
> > for this issue, and a PR <https://github.com/apache/kafka/pull/14428> is
> > ready for review.
> >
> > Even though this is not a regression (the issue has existed since the
> > first release of the KRaft feature), I think it is important since KRaft
> > has been announced as production ready.
> > I'm not sure what other people think.
>
> Thanks for the report and PR Luke. This looks related to this issue:
> https://issues.apache.org/jira/browse/KAFKA-13621
>
> Do you agree? We can move our conversation to those issues, but I also
> agree that this issue shouldn't be a release blocker.
>
> Thanks!
> -José
>


Re: [VOTE] 3.6.0 RC2

2023-10-01 Thread Luke Chen
Hi Satish,

I verified the release by:
1. Running the quickstart in KRaft mode with the Scala 2.12 artifact
2. Making sure the checksums are correct
3. Browsing the release notes, documentation, javadocs, and protocols.
4. Verifying that the tiered storage feature works well.

+1 (binding).

Thanks.
Luke



On Mon, Oct 2, 2023 at 5:23 AM Jakub Scholz  wrote:

> +1 (non-binding). I used the Scala 2.13 binaries and the staged Maven
> artifacts and ran my tests. Everything seems to work fine for me.
>
> Thanks
> Jakub
>
> On Fri, Sep 29, 2023 at 8:17 PM Satish Duggana 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for the release of Apache Kafka 3.6.0.
> > Some of the major features include:
> >
> > * KIP-405: Kafka Tiered Storage
> > * KIP-868: KRaft Metadata Transactions
> > * KIP-875: First-class offsets support in Kafka Connect
> > * KIP-898: Modernize Connect plugin discovery
> > * KIP-938: Add more metrics for measuring KRaft performance
> > * KIP-902: Upgrade Zookeeper to 3.8.1
> > * KIP-917: Additional custom metadata for remote log segment
> >
> > Release notes for the 3.6.0 release:
> > https://home.apache.org/~satishd/kafka-3.6.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Tuesday, October 3, 12pm PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~satishd/kafka-3.6.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~satishd/kafka-3.6.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 3.6 branch) is the 3.6.0-rc2 tag:
> > https://github.com/apache/kafka/releases/tag/3.6.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/36/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/36/protocol.html
> >
> > * Successful Jenkins builds for the 3.6 branch:
> > There are a few runs of unit/integration tests. You can see the latest
> > at https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/. We will
> > continue running a few more iterations.
> > System tests:
> > We will send an update once we have the results.
> >
> > Thanks,
> > Satish.
> >
>


Re: [VOTE] 3.6.0 RC2

2023-10-03 Thread Luke Chen
> > 1. Metrics added in
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e
> > > > > > aren't available in the documentation (cc: Justine). I don't
> > consider
> > > > > this
> > > > > > as a release blocker but we should add it as a fast follow-up.
> > > > > >
> > > > > > 2. Metric added in
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/kafka/commit/a900794ace4dcf1f9dadee27fbd8b63979532a18
> > > > > > isn't available in documentation (cc: David). I don't consider
> > this as
> > > > a
> > > > > > release blocker but we should add it as a fast follow-up.
> > > > > >
> > > > > > --
> > > > > > Divij Vaidya
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 2, 2023 at 9:50 AM Federico Valeri <
> > fedeval...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Satish, I did the following to verify the release:
> > > > > > >
> > > > > > > - Built from source with Java 17 and Scala 2.13
> > > > > > > - Ran all unit and integration tests
> > > > > > > - Spot checked documentation
> > > > > > > - Ran custom client applications using staging artifacts on a
> > 3-node
> > > > > > > cluster
> > > > > > > - Tested tiered storage with one of the available RSM
> > implementations
> > > > > > >
> > > > > > > +1 (non binding)
> > > > > > >
> > > > > > > Thanks
> > > > > > > Fede
> > > > > > >
> > > > > > > On Mon, Oct 2, 2023 at 8:50 AM Luke Chen 
> > wrote:
> > > > > > > >
> > > > > > > > Hi Satish,
> > > > > > > >
> > > > > > > > I verified the release by:
> > > > > > > > 1. Running the quickstart in KRaft mode with the Scala 2.12 artifact
> > > > > > > > 2. Making sure the checksums are correct
> > > > > > > > 3. Browsing the release notes, documentation, javadocs, and protocols.
> > > > > > > > 4. Verifying that the tiered storage feature works well.
> > > > > > > >
> > > > > > > > +1 (binding).
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Oct 2, 2023 at 5:23 AM Jakub Scholz wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding). I used the Scala 2.13 binaries and the
> > staged
> > > > > Maven
> > > > > > > > > artifacts and ran my tests. Everything seems to work fine
> > for me.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Jakub
> > > > > > > > >
> > > > > > > > > On Fri, Sep 29, 2023 at 8:17 PM Satish Duggana <
> > > > > > > satish.dugg...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello Kafka users, developers and client-developers,
> > > > > > > > > >
> > > > > > > > > > This is the third candidate for the release of Apache
> Kafka
> > > > > 3.6.0.
> > > > > > > > > > Some of the major features include:
> > > > > > > > > >
> > > > > > > > > > * KIP-405: Kafka Tiered Storage
> > > > > > > > > > * KIP-868: KRaft Metadata Transactions
> > > > > > > > > > * KIP-875: First-class offsets support in Kafka Connect
> > > > > > > > > > * KIP-898: Modernize Connect plugin discovery
> > > > > > > > > > * KIP-938: Add more metrics for measuring KRaft
> performance
> > > > > > > > > > * KIP-902: Upgrade Zookeeper to 3.8.1
> > > > > > > > > > * KIP-917: Additional custom metadata for remote log
> > segment
> > > > > > > > > >
> > > > > > > > > > Release notes for the 3.6.0 release:
> > > > > > > > > >
> > > > > >
> > https://home.apache.org/~satishd/kafka-3.6.0-rc2/RELEASE_NOTES.html
> > > > > > > > > >
> > > > > > > > > > *** Please download, test and vote by Tuesday, October 3,
> > 12pm
> > > > PT
> > > > > > > > > >
> > > > > > > > > > Kafka's KEYS file containing PGP keys we use to sign the
> > > > release:
> > > > > > > > > > https://kafka.apache.org/KEYS
> > > > > > > > > >
> > > > > > > > > > * Release artifacts to be voted upon (source and binary):
> > > > > > > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc2/
> > > > > > > > > >
> > > > > > > > > > * Maven artifacts to be voted upon:
> > > > > > > > > >
> > > > > > >
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > > > > > > > >
> > > > > > > > > > * Javadoc:
> > > > > > > > > >
> https://home.apache.org/~satishd/kafka-3.6.0-rc2/javadoc/
> > > > > > > > > >
> > > > > > > > > > * Tag to be voted upon (off 3.6 branch) is the 3.6.0-rc2
> > tag:
> > > > > > > > > > https://github.com/apache/kafka/releases/tag/3.6.0-rc2
> > > > > > > > > >
> > > > > > > > > > * Documentation:
> > > > > > > > > > https://kafka.apache.org/36/documentation.html
> > > > > > > > > >
> > > > > > > > > > * Protocol:
> > > > > > > > > > https://kafka.apache.org/36/protocol.html
> > > > > > > > > >
> > > > > > > > > > * Successful Jenkins builds for the 3.6 branch:
> > > > > > > > > > There are a few runs of unit/integration tests. You can
> > see the
> > > > > > > latest
> > > > > > > > > > at
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/.
> > > > We
> > > > > > > will
> > > > > > > > > > continue running a few more iterations.
> > > > > > > > > > System tests:
> > > > > > > > > > We will send an update once we have the results.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Satish.
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


Re: Apache Kafka 3.7.0 Release

2023-10-09 Thread Luke Chen
Thanks Stanislav!

On Tue, Oct 10, 2023 at 3:05 AM Josep Prat 
wrote:

> Thanks Stanislav!
>
> On Mon, Oct 9, 2023, 20:05 Chris Egerton  wrote:
>
> > +1, thanks Stanislav!
> >
> > On Mon, Oct 9, 2023, 14:02 Bill Bejeck  wrote:
> >
> > > +1
> > >
> > > Thanks, Stanislav!
> > >
> > > -Bill
> > >
> > > On Mon, Oct 9, 2023 at 1:59 PM Ismael Juma  wrote:
> > >
> > > > Thanks for volunteering Stanislav!
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Oct 9, 2023 at 10:51 AM Stanislav Kozlovski
> > > >  wrote:
> > > >
> > > > > Hey all!
> > > > >
> > > > > I would like to volunteer to be the release manager driving the
> next
> > > > > release - Apache Kafka *3.7.0*.
> > > > >
> > > > > If there are no objections, I will start and share a release plan
> > soon
> > > > > enough!
> > > > >
> > > > > Cheers,
> > > > > Stanislav
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Apache Kafka 3.6.0

2023-10-10 Thread Luke Chen
Thanks for running the release, Satish!

BTW, 3.6.0 should be a major release, not a minor one. :)

Luke

On Wed, Oct 11, 2023 at 1:39 PM Satish Duggana  wrote:

> The Apache Kafka community is pleased to announce the release for
> Apache Kafka 3.6.0
>
> This is a minor release and it includes fixes and improvements from 238
> JIRAs.
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/3.6.0/RELEASE_NOTES.html
>
> An overview of the release can be found in our announcement blog post:
> https://kafka.apache.org/blog
>
> You can download the source and binary release (Scala 2.12 and Scala 2.13)
> from:
> https://kafka.apache.org/downloads#3.6.0
>
>
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you for the following 139 contributors to this release!
> (Please report an unintended omission)
>
> This was a community effort, so thank you to everyone who contributed
> to this release, including all our users and our 139 contributors:
> A. Sophie Blee-Goldman, Aaron Ai, Abhijeet Kumar, aindriu-aiven,
> Akhilesh Chaganti, Alexandre Dupriez, Alexandre Garnier, Alok
> Thatikunta, Alyssa Huang, Aman Singh, Andras Katona, Andrew Schofield,
> Andrew Grant, Aneel Kumar, Anton Agestam, Artem Livshits, atu-sharm,
> bachmanity1, Bill Bejeck, Bo Gao, Bruno Cadonna, Calvin Liu, Chaitanya
> Mukka, Chase Thomas, Cheryl Simmons, Chia-Ping Tsai, Chris Egerton,
> Christo Lolov, Clay Johnson, Colin P. McCabe, Colt McNealy, d00791190,
> Damon Xie, Danica Fine, Daniel Scanteianu, Daniel Urban, David Arthur,
> David Jacot, David Mao, dengziming, Deqi Hu, Dimitar Dimitrov, Divij
> Vaidya, DL1231, Dániel Urbán, Erik van Oosten, ezio, Farooq Qaiser,
> Federico Valeri, flashmouse, Florin Akermann, Gabriel Oliveira,
> Gantigmaa Selenge, Gaurav Narula, GeunJae Jeon, Greg Harris, Guozhang
> Wang, Hailey Ni, Hao Li, Hector Geraldino, hudeqi, hzh0425, Iblis Lin,
> iit2009060, Ismael Juma, Ivan Yurchenko, James Shaw, Jason Gustafson,
> Jeff Kim, Jim Galasyn, John Roesler, Joobi S B, Jorge Esteban Quilcate
> Otoya, Josep Prat, Joseph (Ting-Chou) Lin, José Armando García Sancio,
> Jun Rao, Justine Olshan, Kamal Chandraprakash, Keith Wall, Kirk True,
> Lianet Magrans, LinShunKang, Liu Zeyu, lixy, Lucas Bradstreet, Lucas
> Brutschy, Lucent-Wong, Lucia Cerchie, Luke Chen, Manikumar Reddy,
> Manyanda Chitimbo, Maros Orsak, Matthew de Detrich, Matthias J. Sax,
> maulin-vasavada, Max Riedel, Mehari Beyene, Michal Cabak (@miccab),
> Mickael Maison, Milind Mantri, minjian.cai, mojh7, Nikolay, Okada
> Haruki, Omnia G H Ibrahim, Owen Leung, Philip Nee, prasanthV, Proven
> Provenzano, Purshotam Chauhan, Qichao Chu, Rajini Sivaram, Randall
> Hauch, Renaldo Baur Filho, Ritika Reddy, Rittika Adhikari, Rohan, Ron
> Dagostino, Sagar Rao, Said Boudjelda, Sambhav Jain, Satish Duggana,
> sciclon2, Shekhar Rajak, Sungyun Hur, Sushant Mahajan, Tanay
> Karmarkar, tison, Tom Bentley, vamossagar12, Victoria Xia, Vincent
> Jiang, vveicc, Walker Carlson, Yash Mayya, Yi-Sheng Lien, Ziming Deng,
> 蓝士钦
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
> Regards,
> Satish Duggana
>


[DISCUSS] Road to Kafka 4.0

2023-10-11 Thread Luke Chen
Hi all,

Now that Kafka 3.6.0 has been released, I’d like to start the discussion on the
“road to Kafka 4.0”. Based on the plan in KIP-833, the next release, 3.7, will
be the final release before moving to Kafka 4.0, which removes ZooKeeper from
Kafka. Before making this major change, I'd like to get consensus on the
"must-have features/fixes for Kafka 4.0", so that users are not surprised when
they upgrade to Kafka 4.0. The intent is to communicate clearly what to expect
in the following months. In particular, we should be signaling which features
and configurations are not supported, or are at risk (if no one is able to add
support or fix known bugs).

Here is the list of JIRA tickets I labeled as “4.0-blocker”. The criteria I
used for the “4.0-blocker” label are:
1. The feature is supported in ZooKeeper mode, but not yet supported in KRaft
mode (e.g. KIP-858: JBOD in KRaft)
2. Critical bugs in KRaft (e.g. KAFKA-15489: split brain in the KRaft
controller quorum)

If you disagree with the current list, feel free to start a discussion in the
specific JIRA ticket. Or, if you think I missed some tickets, feel free to
start a discussion in the JIRA ticket and ping me or others. Once we reach
consensus, we can label/unlabel the tickets accordingly.
Again, the goal is to communicate openly with the community about what will
be coming in 4.0.

Below are the high-level categories of the list:

1. Recovery from disk failure
KIP-856: KRaft Disk Failure Recovery

2. Pre-vote to support more than 3 controllers
KIP-650: Enhance Kafkaesque Raft semantics

3. JBOD support
KIP-858: Handle JBOD broker disk failure in KRaft

4. Scaling controllers up/down
KIP-853: KRaft Controller Membership Changes

5. Modifying dynamic configurations on the KRaft controller

6. Critical bugs in KRaft

Does this make sense?
Any feedback is welcome.

Thank you.
Luke


Re: [DISCUSS] 3.5.2 Release

2023-10-12 Thread Luke Chen
Hi Levani and Divij,

I can work on the 3.5.2 release.
I'll probably start a new thread to volunteer for it next week.

Thanks.
Luke

On Thu, Oct 12, 2023 at 5:07 PM Divij Vaidya 
wrote:

> Hello Levani
>
> From a process perspective, there is no fixed schedule for bug fix
> releases. If we have a volunteer for release manager (must be a committer),
> they can start with the process of bug fix release (with the approval of
> PMC).
>
> My personal opinion is that it's too early to start 3.6.1 and we should
> wait at least 1 month to hear feedback on 3.6.0. We need to strike a careful
> balance between getting the critical fixes into the hands of users as soon
> as possible vs. spending community effort on releases (effort that could
> otherwise be used to make Kafka better, feature-wise & operational
> stability-wise).
>
> For 3.5.2, I think there are sufficient pending fixes (including some CVE
> fixes) to start a bug fix release. We just need a volunteer to be the
> release manager.
>
> --
> Divij Vaidya
>
>
>
> On Thu, Oct 12, 2023 at 9:57 AM Levani Kokhreidze 
> wrote:
>
> > Hello,
> >
> > KAFKA-15571 [1] was merged and backported to the 3.5 and 3.6 branches.
> > It fixes a feature that was added in 3.5. Considering the feature doesn't
> > work as expected without the fix, I would like to know if it's reasonable
> > to start the 3.5.2 release. Of course, releasing such a massive project as
> > Kafka is not a trivial task, and I am looking for the community's input on
> > whether it's reasonable to start the 3.5.2 release process.
> >
> > Best,
> > Levani
> >
> > [1] - https://issues.apache.org/jira/browse/KAFKA-15571
>


[DISCUSS] Apache Kafka 3.5.2 release

2023-10-16 Thread Luke Chen
Hi all,

I'd like to volunteer as release manager for Apache Kafka 3.5.2, an
important bug/vulnerability fix release following 3.5.1.

If there are no objections, I'll start building a release plan in the wiki
in the next couple of weeks.

Thanks,
Luke


Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-10-20 Thread Luke Chen
Hi Matthias,

I'm planning to have the 1st RC next week.
Does that work for you?
Should I defer one more week?

Thanks.
Luke

On Wed, Oct 18, 2023 at 1:52 AM Matthias J. Sax  wrote:

> Thanks -- there are a few fixes for Kafka Streams we are considering
> cherry-picking into the 3.5.2 release -- what timeline do you target for
> the release?
>
>
> -Matthias
>
> On 10/17/23 8:47 AM, Divij Vaidya wrote:
> > Thank you for volunteering Luke.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Tue, Oct 17, 2023 at 3:26 PM Bill Bejeck  wrote:
> >
> >> Thanks for driving the release, Luke.
> >>
> >> +1
> >> -Bill
> >>
> >> On Tue, Oct 17, 2023 at 5:05 AM Satish Duggana <
> satish.dugg...@gmail.com>
> >> wrote:
> >>
> >>> Thanks Luke for volunteering for 3.5.2 release.
> >>>
> >>> On Tue, 17 Oct 2023 at 11:58, Josep Prat 
> >>> wrote:
> >>>>
> >>>> Hi Luke,
> >>>>
> >>>> Thanks for taking this one!
> >>>>
> >>>> Best,
> >>>>
> >>>> On Tue, Oct 17, 2023 at 8:12 AM Luke Chen  wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to volunteer as release manager for the Apache Kafka 3.5.2,
> >> to
> >>>>> have an important bug/vulnerability fix release for 3.5.1.
> >>>>>
> >>>>> If there are no objections, I'll start building a release plan in
> >>> the wiki
> >>>>> in the next couple of weeks.
> >>>>>
> >>>>> Thanks,
> >>>>> Luke
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> [image: Aiven] <https://www.aiven.io>
> >>>>
> >>>> *Josep Prat*
> >>>> Open Source Engineering Director, *Aiven*
> >>>> josep.p...@aiven.io   |   +491715557497
> >>>> aiven.io <https://www.aiven.io>   |   <
> >>> https://www.facebook.com/aivencloud>
> >>>><https://www.linkedin.com/company/aiven/>   <
> >>> https://twitter.com/aiven_io>
> >>>> *Aiven Deutschland GmbH*
> >>>> Alexanderufer 3-7, 10117 Berlin
> >>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>
> >>
> >
>


Re: Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-10-20 Thread Luke Chen
Hi Ryan,

OK, I've backported it to the 3.5 branch.
It'll be included in v3.5.2.

Thanks.
Luke

On Fri, Oct 20, 2023 at 7:43 AM Ryan Leslie (BLP/ NEW YORK (REMOT) <
rles...@bloomberg.net> wrote:

> Hi Luke,
>
> Hope you are well. Can you please include
> https://issues.apache.org/jira/browse/KAFKA-15106 in 3.5.2?
>
> Thanks,
>
> Ryan
>
> From: dev@kafka.apache.org At: 10/17/23 05:05:24 UTC-4:00
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] Apache Kafka 3.5.2 release
>
> Thanks Luke for volunteering for 3.5.2 release.
>
> On Tue, 17 Oct 2023 at 11:58, Josep Prat 
> wrote:
> >
> > Hi Luke,
> >
> > Thanks for taking this one!
> >
> > Best,
> >
> > On Tue, Oct 17, 2023 at 8:12 AM Luke Chen  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to volunteer as release manager for the Apache Kafka 3.5.2, to
> > > have an important bug/vulnerability fix release for 3.5.1.
> > >
> > > If there are no objections, I'll start building a release plan in
> the wiki
> > > in the next couple of weeks.
> > >
> > > Thanks,
> > > Luke
> > >
> >
> >
>
>
>


Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-10-20 Thread Luke Chen
Hi Matthias,

I agree that addressing the RocksDB CVE is indeed a blocker for 3.5.2.
Please let me know when the test is completed.

Thank you.
Luke

On Sat, Oct 21, 2023 at 2:12 AM Matthias J. Sax  wrote:

> Thanks for the info Luke.
>
> We did backport all but one PR in the meantime. The missing PR is a
> RocksDB version bump. We want to consider it for 3.5.2, because it
> addresses a CVE.
>
> Cf https://github.com/apache/kafka/pull/14216
>
> However, RocksDB version bumps are a little trickier, and we would like
> to test this properly on the 3.5 branch, which would take at least
> one week; we could do the cherry-pick on Monday and start testing.
>
> Please let us know if such a delay for 3.5.2 is acceptable or not.
>
> Thanks.
>
> -Matthias
>
>
> On 10/20/23 5:44 AM, Luke Chen wrote:
> > Hi Ryan,
> >
> > OK, I've backported it to the 3.5 branch.
> > It'll be included in v3.5.2.
> >
> > Thanks.
> > Luke
> >
> > On Fri, Oct 20, 2023 at 7:43 AM Ryan Leslie (BLP/ NEW YORK (REMOT) <
> > rles...@bloomberg.net> wrote:
> >
> >> Hi Luke,
> >>
> >> Hope you are well. Can you please include
> >> https://issues.apache.org/jira/browse/KAFKA-15106 in 3.5.2?
> >>
> >> Thanks,
> >>
> >> Ryan
> >>
> >> From: dev@kafka.apache.org At: 10/17/23 05:05:24 UTC-4:00
> >> To: dev@kafka.apache.org
> >> Subject: Re: [DISCUSS] Apache Kafka 3.5.2 release
> >>
> >> Thanks Luke for volunteering for 3.5.2 release.
> >>
> >> On Tue, 17 Oct 2023 at 11:58, Josep Prat 
> >> wrote:
> >>>
> >>> Hi Luke,
> >>>
> >>> Thanks for taking this one!
> >>>
> >>> Best,
> >>>
> >>> On Tue, Oct 17, 2023 at 8:12 AM Luke Chen  wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I'd like to volunteer as release manager for the Apache Kafka 3.5.2,
> to
> >>>> have an important bug/vulnerability fix release for 3.5.1.
> >>>>
> >>>> If there are no objections, I'll start building a release plan in
> >> the wiki
> >>>> in the next couple of weeks.
> >>>>
> >>>> Thanks,
> >>>> Luke
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >
>


Re: UncleanLeaderElectionsPerSec metric and Raft

2023-10-23 Thread Luke Chen
Hi Justine,

Thanks for the response.
I also agree that even if unclean leader election might change after KIP-966,
we should still figure out whether it's a missing feature, and what our plan
for it is.

Thanks.
Luke

On Mon, Oct 23, 2023 at 11:48 PM Justine Olshan
 wrote:

> Hey Neil,
>
> I was taking a look at this code, and noticed that some unclean leader
> election params were not implemented.
>
> https://github.com/apache/kafka/blob/4612fe42af0df0a4c1affaf66c55d01eb6267ce3/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L499
>
> I know you mentioned setting the non-topic config, but I wonder if the
> feature is generally not built out. I think that once KIP-966 is
> implemented, it will likely replace the old notion of unclean leader
> election.
>
> Still, if KRaft mode doesn't have unclean leader election, it should be
> documented. I will get back to you on this.
>
> Justine
>
> On Wed, Oct 18, 2023 at 10:30 AM Neil Buesing 
> wrote:
>
> > Development,
> >
> > With Raft controllers, is the unclean leader election/sec metric supposed
> > to be available?
> >
> > kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
> >
> > Nothing in the documentation indicates that it isn’t, and nothing I found
> > while navigating the code indicates that it wouldn’t show up. But even
> > after setting unclean leader election to true for both brokers and
> > controllers, nothing shows up.
> >
> > (set this for all controllers and brokers)
> >   KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE: true
> >
> > Happy to report a Jira, but wanted to figure out whether the bug is in the
> > documentation or in the metric not being available.
> >
> > Thanks,
> >
> > Neil
> >
> > P.S. I did confirm that others have seen and wondered about this,
> > https://github.com/strimzi/strimzi-kafka-operator/issues/8169, but that is
> > about the only other report on this I have found.
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Satish Duggana

2023-10-28 Thread Luke Chen
Congrats Satish!

Luke

On Sat, Oct 28, 2023 at 11:16 AM ziming deng 
wrote:

> Congratulations Satish!
>
> > On Oct 27, 2023, at 23:03, Jun Rao  wrote:
> >
> > Hi, Everyone,
> >
> > Satish Duggana has been a Kafka committer since 2022. He has been very
> > instrumental to the community since becoming a committer. It's my pleasure
> > to announce that Satish is now a member of Kafka PMC.
> >
> > Congratulations Satish!
> >
> > Jun
> > on behalf of Apache Kafka PMC
>
>


Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-11-02 Thread Luke Chen
Hi Matthias,

Is there any update on the test status of the RocksDB version bump?
Could I create a 3.5.2 RC build next week?

Thanks.
Luke

On Sat, Oct 21, 2023 at 1:01 PM Luke Chen  wrote:

> Hi Matthias,
>
> I agree that addressing the RocksDB CVE is indeed a blocker for 3.5.2.
> Please let me know when the test is completed.
>
> Thank you.
> Luke
>
> On Sat, Oct 21, 2023 at 2:12 AM Matthias J. Sax  wrote:
>
>> Thanks for the info Luke.
>>
>> We did backport all but one PR in the meantime. The missing PR is a
>> RocksDB version bump. We want to consider it for 3.5.2, because it
>> addresses a CVE.
>>
>> Cf https://github.com/apache/kafka/pull/14216
>>
>> However, RocksDB version bumps are a little trickier, and we would like
>> to test this properly on the 3.5 branch, which would take at least
>> one week; we could do the cherry-pick on Monday and start testing.
>>
>> Please let us know if such a delay for 3.5.2 is acceptable or not.
>>
>> Thanks.
>>
>> -Matthias
>>
>>
>> On 10/20/23 5:44 AM, Luke Chen wrote:
>> > Hi Ryan,
>> >
>> > OK, I've backported it to the 3.5 branch.
>> > It'll be included in v3.5.2.
>> >
>> > Thanks.
>> > Luke
>> >
>> > On Fri, Oct 20, 2023 at 7:43 AM Ryan Leslie (BLP/ NEW YORK (REMOT) <
>> > rles...@bloomberg.net> wrote:
>> >
>> >> Hi Luke,
>> >>
>> >> Hope you are well. Can you please include
>> >> https://issues.apache.org/jira/browse/KAFKA-15106 in 3.5.2?
>> >>
>> >> Thanks,
>> >>
>> >> Ryan
>> >>
>> >> From: dev@kafka.apache.org At: 10/17/23 05:05:24 UTC-4:00
>> >> To: dev@kafka.apache.org
>> >> Subject: Re: [DISCUSS] Apache Kafka 3.5.2 release
>> >>
>> >> Thanks Luke for volunteering for 3.5.2 release.
>> >>
>> >> On Tue, 17 Oct 2023 at 11:58, Josep Prat 
>> >> wrote:
>> >>>
>> >>> Hi Luke,
>> >>>
>> >>> Thanks for taking this one!
>> >>>
>> >>> Best,
>> >>>
>> >>> On Tue, Oct 17, 2023 at 8:12 AM Luke Chen  wrote:
>> >>>
>> >>>> Hi all,
>> >>>>
>> >>>> I'd like to volunteer as release manager for the Apache Kafka 3.5.2,
>> to
>> >>>> have an important bug/vulnerability fix release for 3.5.1.
>> >>>>
>> >>>> If there are no objections, I'll start building a release plan in
>> >> the wiki
>> >>>> in the next couple of weeks.
>> >>>>
>> >>>> Thanks,
>> >>>> Luke
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>>
>


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Luke Chen
Hi Christo,

Thanks for the KIP!

Some comments:
1. I agree with Kamal that a metric covering the time taken to read data
from remote storage would be helpful.

2. I can see that some metrics are only at the topic level, while others are
at the partition level.
Could you explain why some of them are only at the topic level?
For example, RemoteLogSizeComputationTime differs from partition to
partition; would it be better to expose it as a partition-level metric?

3. `RemoteLogSizeBytes` metric hanging.
To compute RemoteLogSizeBytes, we need to wait until all records in the
metadata topic have been loaded.
What will happen if it takes a long time to load the data from the metadata topic?
Should we instead return -1 or something to indicate it's still loading?
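To make points 2 and 3 concrete: a gauge could keep sizes per partition, sum them for a topic-level value, and report a sentinel of -1 until the remote log metadata has been fully loaded. The sketch below is a hypothetical Python illustration of that idea only; `RemoteLogSizeGauge`, `record_segment` and `mark_loaded` are made-up names, not the RemoteLogManager or KIP-963 API.

```python
import threading
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition); hypothetical representation


class RemoteLogSizeGauge:
    """Hypothetical gauge that reports -1 while remote log metadata is still loading."""

    LOADING = -1

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._loaded = False
        self._sizes: Dict[TopicPartition, int] = {}

    def record_segment(self, tp: TopicPartition, size_bytes: int) -> None:
        # Called while replaying the metadata topic; accumulate bytes per partition.
        with self._lock:
            self._sizes[tp] = self._sizes.get(tp, 0) + size_bytes

    def mark_loaded(self) -> None:
        # Called once the metadata topic has been fully consumed.
        with self._lock:
            self._loaded = True

    def partition_value(self, tp: TopicPartition) -> int:
        # Partition-level value, or the sentinel while still loading.
        with self._lock:
            return self._sizes.get(tp, 0) if self._loaded else self.LOADING

    def topic_value(self, topic: str) -> int:
        # Topic-level value is the sum of that topic's partition values.
        with self._lock:
            if not self._loaded:
                return self.LOADING
            return sum(v for (t, _), v in self._sizes.items() if t == topic)
```

Returning a sentinel keeps the metric from blocking a reporter thread, at the cost of dashboards having to treat -1 as "unknown" rather than as a size.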

Thanks.
Luke

On Fri, Nov 3, 2023 at 1:53 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Christo,
>
> Thanks for expanding the scope of the KIP! We should also cover the time
> taken to read data from remote storage. This will give our users a fair
> idea of the P99, P95, and P50 Fetch latency when reading data from remote
> storage.
>
> The Fetch API request metrics currently provide a breakdown of the time
> spent on each item:
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L517
> Should we also provide a `RemoteStorageTimeMs` item (only for the FETCH API)
> so that users can understand the overall and per-step time taken?
>
> Regarding the remote deletion metrics, should we also emit a metric to
> expose the oldest segment time? Users can configure topic retention either
> by size or by time. If time is configured, then emitting the oldest segment
> time allows the user to configure an alert on top of it and act accordingly.
>
> On Wed, Nov 1, 2023 at 7:07 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks, Christo!
> >
> > 1. Agree. Looking further into how many latency metrics are included on
> > the broker side, there are only a few of them (e.g. request lifecycle);
> > measuring this seems mostly delegated to clients, or to plugins in this
> > case.
> >
> > 3.2. Personally, I find the record-based lag less useful, as records can't
> > be relied on as a stable unit of measure. So, if we can keep bytes- and
> > segment-based lag, LGTM.
> > 3.4. Agree, these metrics should be on the broker side. Though if a plugin
> > decides to handle deletion as a background process, then it should have
> > its own metrics. That's why I was thinking the calculation should be fairly
> > similar to the CopyLag: "these segments are available for deletion but
> > haven't been deleted yet"
> > 3.5. For lag metrics: could we add an explanation of how each lag will be
> > calculated, e.g. using which values, from which components, and under which
> > circumstances we expect these values to increase/decrease? This will
> > clarify 3.4 and make it easier to agree on and eventually test.
> >
> > 4. Sorry I wasn't clear. I meant that, similar to `RemoteCopyBytesPerSec`
> > and `RemoteFetchBytesPerSec`, we could consider including
> > `RemoteDeleteBytesPerSec`.
> >
> > 5. and 6. Thanks for the explanation! It would surely be beneficial to have
> > these as part of the set of metrics.
> >
> > Cheers,
> > Jorge.
> >
> > On Mon, 30 Oct 2023 at 16:07, Christo Lolov 
> > wrote:
> >
> > > Heya Jorge,
> > >
> > > Thank you for the insightful comments!
> > >
> > > 1. I see a value in such latency metrics but in my opinion the correct
> > > location for such metrics is in the plugins providing the underlying
> > > functionality. What are your thoughts on the matter?
> > >
> > > 2. Okay, I will look for and adjust the formatting today/tomorrow!
> > >
> > > 3.1 Done.
> > > 3.2 Sure, I will add this to the KIP later today, the suggestion makes
> > > sense to me. However, my question is, would you still find value in
> > > emitting metrics for all three i.e. RemoteCopyLagRecords,
> > > RemoteCopyLagBytes and RemoteCopyLagSegments or would you only keep
> > > RemoteCopyLagBytes and RemoteCopyLagSegments?
> > > 3.3. Yes, RemoteDeleteLagRecords was supposed to be an equivalent of
> > > RemoteCopyLagRecords. Once I have your opinion on 3.2 I will make the
> > > respective changes.
> > > 3.4. I envision these metrics being added to Kafka rather than to the
> > > plugins. Today Kafka sends deletes to remote storage but does not know
> > > whether those segments were deleted immediately when the request was
> > > sent or were handed to a background process to carry out the actual
> > > reclamation of space. The purpose of this metric is to give an estimate
> > > in time which says "hey, we have called this many segments or bytes to
> > > be deleted".
> > >
> > > 4. I believe this goes down the same line of thinking as what you
> > > mentioned in 3.3 - have I misunderstood something?
> > >
> > > 5. I have on a number of occasions found I do not have a metric to
> > > quickly point me to what part of t

Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-11-09 Thread Luke Chen
Hi all,

Greg found a regression in Kafka Connect:
https://issues.apache.org/jira/browse/KAFKA-15800
I'll wait until the fix is merged and then create the RC build for v3.5.2.

Thanks.
Luke

On Sat, Nov 4, 2023 at 1:33 AM Matthias J. Sax  wrote:

> Hey,
>
> Sorry for late reply. We finished our testing, and think we are go.
>
> Thanks for giving us the opportunity to get the RocksDB version bump in.
> Let's ship it!
>
>
> -Matthias
>
> On 11/2/23 4:37 PM, Luke Chen wrote:
> > Hi Matthias,
> >
> > Is there any update on the test status of the RocksDB version bump?
> > Could I create a 3.5.2 RC build next week?
> >
> > Thanks.
> > Luke
> >
> > On Sat, Oct 21, 2023 at 1:01 PM Luke Chen  wrote:
> >
> >> Hi Matthias,
> >>
> >> I agree that addressing the RocksDB CVE is indeed a blocker for 3.5.2.
> >> Please let me know when the test is completed.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Sat, Oct 21, 2023 at 2:12 AM Matthias J. Sax 
> wrote:
> >>
> >>> Thanks for the info Luke.
> >>>
> >>> We did backport all but one PR in the meantime. The missing PR is a
> >>> RocksDB version bump. We want to consider it for 3.5.2, because it
> >>> addresses a CVE.
> >>>
> >>> Cf https://github.com/apache/kafka/pull/14216
> >>>
> >>> However, RocksDB version bumps are a little trickier, and we would like
> >>> to test this properly on the 3.5 branch, which would take at least
> >>> one week; we could do the cherry-pick on Monday and start testing.
> >>>
> >>> Please let us know if such a delay for 3.5.2 is acceptable or not.
> >>>
> >>> Thanks.
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 10/20/23 5:44 AM, Luke Chen wrote:
> >>>> Hi Ryan,
> >>>>
> >>>> OK, I've backported it to the 3.5 branch.
> >>>> It'll be included in v3.5.2.
> >>>>
> >>>> Thanks.
> >>>> Luke
> >>>>
> >>>> On Fri, Oct 20, 2023 at 7:43 AM Ryan Leslie (BLP/ NEW YORK (REMOT) <
> >>>> rles...@bloomberg.net> wrote:
> >>>>
> >>>>> Hi Luke,
> >>>>>
> >>>>> Hope you are well. Can you please include
> >>>>> https://issues.apache.org/jira/browse/KAFKA-15106 in 3.5.2?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Ryan
> >>>>>
> >>>>> From: dev@kafka.apache.org At: 10/17/23 05:05:24 UTC-4:00
> >>>>> To: dev@kafka.apache.org
> >>>>> Subject: Re: [DISCUSS] Apache Kafka 3.5.2 release
> >>>>>
> >>>>> Thanks Luke for volunteering for 3.5.2 release.
> >>>>>
> >>>>> On Tue, 17 Oct 2023 at 11:58, Josep Prat  >
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Luke,
> >>>>>>
> >>>>>> Thanks for taking this one!
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> On Tue, Oct 17, 2023 at 8:12 AM Luke Chen 
> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to volunteer as release manager for the Apache Kafka
> 3.5.2,
> >>> to
> >>>>>>> have an important bug/vulnerability fix release for 3.5.1.
> >>>>>>>
> >>>>>>> If there are no objections, I'll start building a release plan in
> >>>>> the wiki
> >>>>>>> in the next couple of weeks.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Luke
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-20 Thread Luke Chen
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Fri, 10 Nov 2023 at 09:33, Satish Duggana <
> > satish.dugg...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Christo for the KIP and the interesting discussion.
> > > > > >
> > > > > > 101. Adding metrics at partition level may increase the cardinality
> > > > > > of these metrics. We should be cautious of that and see whether they
> > > > > > are really needed. Problems in RLM-related operations are generally
> > > > > > not partition-specific; they mostly stem from remote storage or
> > > > > > broker-level issues.
> > > > > >
> > > > > > 102. I am not sure the records metric is very useful when we have
> > > > > > other bytes- and segments-related metrics available. If needed,
> > > > > > record-level information can be derived once we have the
> > > > > > segments/bytes metrics.
> > > > > >
> > > > > > 103. Regarding RemoteLogSizeComputationTime, we can add logs for
> > > > > > debugging purposes to print the duration of the size computation
> > > > > > instead of generating a metric. If there is any degradation in
> > > > > > remote log size computation, it will affect the RLM task, leading
> > > > > > to remote log copy and delete lags.
> > > > > >
> > > > > > RLMM and RSM implementations can always add more metrics for
> > > > > > observability based on the respective implementations.
> > > > > >
> > > > > > 104. What is the purpose of RemoteLogMetadataCount as a metric?
> > > > > >
> > > > > > Thanks,
> > > > > > Satish.
> > > > > >
> > > > > > On Fri, 10 Nov 2023 at 04:10, Jorge Esteban Quilcate Otoya
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Christo,
> > > > > > >
> > > > > > > I'd like to add another suggestion:
> > > > > > >
> > > > > > > 7. Adding on TS lag formulas, my understanding is that, per partition:
> > > > > > > - RemoteCopyLag: difference between the latest local segment
> > > > > > > candidate for upload and the latest remote segment
> > > > > > >   - Represents how the Remote Log Manager task is handling the
> > > > > > > backlog of segments.
> > > > > > >   - Ideally, this lag is zero -- it grows when uploads are slower
> > > > > > > than the increase in candidate segments to upload.
> > > > > > >
> > > > > > > - RemoteDeleteLag: difference between the latest remote candidate
> > > > > > > segment to keep based on retention and the oldest remote segment
> > > > > > >   - Represents how many segments the Remote Log Manager task has
> > > > > > > yet to delete at a given point in time.
> > > > > > >   - Ideally, this lag is zero -- it grows when the retention
> > > > > > > condition changes but the RLM task has not been able to schedule
> > > > > > > deletion yet.
> > > > > > >
> > > > > > > Is my understanding of these lags correct?
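Under that reading, both lags reduce to simple filtering over segment metadata. A minimal sketch, assuming a hypothetical `Segment` record with inclusive offsets and a max record timestamp, and assuming time-based retention only (size-based retention would need a different eligibility check); none of these names are Kafka APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment metadata (not a Kafka class)."""
    base_offset: int
    end_offset: int          # last offset in the segment, inclusive
    size_bytes: int
    max_timestamp_ms: int    # largest record timestamp in the segment


def remote_copy_lag(rolled_local: List[Segment],
                    highest_uploaded_offset: int) -> Tuple[int, int]:
    """(segments, bytes) rolled locally but not yet copied to remote storage."""
    pending = [s for s in rolled_local if s.end_offset > highest_uploaded_offset]
    return len(pending), sum(s.size_bytes for s in pending)


def remote_delete_lag(remote: List[Segment], now_ms: int,
                      retention_ms: int) -> Tuple[int, int]:
    """(segments, bytes) in remote storage that already breach time-based retention."""
    expired = [s for s in remote if now_ms - s.max_timestamp_ms > retention_ms]
    return len(expired), sum(s.size_bytes for s in expired)
```

For example, with rolled local segments ending at offsets 99, 199 and 299 and an uploaded high-water mark of 99, `remote_copy_lag` reports two segments still pending; both functions return (0, 0) in the steady state the formulas call "ideal".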
> > > > > > >
> > > > > > > I'd also like to consider an additional lag:
> > > > > > > - LocalDeleteLag: difference between the latest local candidate
> > > > > > > segment to keep based on local retention and the oldest local segment
> > > > > > >   - Represents how many segments are still available locally when
> > > > > > > they are candidates for deletion. This usually happens when the log
> > > > > > > cleaner has not been scheduled yet. It's important because it
> > > > > > > represents how much data is stored locally when it co

Re: [VOTE] KIP-963: Additional metrics in Tiered Storage

2023-11-20 Thread Luke Chen
+1 (binding) from me.
Thanks for the KIP.

Luke

On Tue, Nov 21, 2023 at 11:53 AM Satish Duggana 
wrote:

> +1 (binding)
> Thanks for the KIP and the discussion.
>
> Discussion mail thread for the KIP:
> https://lists.apache.org/thread/40vsyc240hyody37mf2f0pn90shkzb45
>
>
>
> On Tue, 21 Nov 2023 at 05:21, Kamal Chandraprakash
>  wrote:
> >
> > +1 (non-binding). Thanks for the KIP!
> >
> > On Tue, Nov 21, 2023, 03:04 Divij Vaidya 
> wrote:
> >
> > > + 1 (binding)
> > >
> > > This KIP will greatly improve Tiered Storage troubleshooting. Thank you
> > > Christo.
> > >
> > > On Mon 20. Nov 2023 at 17:21, Christo Lolov 
> > > wrote:
> > >
> > > > Hello all!
> > > >
> > > > Now that the discussion for KIP-963 has wound down, I would like to
> open
> > > > it for a vote targeting 3.7.0 as the release. You can find the
> current
> > > > version of the KIP at
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage
> > > >
> > > > Best,
> > > > Christo
> > > >
> > >
>


[VOTE] 3.5.2 RC1

2023-11-21 Thread Luke Chen
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.5.2.

This is a bugfix release with several fixes since the release of 3.5.1,
including dependency version bumps for CVEs.

Release notes for the 3.5.2 release:
https://home.apache.org/~showuon/kafka-3.5.2-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Nov. 28.

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~showuon/kafka-3.5.2-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~showuon/kafka-3.5.2-rc1/javadoc/

* Tag to be voted upon (off 3.5 branch) is the 3.5.2 tag:
https://github.com/apache/kafka/releases/tag/3.5.2-rc1

* Documentation:
https://kafka.apache.org/35/documentation.html

* Protocol:
https://kafka.apache.org/35/protocol.html

* Successful Jenkins builds for the 3.5 branch:
Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/98/
There are some flaky tests, including the testSingleIP test failure. It
failed because of an infra change, which we fixed recently.

System tests: running, will update the results later.



Thank you.
Luke


Re: [DISCUSS] Road to Kafka 4.0

2023-11-21 Thread Luke Chen
st
> >> >> >> deployments were 3 nodes as well.
> >> >> >>
> >> >> >> KIP-853 is not a blocker for either 3.7 or 4.0. We discussed this in
> >> >> >> several KIPs that happened this year and last year. The most notable was
> >> >> >> probably KIP-866, which was approved in May 2022.
> >> >> >>
> >> >> >> Many users these days run in a Kubernetes environment where Kubernetes
> >> >> >> actually controls the DNS. This makes changing the set of voters less
> >> >> >> important than it was historically.
> >> >> >>
> >> >> >> For example, in a world with static DNS, you might have to change the
> >> >> >> controller.quorum.voters setting from:
> >> >> >>
> >> >> >> 100@a.local:9073,101@b.local:9073,102@c.local:9073
> >> >> >>
> >> >> >> to:
> >> >> >>
> >> >> >> 100@a.local:9073,101@b.local:9073,102@d.local:9073
> >> >> >>
> >> >> >> In a world with k8s controlling the DNS, you simply remap c.local to point
> >> >> >> to the IP address of your new pod for controller 102, and you're done. No
> >> >> >> need to update controller.quorum.voters.
> >> >> >>
> >> >> >> Another question is whether you re-create the pod data from scratch every
> >> >> >> time you add a new node. If you store the controller data on an EBS volume
> >> >> >> (or cloud-specific equivalent), you really only have to detach it from the
> >> >> >> previous pod and re-attach it to the new pod. k8s also handles this
> >> >> >> automatically, of course.
> >> >> >>
> >> >> >> If you want to reconstruct the full controller pod state each time you
> >> >> >> create a new pod (for example, so that you can use only instance storage),
> >> >> >> you should be able to rsync that state from the leader. In general, the
> >> >> >> invariant that we want to maintain is that the state should not "go back in
> >> >> >> time" -- if controller 102 promised to hold all log data up to offset X, it
> >> >> >> should come back with committed data up to at least that offset.
> >> >> >>
> >> >> >> There are lots of new features we'd like to implement for KRaft, and Kafka
> >> >> >> in general. If you have some you really would like to see, I think everyone
> >> >> >> in the community would be happy to work with you. The flip side, of course,
> >> >> >> is that since there are an unlimited number of features we could do, we
> >> >> >> can't really block the release for any one feature.
> >> >> >>
> >> >> >> To circle back to KIP-853, I think it stands a good chance of making it
> >> >> >> into AK 4.0. Jose, Alyssa, and some other people have worked on it. It
> >> >> >> definitely won't make it into 3.7, since we have only a few weeks left
> >> >> >> before that release happens.
> >> >> >>
> >> >> >> best,
> >> >> >> Colin
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Nov 9, 2023, at 00:20, Anton Agestam wrote:
> >> >> >> > Hi Luke,
> >> >> >> >
> >> >> >> > We have been looking into what switching from ZK to KRaft will mean for
> >> >> >> > Aiven.
> >> >> >> >
> >> >> >> > We heavily depend on an “immutable infrastructure” model for deployments.
> >> >> >> > This means that, when we perform upgrades, we introduce new nodes to our
> >> >> >> > clusters, scale the cluster up to incorporate the new nodes, and then phase
> >> >> >> > the old ones out once all partitions are moved to the new generation. This
> >> >> >> > allows us, and anyone else using a similar model, to do upgrades as well as

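The controller.quorum.voters values in Colin's example above use an id@host:port list format. The minimal Java sketch below (a hypothetical helper, not Kafka's own parser) covers the static-DNS case, where replacing controller 102 means rewriting its entry. With Kubernetes-managed DNS, the string stays unchanged and only the DNS record for c.local is remapped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the id@host:port,... format shown in the thread. */
final class QuorumVotersSketch {

    record Endpoint(String host, int port) {}

    /** Parses "100@a.local:9073,101@b.local:9073" into voter id -> endpoint, keeping order. */
    static Map<Integer, Endpoint> parse(String voters) {
        Map<Integer, Endpoint> result = new LinkedHashMap<>();
        for (String entry : voters.split(",")) {
            String trimmed = entry.trim();
            int at = trimmed.indexOf('@');
            int colon = trimmed.lastIndexOf(':');
            if (at < 0 || colon < at) {
                throw new IllegalArgumentException("Malformed voter entry: " + trimmed);
            }
            int id = Integer.parseInt(trimmed.substring(0, at));
            String host = trimmed.substring(at + 1, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.put(id, new Endpoint(host, port));
        }
        return result;
    }

    /** Static-DNS case: rebuild the setting with one voter's host replaced. */
    static String replaceHost(String voters, int voterId, String newHost) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Endpoint> e : parse(voters).entrySet()) {
            Endpoint ep = e.getValue();
            String host = e.getKey() == voterId ? newHost : ep.host();
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('@').append(host).append(':').append(ep.port());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String before = "100@a.local:9073,101@b.local:9073,102@c.local:9073";
        // Prints 100@a.local:9073,101@b.local:9073,102@d.local:9073
        System.out.println(replaceHost(before, 102, "d.local"));
    }
}
```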
Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2023-11-28 Thread Luke Chen
Hi Abhijeet,

Thanks for the KIP!
This is an important feature for tiered storage.

Some comments:
1. Will we introduce new metrics for these tiered storage quotas?
This is important because, when remote writes/reads are slow, the admin can
check the throttling status through metrics such as the upload/read byte
rate and the throttled time for upload/read.

2. Could you give some examples of the throttling algorithm in the KIP to
explain it? That will make it much clearer.

3. "To solve this problem, we can break down the RLMTask into two smaller
tasks - one for segment upload and the other for handling expired segments."
How do we handle the situation where a segment is still waiting to be
offloaded but has already expired and is eligible for deletion?
Maybe it'll be easier to not block the RLMTask when the quota is exceeded, and
just check the quota each time the RLMTask runs?

Thank you.
Luke

On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar 
wrote:

> Hi All,
>
> I have created KIP-956 for defining read and write quota for tiered
> storage.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
>
> Feedback and suggestions are welcome.
>
> Regards,
> Abhijeet.
>

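Here is a minimal Java sketch of the suggestion in point 3 above. Instead of blocking the RLMTask when the quota is exceeded, the task checks the quota on each scheduled run and skips the upload if there is no room left. The fixed one-second window and the Quota and CopyTask names are assumptions made for illustration; this is not the throttling algorithm proposed in KIP-956.

```java
/**
 * Hypothetical illustration of "check the quota on every run instead of blocking".
 * The fixed one-second window is an assumption, not the KIP-956 algorithm.
 */
final class RemoteCopyQuotaSketch {

    /** Byte-rate quota over fixed one-second windows (hypothetical). */
    static final class Quota {
        private final long bytesPerSecond;
        private long windowStartMs;
        private long bytesInWindow;

        Quota(long bytesPerSecond, long nowMs) {
            this.bytesPerSecond = bytesPerSecond;
            this.windowStartMs = nowMs;
        }

        /** True if the bytes recorded in the current window have reached the limit. */
        synchronized boolean isExceeded(long nowMs) {
            if (nowMs - windowStartMs >= 1000) {
                windowStartMs = nowMs; // start a fresh window
                bytesInWindow = 0;
            }
            return bytesInWindow >= bytesPerSecond;
        }

        synchronized void record(long bytes) {
            bytesInWindow += bytes;
        }
    }

    /** Stand-in for the upload half of a split RLMTask (hypothetical). */
    static final class CopyTask {
        private final Quota quota;
        int skippedRuns;
        long uploadedBytes;

        CopyTask(Quota quota) {
            this.quota = quota;
        }

        /** Invoked on every scheduled run; never blocks waiting for quota. */
        void run(long nowMs, long nextSegmentBytes) {
            if (quota.isExceeded(nowMs)) {
                skippedRuns++; // leave the segment for the next scheduled run
                return;
            }
            uploadedBytes += nextSegmentBytes; // placeholder for the actual upload
            quota.record(nextSegmentBytes);
        }
    }

    public static void main(String[] args) {
        CopyTask task = new CopyTask(new Quota(100, 0));
        task.run(0, 80);    // uploads: 80 bytes in window
        task.run(10, 80);   // uploads: 160 bytes in window
        task.run(20, 80);   // skipped: limit reached
        task.run(1000, 80); // new window, uploads again
        System.out.println(task.uploadedBytes + " bytes uploaded, " + task.skippedRuns + " run skipped");
    }
}
```

Because the quota is only checked before an upload starts, this naive version can overshoot the limit by up to one segment's worth of bytes per window (160 bytes against a 100-byte limit above). A real design would need to decide whether that is acceptable.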

Re: [DISCUSS] KIP-996: Pre-Vote

2023-11-29 Thread Luke Chen
Hi Alyssa,

Thanks for the KIP!
This is an important improvement for KRaft quorum.

Some comments:
1. "Follower transitions to: Prospective: After expiration of the election
timeout"
-> Is this the fetch timeout, rather than the election timeout?

2. I also agree we don't bump the epoch in the prospective state.
"A candidate will now send a VoteRequest with the PreVote field set to true
and CandidateEpoch set to its [epoch + 1] when its election timeout
expires."
-> What is "CandidateEpoch"? And I thought you had agreed not to set
[epoch + 1]?

Thanks.
Luke

On Wed, Nov 29, 2023 at 2:06 AM Alyssa Huang 
wrote:

> Thanks Jose, I've updated the KIP to reflect your and Jason's suggestions!
>
> On Tue, Nov 28, 2023 at 9:54 AM José Armando García Sancio
>  wrote:
>
> > Hi Alyssa
> >
> > On Mon, Nov 27, 2023 at 1:40 PM Jason Gustafson
> >  wrote:
> > > 2. Do you think the pretend epoch bump is necessary? Would it be simpler
> > > to change the prevote acceptance check to assert a greater than or equal
> > > epoch?
> >
> > I agree with Jason it would be better if all of the requests always
> > sent the current epoch. For the VoteRequest, it should be correct for
> > the prospective node to not increase the epoch and send the current
> > epoch and id. Since there are two states (prospective and candidate)
> > that can send a VoteRequest, maybe we can change the field name to
> > just ReplicaEpoch and ReplicaId.
> >
> > Thanks,
> > --
> > -José
> >
>

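Jason's suggestion above is to keep the current epoch in the request and accept a pre-vote when that epoch is greater than or equal to the voter's own. It can be sketched as follows. The record names, and the reuse of the usual Raft "log is at least as up to date" comparison on (last fetched epoch, end offset), are assumptions of this sketch rather than the VoteRequest schema that KIP-996 settles on.

```java
/**
 * Hypothetical pre-vote acceptance check: no pretend epoch + 1, the voter accepts
 * if the sender's epoch is >= its own and the sender's log is at least as up to date.
 */
final class PreVoteSketch {

    record PreVoteRequest(int replicaId, int replicaEpoch, int lastFetchedEpoch, long endOffset) {}

    record LocalState(int currentEpoch, int lastFetchedEpoch, long endOffset) {}

    static boolean grantPreVote(LocalState local, PreVoteRequest req) {
        if (req.replicaEpoch() < local.currentEpoch()) {
            return false; // sender is behind on epochs
        }
        // Standard Raft comparison: higher last epoch wins; on a tie, longer log wins.
        if (req.lastFetchedEpoch() != local.lastFetchedEpoch()) {
            return req.lastFetchedEpoch() > local.lastFetchedEpoch();
        }
        return req.endOffset() >= local.endOffset();
    }

    public static void main(String[] args) {
        LocalState voter = new LocalState(5, 5, 100L);
        System.out.println(grantPreVote(voter, new PreVoteRequest(2, 5, 5, 100L))); // true
        System.out.println(grantPreVote(voter, new PreVoteRequest(2, 5, 4, 200L))); // false
        System.out.println(grantPreVote(voter, new PreVoteRequest(2, 4, 5, 100L))); // false
    }
}
```

Note that granting a pre-vote in this sketch does not modify the voter's state, which matches the thread's agreement that the prospective state does not bump the epoch.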

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-02 Thread Luke Chen
; risk having stray data on remote storage.
>>> b) on each restart, we should initiate the remote storage deletion
>>> because although we replayed a record with a DISABLED state, we can not be
>>> sure if the remote data is deleted or not.
>>>
>>> We could either consider keeping the remote topic in DISABLING state
>>> until all of the remote storage data is deleted, or we need an additional
>>> mechanism to handle the remote stray data.
>>>
>>> The existing topic deletion, for instance, handles stray logs on disk by
>>> detecting them on KafkaBroker startup and deleting before the
>>> ReplicaManager is started.
>>> Maybe we need a similar mechanism here as well if we don't want a
>>> DISABLING state. Otherwise, we need a callback from Brokers to validate
>>> that remote storage data is deleted and now we could move to the DISABLED
>>> state.
>>>
>>> Thanks.
>>>
>>> On Tue, 9 Apr 2024 at 12:45, Luke Chen  wrote:
>>>
>>>> Hi Christo,
>>>>
>>>> > I would then opt for moving information from DisableRemoteTopic
>>>> > within the StopReplicas API which will then disappear in KRaft world as
>>>> > it is already scheduled for deprecation. What do you think?
>>>>
>>>> Sounds good to me.
>>>>
>>>> Thanks.
>>>> Luke
>>>>
>>>> On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov 
>>>> wrote:
>>>>
>>>> > Heya Luke!
>>>> >
>>>> > I thought a bit more about it and I reached the same conclusion as you
>>>> > for 2 as a follow-up from 1. In other words, in KRaft world I don't think
>>>> > the controller needs to wait for acknowledgements from the brokers. All we
>>>> > care about is that the leader (who is responsible for archiving/deleting
>>>> > data in tiered storage) knows about the change and applies it properly. If
>>>> > there is a leadership change halfway through the operation then the new
>>>> > leader still needs to apply the message from the state topic and we know
>>>> > that a disable-message will be applied before a reenablement-message. I
>>>> > will change the KIP later today/tomorrow morning to reflect this reasoning.
>>>> >
>>>> > However, with this I believe that introducing a new API just for
>>>> > Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of an
>>>> > overkill. I would then opt for moving information from DisableRemoteTopic
>>>> > within the StopReplicas API which will then disappear in KRaft world as
>>>> > it is already scheduled for deprecation. What do you think?
>>>> >
>>>> > Best,
>>>> > Christo
>>>> >
>>>> > On Wed, 3 Apr 2024 at 07:59, Luke Chen  wrote:
>>>> >
>>>> > > Hi Christo,
>>>> > >
>>>> > > 1. I agree with Doguscan that in KRaft mode, the controller won't send
>>>> > > RPCs to the brokers (except in the migration path).
>>>> > > So, I think we could adopt an approach similar to the one we used for
>>>> > > `AlterReplicaLogDirs` (KIP-858
>>>> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft#KIP858:HandleJBODbrokerdiskfailureinKRaft-Intra-brokerreplicamovement>),
>>>> > > which lets the broker notify the controller of any update, instead of the
>>>> > > controller notifying the broker. Once the controller receives the
>>>> > > completion requests from all brokers, it'll enter the "Disabled" state.
>>>> > > WDYT?
>>>> > >
>>>> > > 2. Why should we wait for all brokers to respond before moving to the
>>>> > > "Disabled" state in "KRaft mode"?
>>>> > > Currently, only the leader node does the remote log upload/fetch tasks,
>>>> > > so does that mean the controller only needs to make sure the leader
>>>> > > completes
>>>> > >

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-02 Thread Luke Chen
Also, I think using the `stopReplicas` request is a good idea because it won't
cause any problems while migrating to KRaft mode.
The stopReplicas request is one of the requests that the KRaft controller will
send to ZK brokers during migration.

Thanks.
Luke

On Fri, May 3, 2024 at 11:48 AM Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the update.
>
> Questions:
> 1. For this
> "The possible state transition from DISABLED state is to the ENABLED."
> I think it only applies to KRaft mode. In ZK mode, the possible state is
> "DISABLING", right?
>
> 2. For this:
> "If the cluster is using Zookeeper as the control plane, enabling remote
> storage for a topic triggers the controller to send this information to
> Zookeeper. Each broker listens for changes in Zookeeper, and when a change
> is detected, the broker triggers RemoteLogManager#onLeadershipChange()."
>
> I think the way ZK brokers learn about leadership changes is by receiving the
> LeaderAndIsrRequest from the controller, not by listening for changes in ZK.
>
> 3. In the KRaft handler steps, you said:
> "The controller also updates the Topic metadata to increment the
> tiered_epoch and update the tiered_state to DISABLING state."
>
> Should it be "DISABLED" state since it's KRaft mode?
>
> 4. I was wondering how we handle the tiered_epoch mismatch error.
> For ZK, I think the controller won't write any data into the ZK znode.
> For KRaft, neither the configRecord nor the updateTopicMetadata records will
> be written.
> Is that right? The current workflow makes me think there could be partially
> updated data in ZK/KRaft on a tiered_epoch error.
>
> 5. Since we changed to use the stopReplicas (V5) request now, the diagram for
> the ZK workflow might also need to be updated.
>
> 6. In ZK mode, what will the controller do if the "stopReplicas" responses
> are not received from all brokers? Revert the changes?
> This won't happen in KRaft mode because it's the broker's responsibility to
> fetch metadata updates from the controller.
>
>
> Thank you.
> Luke
>
>
> On Fri, Apr 19, 2024 at 10:23 PM Christo Lolov 
> wrote:
>
>> Heya all!
>>
>> I have updated KIP-950. A list of what I have updated is:
>>
>> * Explicitly state that Zookeeper-backed clusters will have ENABLED ->
>> DISABLING -> DISABLED while KRaft-backed clusters will only have ENABLED ->
>> DISABLED
>> * Added two configurations for the new thread pools and explained where
>> values will be picked-up mid Kafka version upgrade
>> * Explained how leftover remote partitions will be scheduled for deletion
>> * Updated the API to use StopReplica V5 rather than a whole new
>> controller-to-broker API
>> * Explained that the disablement procedure will be triggered by the
>> controller listening for an (Incremental)AlterConfig change
>> * Explained that we will first move log start offset and then issue a
>> deletion
>> * Went into more details that changing remote.log.disable.policy after
>> disablement won't do anything and that if a customer would like additional
>> data deleted they would have to use already existing methods
>>
>> Let me know if there are any new comments or I have missed something!
>>
>> Best,
>> Christo
>>
>> On Mon, 15 Apr 2024 at 12:40, Christo Lolov 
>> wrote:
>>
>>> Heya Doguscan,
>>>
>>> I believe that the state of the world after this KIP will be the
>>> following:
>>>
>>> For Zookeeper-backed clusters there will be 3 states: ENABLED, DISABLING
>>> and DISABLED. We want this because Zookeeper-backed clusters will await a
>>> confirmation from the brokers that they have indeed stopped tiered-related
>>> operations on the topic.
>>>
>>> For KRaft-backed clusters there will be only 2 states: ENABLED and
>>> DISABLED. KRaft takes a fire-and-forget approach for topic deletion. I
>>> believe the same approach ought to be taken for tiered topics. To me, the
>>> mechanism that will ensure leftover state in remote storage due to failures
>>> is cleaned up is the retention mechanism. In today's code, a leader
>>> deletes all segments it finds in remote with offsets below the log start
>>> offset. I believe this will be good enough for cleaning up leftover state
>>> in remote due to failures.
>>>
>>> I know that quite a few changes have been discussed so I will aim to put
>>> them on paper in the upcoming days and let everyone know!
>>>
>>> Best,
>>> Christo
>>>
>>> On Tue, 9 Apr 202

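This thread describes two sets of state transitions. ZooKeeper-backed clusters go ENABLED -> DISABLING -> DISABLED, KRaft-backed clusters go ENABLED -> DISABLED, and from DISABLED the only allowed move is back to ENABLED. The thread also states that a tiered_epoch mismatch rejects the request outright. Both rules can be modelled in a few lines of Java. This is a hypothetical sketch, not KIP-950 code; in particular, bumping tiered_epoch on every accepted transition is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the tiered-state transitions discussed for KIP-950. */
final class TieredStateSketch {

    enum TieredState { ENABLED, DISABLING, DISABLED }

    enum ControlPlane { ZOOKEEPER, KRAFT }

    /** Topic-level tiered state plus its tiered_epoch (hypothetical shape). */
    record TopicTieredState(TieredState state, int tieredEpoch) {}

    /** Transitions allowed from each state, per the thread. */
    static Set<TieredState> allowedNext(ControlPlane plane, TieredState from) {
        switch (from) {
            case ENABLED:
                return plane == ControlPlane.ZOOKEEPER
                        ? EnumSet.of(TieredState.DISABLING)
                        : EnumSet.of(TieredState.DISABLED);
            case DISABLING:
                // Only ZooKeeper mode uses DISABLING while it awaits broker acknowledgements.
                return plane == ControlPlane.ZOOKEEPER
                        ? EnumSet.of(TieredState.DISABLED)
                        : EnumSet.noneOf(TieredState.class);
            case DISABLED:
                // From DISABLED you can only go back to ENABLED, in either mode.
                return EnumSet.of(TieredState.ENABLED);
            default:
                throw new IllegalStateException("Unknown state: " + from);
        }
    }

    /** A tiered_epoch mismatch rejects the whole request, so nothing is partially updated. */
    static TopicTieredState transition(ControlPlane plane, TopicTieredState current,
                                       TieredState target, int requestTieredEpoch) {
        if (current.tieredEpoch() != requestTieredEpoch) {
            throw new IllegalArgumentException("tiered_epoch mismatch: expected "
                    + current.tieredEpoch() + " but got " + requestTieredEpoch);
        }
        if (!allowedNext(plane, current.state()).contains(target)) {
            throw new IllegalStateException(current.state() + " -> " + target
                    + " is not allowed in " + plane + " mode");
        }
        // Assumption of this sketch: every accepted transition bumps tiered_epoch.
        return new TopicTieredState(target, current.tieredEpoch() + 1);
    }

    public static void main(String[] args) {
        TopicTieredState kraft = new TopicTieredState(TieredState.ENABLED, 0);
        kraft = transition(ControlPlane.KRAFT, kraft, TieredState.DISABLED, 0);
        System.out.println(kraft.state() + " epoch=" + kraft.tieredEpoch()); // DISABLED epoch=1

        TopicTieredState zk = new TopicTieredState(TieredState.ENABLED, 0);
        zk = transition(ControlPlane.ZOOKEEPER, zk, TieredState.DISABLING, 0);
        zk = transition(ControlPlane.ZOOKEEPER, zk, TieredState.DISABLED, 1);
        System.out.println(zk.state() + " epoch=" + zk.tieredEpoch()); // DISABLED epoch=2
    }
}
```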
Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-05-03 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
Sorry for the late review.

Overall LGTM! Just 1 question:

If one fetch request contains 2 partitions: [p1, p2]
fetch.max.wait.ms: 500, remote.fetch.max.wait.ms: 1000

And now, p1's fetch offset is at the log end offset with no new data coming,
and p2's fetch offset requires a fetch from remote storage.
Suppose the fetch from remote storage takes 1000ms.
So, the question is:
Will this fetch request return in 500ms or 1000ms?
And what will be returned?

I think before this change, it would return within 500ms, right?
But it's not clear what the behavior will be after this KIP.

Thank you.
Luke


On Fri, May 3, 2024 at 1:56 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Christo,
>
> We have localTimeMs, remoteTimeMs, and totalTimeMs as part of the
> FetchConsumer request metric.
>
>
> kafka.network:type=RequestMetrics,name={LocalTimeMs|RemoteTimeMs|TotalTimeMs},request={Produce|FetchConsumer|FetchFollower}
>
> RemoteTimeMs refers to the amount of time spent in the purgatory for normal
> fetch requests and the amount of time spent reading the remote data for
> remote-fetch requests. Do we want to have a separate `TieredStorageTimeMs` to
> capture the time spent in remote-read requests?
>
> With per-broker level timer metrics combined with the request level metrics,
> the user will have sufficient information.
>
> Metric name =
>
> kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs
>
> --
> Kamal
>
> On Mon, Apr 29, 2024 at 1:38 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > Is it difficult to instead add the metric at
> > kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other
> > name=*)? Alternatively, if it is difficult to add it there, is it possible
> > to add 2 metrics, one at the RequestMetrics level (even if it is
> > total-time-ms - (all other times)) and one at what you are proposing? As an
> > operator I would find it strange to not see the metric in the
> > RequestMetrics.
> >
> > Your thoughts?
> >
> > Best,
> > Christo
> >
> > On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Christo,
> > >
> > > Updated the KIP with the remote fetch latency metric. Please take another
> > > look!
> > >
> > > --
> > > Kamal
> > >
> > > On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Federico,
> > > >
> > > > Thanks for the suggestion! Updated the config name to "
> > > > remote.fetch.max.wait.ms".
> > > >
> > > > Christo,
> > > >
> > > > Good point. We don't have the remote-read latency metrics to measure the
> > > > performance of the remote read requests. I'll update the KIP to emit this
> > > > metric.
> > > >
> > > > --
> > > > Kamal
> > > >
> > > >
> > > > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri <
> fedeval...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Kamal, it looks like all TS configurations start with the "remote."
> > > >> prefix, so I was wondering if we should name it
> > > >> "remote.fetch.max.wait.ms".
> > > >>
> > > >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
> > > >>  wrote:
> > > >> >
> > > >> > Hi all,
> > > >> >
> > > >> > If there are no more comments, I'll start a vote thread by
> tomorrow.
> > > >> > Please review the KIP.
> > > >> >
> > > >> > Thanks,
> > > >> > Kamal
> > > >> >
> > > >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
> > > >> > kamal.chandraprak...@gmail.com> wrote:
> > > >> >
> > > >> > > Hi all,
> > > >> > >
> > > >> > > Bumping the thread. Please review this KIP. Thanks!
> > > >> > >
> > > >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
> > > >> > > kamal.chandraprak...@gmail.com> wrote:
> > > >> > >
> > > >> > >> Hi Jorge,
> > > >> > >>
> > > >> > >> Thanks for the review! Added your suggestions to the KIP. PTAL.
> > > >> > >>
> > > >> > >> The `fetch.max.wait.ms` config will also be applicable to topics
> > > >> > >> enabled with remote storage.
> > > >> > >> Updated the description to:
> > > >> > >>
> > > >> > >> ```
> > > >> > >> The maximum amount of time the server will block before answering the
> > > >> > >> fetch request when it is reading near to the tail of the partition
> > > >> > >> (high-watermark) and there isn't sufficient data to immediately satisfy
> > > >> > >> the requirement given by fetch.min.bytes.
> > > >> > >> ```
> > > >> > >>
> > > >> > >> --
> > > >> > >> Kamal
> > > >> > >>
> > > >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya <
> > > >> > >> quilcate.jo...@gmail.com> wrote:
> > > >> > >>
> > > >> > >>> Hi Kamal,
> > > >> > >>>
> > > >> > >>> Thanks for this KIP! It should help to solve one of the main
> > > issues
> > > >> with
> > > >> > >>> tiered storage at the moment that is dealing with individual
> > > >> consumer
> > > >> > >>> configurations to avoid flooding logs with interrupted
> > exceptions.
> > 

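The brace notation in Kamal's metric name above (name={LocalTimeMs|RemoteTimeMs|TotalTimeMs}, request={Produce|FetchConsumer|FetchFollower}) is shorthand for one JMX MBean per combination. The sketch below expands it using the JDK's javax.management.ObjectName. The helper names are hypothetical, and the code does not connect to a broker over JMX or read any values.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Expands the name={A|B},request={X|Y} shorthand into concrete JMX ObjectNames. */
final class RequestMetricNamesSketch {

    /** Wraps the checked exception so callers can stay simple. */
    static ObjectName objectName(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad ObjectName: " + name, e);
        }
    }

    /** One ObjectName per (request, name) combination, request type in the outer loop. */
    static List<ObjectName> requestTimeMetrics(List<String> names, List<String> requests) {
        List<ObjectName> result = new ArrayList<>();
        for (String request : requests) {
            for (String name : names) {
                result.add(objectName(
                        "kafka.network:type=RequestMetrics,name=" + name + ",request=" + request));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ObjectName> metrics = requestTimeMetrics(
                List.of("LocalTimeMs", "RemoteTimeMs", "TotalTimeMs"),
                List.of("Produce", "FetchConsumer", "FetchFollower"));
        metrics.forEach(System.out::println); // nine MBean names

        // Per-broker remote read timer mentioned in the thread.
        ObjectName remoteRead = objectName(
                "kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs");
        System.out.println(remoteRead.getDomain() + " / " + remoteRead.getKeyProperty("name"));
    }
}
```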
Re: request permissions to contribute to Kafka

2024-05-06 Thread Luke Chen
Hi Zhisheng,

I've granted you the permissions.

Thank you.
Luke

On Tue, May 7, 2024 at 10:25 AM Zhisheng Zhang <31791909...@gmail.com>
wrote:

> Hi
>
> I'd like to request permissions to contribute to Kafka to propose a KIP
>
> Wiki ID:zhangzhisheng
> Jira ID:zhangzhisheng
>
> Thank you
>


Re: [VOTE] KIP-1018: Introduce max remote fetch timeout config

2024-05-09 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
+1 from me.

Thanks.
Luke

On Mon, May 6, 2024 at 5:03 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi all,
>
> We would like to start a voting thread for KIP-1018: Introduce
> max remote fetch timeout config for DelayedRemoteFetch requests.
>
> The KIP is available on
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
>
> If you have any suggestions, feel free to participate in the discussion
> thread:
> https://lists.apache.org/thread/9x21hzpxzmrt7xo4vozl17d70fkg3chk
>
> --
> Kamal
>


Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-10 Thread Luke Chen
ously. This case can be
> bound
> > > to happen when the number of remote log segments to
> > > delete is huge.
> > >
> > >
> > > On Mon, May 6, 2024, 18:12 Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > >> Hi Christo,
> > >>
> > >> Thanks for the update!
> > >>
> > >> 1. In the ZK mode, how will the transition from DISABLING to DISABLED
> > >> state happen?
> > >> For the "retain" policy, until we delete all the remote-log segments, the
> > >> state will be DISABLING and the deletion can happen only when they breach
> > >> either the retention time (or) size.
> > >>
> > >> How does the controller monitor that all the remote log segments are
> > >> deleted for all the partitions of the topic before transitioning the state
> > >> to DISABLED?
> > >>
> > >> 2. In Kraft, we have only ENABLED -> DISABLED state. How are we supporting
> > >> the case "retain" -> "enable"?
> > >>
> > >> If the remote storage is degraded, we want to avoid uploading the segments
> > >> temporarily and resume back once the remote storage is healthy. Is the case
> > >> supported?
> > >>
> > >>
> > >>
> > >> On Fri, May 3, 2024 at 12:12 PM Luke Chen  wrote:
> > >>
> > >>> Also, I think using the `stopReplicas` request is a good idea because it
> > >>> won't cause any problems while migrating to KRaft mode.
> > >>> The stopReplicas request is one of the requests that the KRaft controller
> > >>> will send to ZK brokers during migration.
> > >>>
> > >>> Thanks.
> > >>> Luke
> > >>>
> > >>> On Fri, May 3, 2024 at 11:48 AM Luke Chen  wrote:
> > >>>
> > >>>> Hi Christo,
> > >>>>
> > >>>> Thanks for the update.
> > >>>>
> > >>>> Questions:
> > >>>> 1. For this
> > >>>> "The possible state transition from DISABLED state is to the
> ENABLED."
> > >>>> I think it only applies to KRaft mode. In ZK mode, the possible state
> > >>>> is "DISABLING", right?
> > >>>>
> > >>>> 2. For this:
> > >>>> "If the cluster is using Zookeeper as the control plane, enabling
> > >>>> remote storage for a topic triggers the controller to send this
> > information
> > >>>> to Zookeeper. Each broker listens for changes in Zookeeper, and
> when a
> > >>>> change is detected, the broker triggers
> > >>>> RemoteLogManager#onLeadershipChange()."
> > >>>>
> > >>>> I think the way ZK brokers learn about leadership changes is by receiving
> > >>>> the LeaderAndIsrRequest from the controller, not by listening for changes
> > >>>> in ZK.
> > >>>>
> > >>>> 3. In the KRaft handler steps, you said:
> > >>>> "The controller also updates the Topic metadata to increment the
> > >>>> tiered_epoch and update the tiered_state to DISABLING state."
> > >>>>
> > >>>> Should it be "DISABLED" state since it's KRaft mode?
> > >>>>
> > >>>> 4. I was wondering how we handle the tiered_epoch mismatch error.
> > >>>> For ZK, I think the controller won't write any data into the ZK znode.
> > >>>> For KRaft, neither the configRecord nor the updateTopicMetadata records
> > >>>> will be written.
> > >>>> Is that right? The current workflow makes me think there could be
> > >>>> partially updated data in ZK/KRaft on a tiered_epoch error.
> > >>>>
> > >>>> 5. Since we changed to use the stopReplicas (V5) request now, the
> > >>>> diagram for the ZK workflow might also need to be updated.
> > >>>>
> > >>>> 6. In ZK mode, what will the controller do if the "stopReplicas"
> > >>>> responses are not received from all brokers? Revert the changes?
> > >>>> This won't happen in KRaft mode because it's the broker's responsibility
> > >>>> to
> 

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-05-15 Thread Luke Chen
Hi Christo,

Thanks for the explanation.
I think it would be good if you could add that into the KIP.

Otherwise, LGTM.

Thank you.
Luke

On Mon, May 13, 2024 at 11:55 PM Christo Lolov 
wrote:

> Heya!
>
> re Kamal - Okay, I believe I understand what you mean and I agree. I have
> made the following change
>
> ```
>
> During tiered storage disablement, when RemoteLogManager#stopPartition() is
> called:
>
>- Tasks scheduled for the topic-partitions in the
>RemoteStorageCopierThreadPool will be canceled.
>- If the disablement policy is retain, scheduled tasks for the
>topic-partitions in the RemoteDataExpirationThreadPool will remain
>unchanged.
>- If the disablement policy is delete, we will first advance the log
>start offset and we will let tasks scheduled for the topic-partitions in
>the RemoteDataExpirationThreadPool to successfully delete all remote
>segments before the log start offset and then unregister themselves.
>
> ```
>
> re Luke - I checked once again. As far as I understand when a broker goes
> down all replicas it hosts go to OfflineReplica state in the state machine
> the controller maintains. The moment the broker comes back up again the
> state machine resends StopReplica based on
> ```
>
> * OfflineReplica -> ReplicaDeletionStarted
> * --send StopReplicaRequest to the replica (with deletion)
>
> ```
> from ReplicaStateMachine.scala. So I was wrong and you are right, we do not
> appear to be sending constant requests today. I believe it is safe for us
> to follow a similar approach i.e. if a replica comes online again we resend
> the StopReplica.
>
> If you don't notice any more problems I will aim to start a VOTE tomorrow
> so we can get at least part of this KIP in 3.8.
>
> Best,
> Christo
>
> On Fri, 10 May 2024 at 11:11, Luke Chen  wrote:
>
> > Hi Christo,
> >
> > > 1. I am not certain I follow the question. From DISABLED you can only go
> > > to ENABLED regardless of whether your cluster is backed by Zookeeper or
> > > KRaft. Am I misunderstanding your point?
> >
> > Yes, you're right.
> >
> > > 4. I was thinking that if there is a mismatch we will just fail accepting
> > > the request for disablement. This should be the same in both Zookeeper and
> > > KRaft. Or am I misunderstanding your question?
> >
> > OK, sounds good.
> >
> > > 6. I think my current train of thought is that there will be unlimited
> > > retries until all brokers respond in a similar way to how deletion of a
> > > topic works today in ZK. In the meantime the state will continue to be
> > > DISABLING. Do you have a better suggestion?
> >
> > I don't think infinite retries are a good idea, since if a broker is down
> > forever, this request will never complete.
> > You mentioned that the existing topic deletion uses a similar pattern; how
> > does it handle this issue?
> >
> > Thanks.
> > Luke
> >
> > On Thu, May 9, 2024 at 9:21 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > re: Luke
> > >
> > > 1. I am not certain I follow the question. From DISABLED you can only
> go
> > to
> > > ENABLED regardless of whether your cluster is backed by Zookeeper or
> > KRaft.
> > > Am I misunderstanding your point?
> > >
> > > 2. Apologies, this was a leftover from previous versions. I have
> updated
> > > the Zookeeper section. The steps ought to be: controller receives
> change,
> > > commits necessary data to Zookeeper, enqueues disablement and starts
> > > sending StopReplicas request to brokers; brokers receive StopReplicas
> and
> > > propagate them all the way to RemoteLogManager#stopPartitions which
> takes
> > > care of the rest.
> > >
> > > 3. Correct, it should say DISABLED - this should now be corrected.
> > >
> > > 4. I was thinking that if there is a mismatch we will just fail
> accepting
> > > the request for disablement. This should be the same in both Zookeeper
> > and
> > > KRaft. Or am I misunderstanding your question?
> > >
> > > 5. Yeah. I am now doing a second pass on all diagrams and will update
> > them
> > > by the end of the day!
> > >
> > > 6. I think my current train of thought is that there will be unlimited
> > > retries until all brokers respond in a similar way to how deletion of a
> > > topic works today in ZK. In the meantime the state will continue to be
> > > DISABLING. Do you have a better 

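The stopPartition() steps Christo lists above can be combined with the retention rule from his earlier message, where a leader deletes remote segments with offsets below the log start offset. The sketch below models both. Everything here is hypothetical: the real work happens asynchronously in the RemoteStorageCopierThreadPool and RemoteDataExpirationThreadPool, while this sketch runs expiration inline, and newLogStartOffset is an illustrative parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the quoted disablement steps; not RemoteLogManager code. */
final class DisablementSketch {

    enum DisablePolicy { RETAIN, DELETE }

    /** A remote segment covering offsets [baseOffset, endOffset] (hypothetical shape). */
    record RemoteSegment(long baseOffset, long endOffset) {}

    /** Per-partition state touched by stopPartition (hypothetical). */
    static final class PartitionState {
        boolean copierTaskScheduled = true;
        boolean expirationTaskScheduled = true;
        long logStartOffset;
        final List<RemoteSegment> remoteSegments = new ArrayList<>();
    }

    static void stopPartition(PartitionState p, DisablePolicy policy, long newLogStartOffset) {
        // Copier tasks are canceled under either policy.
        p.copierTaskScheduled = false;
        if (policy == DisablePolicy.DELETE) {
            // Advance the log start offset first ...
            p.logStartOffset = Math.max(p.logStartOffset, newLogStartOffset);
            // ... then expiration deletes everything below it and unregisters.
            runExpiration(p);
            p.expirationTaskScheduled = false;
        }
        // RETAIN: expiration tasks stay scheduled and unchanged.
    }

    /** Removes remote segments that lie entirely below the log start offset. */
    static void runExpiration(PartitionState p) {
        p.remoteSegments.removeIf(s -> s.endOffset() < p.logStartOffset);
    }

    public static void main(String[] args) {
        PartitionState p = new PartitionState();
        p.remoteSegments.add(new RemoteSegment(0, 99));
        p.remoteSegments.add(new RemoteSegment(100, 199));
        stopPartition(p, DisablePolicy.DELETE, 150);
        // Only the segment covering offsets 100-199 remains; it is not fully below 150.
        System.out.println(p.remoteSegments);
    }
}
```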
Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-15 Thread Luke Chen
Hi Christo,

In addition to the minor comments left in the discussion thread, it LGTM.
+1 from me.

Thank you.
Luke


On Tue, May 14, 2024 at 11:21 PM Christo Lolov 
wrote:

> Heya!
>
> I would like to start a vote on KIP-950: Tiered Storage Disablement in
> order to catch the last Kafka release targeting Zookeeper -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
>
> Best,
> Christo
>


Re: [DISCUSS] Apache Kafka 3.7.1 release

2024-05-15 Thread Luke Chen
Hi Igor,

Thanks for volunteering!
+1

Luke

On Wed, May 15, 2024 at 11:15 PM Mickael Maison 
wrote:

> Hi Igor,
>
> Thanks for volunteering, +1
>
> Mickael
>
> On Thu, Apr 25, 2024 at 11:09 AM Igor Soarez  wrote:
> >
> > Hi everyone,
> >
> > I'd like to volunteer to be the release manager for a 3.7.1 release.
> >
> > Please keep in mind, this would be my first release, so I might have some
> > questions, and it might also take me a bit longer to work through the
> > release process.
> > So I'm thinking a good target would be toward the end of May.
> >
> > Please let me know your thoughts and if there are any objections or
> > concerns.
> >
> > Thanks,
> >
> > --
> > Igor
>


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-16 Thread Luke Chen
Thanks Chia-Ping!
Since ZK is going to be removed, I agree the KRaft part has higher priority.
But if Christo or another community contributor has spare time, it would be good
to have the ZK part, too!

Thanks.
Luke

On Thu, May 16, 2024 at 5:45 PM Chia-Ping Tsai  wrote:

> +1, but I prefer to ship it for KRaft only.
>
> I am concerned about whether the community has enough time to accept more
> features in 3.8 :(
>
> Best,
> Chia-Ping
>
> On 2024/05/14 15:20:50 Christo Lolov wrote:
> > Heya!
> >
> > I would like to start a vote on KIP-950: Tiered Storage Disablement in
> > order to catch the last Kafka release targeting Zookeeper -
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> >
> > Best,
> > Christo
> >
>


Re: [DISCUSS] KIP 1047 - Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-23 Thread Luke Chen
LGTM!
Thanks for raising this improvement.

Luke

On Thu, May 23, 2024 at 12:52 AM Chia-Ping Tsai  wrote:

> Thanks to Josep for the response.
>
> > We can add this to 3.8.0, but keep in mind the KIP has not been voted on yet
> > (as far as I can see), so I would highly encourage starting the vote thread
> > ASAP and starting the implementation right after.
>
> sure. We will file a draft PR at the same time!
>
> On Thu, May 23, 2024 at 12:31 AM Josep Prat wrote:
>
> > Hi all,
> >
> > We can add this to 3.8.0, but keep in mind the KIP has not been voted on yet
> > (as far as I can see), so I would highly encourage starting the vote thread
> > ASAP and starting the implementation right after.
> >
> > Best,
> >
> > -
> > Josep Prat
> > Open Source Engineering Director, Aiven
> > josep.p...@aiven.io | +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > On Wed, May 22, 2024, 17:06 Chia-Ping Tsai  wrote:
> >
> > > > One issue I also noted is that some of the existing Decoder
> > > > implementations (StringDecoder for example) can accept configurations
> > > > but currently DumpLogSegments does not provide a way to pass any
> > > > configurations, it creates an empty VerifiableProperties object each
> > > > time it instantiates a Decoder instance. If we were to use
> > > > Deserializer we would also need a way to provide configurations.
> > >
> > > BTW, if the known bug gets fixed, we will have to make the new interface
> > > extend `Configurable`.
> > >
> > > Or we can just ignore the known issue, as `DumpLogSegments` has no options
> > > to take custom configs for `Decoder`. That keeps the `Decoder` simpler.
> > >
> > >
> > > On Wed, May 22, 2024 at 10:58 PM Chia-Ping Tsai wrote:
> > >
> > > >
> > > > Thanks for Mickael's response!
> > > >
> > > > > I'm wondering whether we need to introduce a new Decoder interface and
> > > > > instead if we could reuse Deserializer. We could deprecate the
> > > > > key-decoder-class and value-decoder-class flags and introduce new
> > > > > flags like key-deserializer-class and value-deserializer-class. One
> > > > > benefit is that we already have many existing deserializer
> > > > > implementations. WDYT?
> > > >
> > > > I prefer to use different interface, since using the same interface
> > > > (Deserializer) may obstruct us from enhancing the interface used by
> > > > DumpLogSegments only in the future.
> > > >
> > > > > One issue I also noted is that some of the existing Decoder
> > > > > implementations (StringDecoder for example) can accept configurations
> > > > > but currently DumpLogSegments does not provide a way to pass any
> > > > > configurations, it creates an empty VerifiableProperties object each
> > > > > time it instantiates a Decoder instance. If we were to use
> > > > > Deserializer we would also need a way to provide configurations.
> > > >
> > > > yep, that is a known issue:
> > > > https://issues.apache.org/jira/browse/KAFKA-12311
> > > >
> > > > We will file a PR to fix it.
> > > >
> > > > On Wed, May 22, 2024 at 10:51 PM Mickael Maison wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Thanks for the KIP. Sorting this out in 3.8.0 would be really nice as
> > > >> it would allow us to migrate this tool in 4.0.0. We're unfortunately
> > > >> past the KIP deadline but maybe this is small enough to have an
> > > >> exception.
> > > >>
> > > >> I'm wondering whether we need to introduce a new Decoder interface and
> > > >> instead if we could reuse Deserializer. We could deprecate the
> > > >> key-decoder-class and value-decoder-class flags and introduce new
> > > >> flags like key-deserializer-class and value-deserializer-class. One
> > > >> benefit is that we already have many existing deserializer
> > > >> implementations. WDYT?
> > > >>
> > > >> One issue I also noted is that some of the existing Decoder
> > > >> implementations (StringDecoder for example) can accept configurations
> > > >> but currently DumpLogSegments does not provide a way to pass any
> > > >> configurations, it creates an empty VerifiableProperties object each
> > > >> time it instantiates a Decoder instance. If we were to use
> > > >> Deserializer we would also need a way to provide configurations.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >> On Wed, May 22, 2024 at 4:12 PM Chia-Ping Tsai wrote:
> > > >> >
> > > >> > Dear all,
> > > >> >
> > > >> > We know that the 3.8.0 KIP freeze has already passed, but this is a
> > > >> > small KIP and we need to ship it in 3.8.0 so that we can remove the
> > > >> > deprecated Scala interface in 4.0.
> > > >> >
> > > >> > Best,
> > > >> > Chia-Ping
> > > >> >
> > > >> > On 2024/05/22 14:05:16 Frank Yang wrote:
> > > >> > > Hi team,
> > > >> > >
> > > >> > > Chia-Ping Tsai and I would like to propose KIP-1047 to migrate
> > > >> > > kafka.serializer.Decoder from the core module (Scala) to the tools
> > > >> > > module (Java).
> > > >> > >
> > > >> > > Feedback and comments a

Re: Inquire about a bug issue

2024-05-23 Thread Luke Chen
Hi Jianbin,

Thanks for asking.
I'll review the PR this week or next week.
Let's target this bug fix for v3.7.1 and v3.8.0.

Thanks.
Luke

On Fri, May 24, 2024 at 11:20 AM Jianbin Chen  wrote:

> I would like to inquire if anyone is paying attention to this issue
> https://issues.apache.org/jira/browse/KAFKA-16583. When the broker
> allocates partitions and then restarts, there is a chance that this problem
> will occur, causing the broker to fail to start. This is a bug that greatly
> affects the stability of production services. Why has it not been dealt
> with after more than a month? I believe it is necessary to include it in
> version 3.7.1 and release it as soon as possible to prevent more users from
> being affected.
>
> Jianbin Chen, githubId: funky-eyes
>


Re: [VOTE] KIP 1047 - Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-24 Thread Luke Chen
+1 (binding)
Thanks Frank!

Luke

On Fri, May 24, 2024 at 5:21 PM Josep Prat 
wrote:

> Hi Frank,
>
> thanks for the KIP.
> +1 (binding)
>
> Best,
>
> On Fri, May 24, 2024 at 11:11 AM Kuan Po Tseng 
> wrote:
>
> > +1 (non-binding)
> >
> > On 2024/05/23 16:26:42 Frank Yang wrote:
> > > Hi all,
> > >
> > > I would like to start a vote on KIP-1047: Introduce new
> > > org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder.
> > >
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1047+Introduce+new+org.apache.kafka.tools.api.Decoder+to+replace+kafka.serializer.Decoder
> > >
> > > Discussion thread:
> > > https://lists.apache.org/thread/n3k6vb4vddl1s5nopcyglnddtvzp4j63
> > >
> > > Thanks and regards,
> > > PoAn
> >
>
>
>


Kafka jenkins is unable to run and view old builds.

2024-05-26 Thread Luke Chen
Hi all,

Currently, the Kafka Jenkins is unable to run builds, and old builds can't be viewed.
I've filed INFRA-25824 with the Infra team.

Thanks.
Luke


Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-27 Thread Luke Chen
> I did not see build failure that happens in 11 and 17 but not in 8 or 21,
> and also it can save more CI resources and make our CI be thinner.

Same here. I've never seen a build pass on JDK 21 but fail on 11 or 17.
Even if it happens, it is rare. I think we are just making a trade-off
to make CI more reliable and faster.

Thanks.
Luke

On Tue, May 28, 2024 at 2:22 PM Chia-Ping Tsai  wrote:

> Dear all,
>
> I do love Harris's patch, as I believe no one loves slow CI. Separately, I
> filed https://issues.apache.org/jira/browse/KAFKA-16847 just now to revise
> our README about the JDK. I'd like to raise more discussion here.
>
> > Note that compilation with Java 11/17 doesn't add any value over compiling
> > with Java 21 with the appropriate --release config (which we set). So, this
> > part of the build process is wasteful.
>
> I did not see build failure that happens in 11 and 17 but not in 8 or 21,
> and also it can save more CI resources and make our CI be thinner. Hence,
> I'm +1 to drop 11 and 17 totally.
>
> Best,
> Chia-Ping
>
>
> On 2024/05/28 04:40:48 Ismael Juma wrote:
> > Hi Greg,
> >
> > Thanks for making this change.
> >
> > Note that compilation with Java 11/17 doesn't add any value over compiling
> > with Java 21 with the appropriate --release config (which we set). So, this
> > part of the build process is wasteful. Running the tests does add some
> > value (and hence why we originally had it), but the return on investment is
> > not good enough given our CI issues (and hence why the change is good).
> >
> > Ismael
> >
> > On Mon, May 27, 2024, 8:20 PM Greg Harris 
> > wrote:
> >
> > > Hello Apache Kafka Developers,
> > >
> > > In order to better utilize scarce CI resources shared with other Apache
> > > projects, the Kafka project will no longer be running full test suites for
> > > the JDK 11 & 17 components of PR builds.
> > >
> > > *Action requested: If you have an active pull request, please merge or
> > > rebase the latest trunk into your branch* before continuing development
> > > as normal. You may wait to push the resulting branch until you make
> > > another commit, or push the result immediately.
> > >
> > > What to expect with this change:
> > > * Trunk (and release branch) builds will not be affected.
> > > * JDK 8 and 21 builds will not be affected.
> > > * Compilation will not be affected.
> > > * Static analysis (spotbugs, checkstyle, etc) will not be affected.
> > > * Overall build execution time should be similar or slightly better than before.
> > > * You can expect fewer tests to be run on your PRs (~6 instead of ~12).
> > > * Test flakiness should be similar or slightly better than before.
> > >
> > > And as a reminder, build failures (red indicators in CloudBees) are always
> > > blockers for merging. Starting now, the 11 and 17 builds should always pass
> > > (green indicators in CloudBees) before merging, as failed tests (yellow
> > > indicators in CloudBees) should no longer be present.
> > >
> > > Thanks everyone,
> > > Greg Harris
> > >
> >
>


Re: Action requested: Changes to CI for JDK 11 & 17 builds on Pull Requests

2024-05-28 Thread Luke Chen
Wow! I haven't seen this beautiful green on Jenkins in years!
Thanks Greg!!

Luke

On Tue, May 28, 2024 at 4:12 PM Chia-Ping Tsai  wrote:

> Please take a look at the following QA run. ALL PASSED!!!
>
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15889/25/pipeline
>
> I almost cried, and BIG thanks to Harris!!!
>
>


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-29 Thread Luke Chen
Hi Josep,

Thanks for raising this.
I'm +1 on delaying a bit so the features can be completed.

But I think we need to make clear what the updated feature freeze and
code freeze dates are. Is this correct?
- Feature freeze is on June 12th
- Code freeze is June 26th


Thanks.
Luke

On Wed, May 29, 2024 at 5:38 PM Josep Prat 
wrote:

> Hi Kafka developers,
>
> Given that we have a couple of KIPs that are halfway through their
> implementation, and it seems to be a matter of days (1 or 2 weeks) to have
> them completed, what would you think if we delayed feature freeze and code
> freeze by 2 weeks? Let me know your thoughts.
>
> Best,
>
> On Tue, May 28, 2024 at 8:47 AM Josep Prat  wrote:
>
> > Hi Kafka developers,
> >
> > This is a reminder about the upcoming deadlines:
> > - Feature freeze is on May 29th
> > - Code freeze is June 12th
> >
> > I'll cut the new branch during morning hours (CEST) on May 30th.
> >
> > Thanks all!
> >
> > On Thu, May 16, 2024 at 8:34 AM Josep Prat  wrote:
> >
> >> Hi all,
> >>
> >> We are now officially past the KIP freeze deadline. KIPs that are
> >> approved after this point in time shouldn't be adopted in the 3.8.x release
> >> (except the 2 already mentioned KIPs 989 and 1028, assuming no vetoes occur).
> >>
> >> Reminder of the upcoming deadlines:
> >> - Feature freeze is on May 29th
> >> - Code freeze is June 12th
> >>
> >> If you have an approved KIP that you already know you won't be able to
> >> complete before the feature freeze deadline, please update the Release
> >> column in the
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >> page.
> >>
> >> Thanks all,
> >>
> >> On Wed, May 15, 2024 at 8:53 PM Josep Prat  wrote:
> >>
> >>> Hi Nick,
> >>>
> >>> If nobody comes up with concerns or pushback until the time of closing
> >>> the vote, I think we can take it for 3.8.
> >>>
> >>> Best,
> >>>
> >>>
> >>> On Wed, May 15, 2024, 20:48 Nick Telford wrote:
> >>>
>  Hi Josep,
> 
>  Would it be possible to sneak KIP-989 into 3.8? Just as with 1028, it's
>  currently being voted on and has already received the requisite votes. The
>  only thing holding it back is the 72-hour voting window.
> 
>  Vote thread here:
>  https://lists.apache.org/thread/nhr65h4784z49jbsyt5nx8ys81q90k6s
> 
>  Regards,
> 
>  Nick
> 
>  On Wed, 15 May 2024 at 17:47, Josep Prat wrote:
> 
>  > And my maths are wrong! I added an extra 24 hours to all the numbers in
>  > there. If after 72 hours no vetoes appear, I have no objections to adding
>  > this specific KIP, as it shouldn't have a big blast radius.
>  >
>  > Best,
>  >
>  > On Wed, May 15, 2024 at 6:44 PM Josep Prat 
>  wrote:
>  >
>  > > Ah, I see Chris was faster writing this than me.
>  > >
>  > > On Wed, May 15, 2024 at 6:43 PM Josep Prat 
>  wrote:
>  > >
>  > >> Hi all,
>  > >> You still have the full day of today (regardless of timezone) to
>  > >> get KIPs approved. Tomorrow morning (CEST timezone) I'll send another
>  > >> email asking developers to assign future approved KIPs to another
>  > >> version that is not 3.8.
>  > >>
>  > >> So, the only problem I see with KIP-1028 is that it hasn't been open for
>  > >> a vote for 72 hours (48 hours as of now). If there is no negative voting
>  > >> on the KIP, I think we can let that one in, given it would only miss the
>  > >> deadline by less than 12 hours (if my timezone maths add up).
>  > >>
>  > >> Best,
>  > >>
>  > >> On Wed, May 15, 2024 at 6:35 PM Ismael Juma 
>  wrote:
>  > >>
>  > >>> The KIP freeze is just about having the KIP accepted. Not sure why we
>  > >>> would need an exception for that.
>  > >>>
>  > >>> Ismael
>  > >>>
>  > >>> On Wed, May 15, 2024 at 9:20 AM Chris Egerton <fearthecel...@gmail.com>
>  > >>> wrote:
>  > >>>
>  > >>> > FWIW I think that the low blast radius for KIP-1028 should allow it to
>  > >>> > proceed without adhering to the usual KIP and feature freeze dates.
>  > >>> > Code freeze is probably still worth respecting, at least if changes
>  > >>> > are required to the docker/jvm/Dockerfile. But I defer to Josep's
>  > >>> > judgement as the release manager.
>  > >>> >
>  > >>> > On Wed, May 15, 2024, 06:59 Vedarth Sharma <
>  vedarth.sha...@gmail.com

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-31 Thread Luke Chen
Hi Justine,

In the KIP-1012 discussion thread, our conclusion was that we should have
an "automatic" unclean leader election in KRaft, even if KIP-966 cannot be
completed in time.

> We should specify in KIP-1012 that we need to have some way to configure
> the system to automatically do unclean leader election. If we run out of
> time implementing KIP-966, this could be something quite simple, like
> honoring the static unclean.leader.election = true configuration.

I think we still need to include this in v3.8.0, i.e. honor the static
unclean.leader.election.enable=true configuration.

Thanks.
Luke



On Fri, May 31, 2024 at 1:55 AM Justine Olshan 
wrote:

> My understanding is that on KRaft, automatic unclean leader election is
> disabled, but it can be manually triggered.
>
> See this note from Colin on another email thread:
> > We do have the concept of unclean leader election in KRaft, but it has to
> > be triggered by the leader election tool currently. We've been talking
> > about adding configuration-based unclean leader election as part of the
> > KIP-966 work.
>
> Just wanted to add this clarification.
>
> Justine
>
> On Thu, May 30, 2024 at 9:38 AM Calvin Liu 
> wrote:
>
> > Hi Mickael,
> > Part 1 adds the ELR and enables the leader election improvements related
> > to ELR. It does not change unclean leader election behavior, which I think
> > is hard-coded to be disabled.
> > Part 2 should replace the current unclean leader election with the
> > unclean recovery. Colin McCabe will help with part 2 as the KRaft
> > controller expert. Thanks Colin!
> >
> >
> >
> >
> > On Thu, May 30, 2024 at 2:43 AM Mickael Maison wrote:
> >
> > > Hi Calvin,
> > >
> > > What's not clear from your reply is whether "KIP-966 Part 1" contains
> > > the ability to perform unclean leader elections with KRaft?
> > > Hopefully we have committers already looking at these. If you need
> > > additional help, please shout (well ping!)
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Thu, May 30, 2024 at 6:05 AM Ismael Juma  wrote:
> > > >
> > > > Sounds good, thanks Josep!
> > > >
> > > > Ismael
> > > >
> > > > On Wed, May 29, 2024 at 7:51 AM Josep Prat wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > I think your proposal makes more sense than mine. The end goal is to
> > > > > try to get these 2 KIPs in 3.8.0 if possible. I think we can also
> > > > > achieve this by not delaying the general feature freeze, but rather by
> > > > > cherry-picking the future commits on these features to the 3.8 branch.
> > > > >
> > > > > So I would propose to leave the deadlines as they are and manually
> > > > > cherry-pick the commits related to KIP-853 and KIP-966.
> > > > >
> > > > > Best,
> > > > >
> > > > > On Wed, May 29, 2024 at 3:48 PM Ismael Juma wrote:
> > > > >
> > > > > > Hi Josep,
> > > > > >
> > > > > > It's generally a bad idea to push these dates because the scope keeps
> > > > > > increasing then. If there are features that need more time and we
> > > > > > believe they are essential for 3.8 due to its special nature as the
> > > > > > last release before 4.0, we should allow them to be cherry-picked to
> > > > > > the release branch versus delaying the feature freeze and code freeze
> > > > > > for everything.
> > > > > >
> > > > > > Ismael
> > > > > >

Re: Build hanging

2024-06-07 Thread Luke Chen
Hi Haruki,

Thanks for identifying this blocking test.
Could you help quickly open a PR to disable this test to unblock the CI
build?

Thanks.
Luke

On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada  wrote:

> Hi
>
> I found that the hanging can be reproduced locally.
> The blocking test is
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> It started to block after this commit:
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
>
> Thanks,
>
> On Sat, Jun 8, 2024 at 8:30 AM Sophie Blee-Goldman wrote:
>
> > Seems like the build is currently broken -- specifically, a test is hanging
> > and causing it to abort after 7+ hours. There are many examples in the
> > current PRs, such as
> >
> > Timed out after almost 8 hours:
> > 1. https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> > 2. https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
> >
> > Still running after 6+ hours:
> > 1. https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
> >
> > It's pretty difficult to tell which test is hanging but it seems like one
> > of the commits in the last 1-2 days is the likely culprit. If anyone has an
> > idea of what may have caused this or is actively investigating, please let
> > everyone know.
> >
> > Needless to say, this is rather urgent given the upcoming 3.8 code freeze.
> >
> > Thanks,
> > Sophie
> >
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


Re: Build hanging

2024-06-07 Thread Luke Chen
> Let's disable for now to unblock builds, and revert later if we can't solve
> it until code freeze?

That's exactly what I meant to do.
I've opened KAFKA-16916 <https://issues.apache.org/jira/browse/KAFKA-16916>
for this issue and assigned it to you.
Feel free to unassign yourself if you don't have time to fix the
AdminClient behavior change issue.
But let's disable the test first.

Thanks.
Luke


On Sat, Jun 8, 2024 at 8:55 AM Haruki Okada  wrote:

> Hi Luke,
>
> I see, but since this is likely due to AdminClient's behavior change, we
> need to fix it anyway before the 3.8 release, not just disable the test.
> Let's disable for now to unblock builds, and revert later if we can't solve
> it until code freeze?
>


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-12 Thread Luke Chen
Hi Josep,

For KIP-966, I think Calvin mentioned that it won't be completed in v3.8.0.
https://lists.apache.org/thread/fsnr8wy5fznzfso7jgk90skgyo277fmw

For unclean leader election, all we need is this PR:
https://github.com/apache/kafka/pull/16284
I think this PR needs one more week to be completed.

Thanks.
Luke

On Wed, Jun 12, 2024 at 4:51 PM Josep Prat 
wrote:

> Hi all,
>
> We are now really close to the planned code freeze deadline (today EOD).
> According to KIP-1012 [1], we agreed to stay in the 3.x branch until we
> achieve feature parity between Zookeeper and KRaft. The two main KIPs
> identified that would achieve this are: KIP-853 [2] and KIP-966 [3].
> At the time of writing, neither KIP is complete. My question to the people
> driving both KIPs is: how much more time do you think is needed to bring
> these KIPs to completion?
>
> - If the time needed is short, we could still include these 2 KIPs in
> the release.
> - However, if the time needed is on the scale of weeks, we should
> continue with the release plan for 3.8 and afterwards start working on the
> 3.9 release.
>
> What are your thoughts?
>
>
> [1]: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1012%3A+The+need+for+a+Kafka+3.8.x+release
> [2]: https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes
> [3]: https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
>
> On Wed, Jun 12, 2024 at 10:40 AM Josep Prat  wrote:
>
> > Hi Rajini,
> > Yes, we could backport this one to the 3.8 branch. Would you be able to do
> > this once you merge this PR?
> >
> > Thanks
> >
> > On Tue, Jun 11, 2024 at 10:53 PM Rajini Sivaram wrote:
> >
> >> Hi Josep,
> >>
> >> The PR https://github.com/apache/kafka/pull/13277 for KIP-899 looks ready
> >> to be merged (waiting for the PR build). The PR changes several files, but
> >> is relatively straightforward and not risky. Also, the changes are under a
> >> config that is not enabled by default. Since the KIP was approved before
> >> the KIP freeze, will it be OK to include it in 3.8.0?
> >>
> >> Thank you,
> >>
> >> Rajini
> >>
> >>
> >> On Tue, Jun 11, 2024 at 9:35 AM Josep Prat wrote:
> >>
> >> > Hi all,
> >> >
> >> > I just want to remind everybody that the code freeze deadline is
> >> > approaching. June 12th EOD is the deadline.
> >> >
> >> > Please do not automatically backport any commit to the 3.8 branch after
> >> > June 12th EOD. Ping me if you think the commit should be backported
> >> > (fixes failures in the branch or critical bug fixes).
> >> >
> >> > Thanks all!
> >> >
> >> > On Sat, Jun 1, 2024 at 8:43 PM José Armando García Sancio
> >> >  wrote:
> >> >
> >> > > Hi Josep,
> >> > >
> >> > > See my comments below.
> >> > >
> >> > > On Wed, May 29, 2024 at 10:52 AM Josep Prat wrote:
> >> > > > So I would propose to leave the deadlines as they are and manually
> >> > > > cherry-pick the commits related to KIP-853 and KIP-966.
> >> > >
> >> > > Your proposal sounds good to me. I suspect that we will be doing feature
> >> > > development for KIP-853 past the feature freeze and code freeze dates.
> >> > > Maybe feature development will be finished around the end of June.
> >> > >
> >> > > I'll make sure to cherry pick commits for KIP-853 to the 3.8 branch
> >> > > once we have one.
> >> > >
> >> > > Thanks,
> >> > > --
> >> > > -José
> >> > >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>
>

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-13 Thread Luke Chen
I'm also +1 on releasing 3.8 on time and having a shorter release cycle
for v3.9.
But we have to explicitly define what "shorter release cycle" means.
I agree with Greg's suggestion:
cutting the 3.9 branch immediately after the KRaft KIPs are
feature-complete, or 4 months after the 3.8 release, whichever comes first.
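Greg's "whichever comes first" rule boils down to taking the earlier of two dates. A minimal Java sketch, assuming a hypothetical `branchCutDate` helper and made-up dates (the thread fixes neither the 3.8 release date nor the KRaft feature-complete date):

```java
import java.time.LocalDate;

public class BranchCutRule {

    // Earlier of (a) the date the KRaft KIPs become feature-complete and
    // (b) four months after the 3.8 release, per the rule quoted above.
    static LocalDate branchCutDate(LocalDate release38, LocalDate kraftFeatureComplete) {
        LocalDate fourMonthsAfterRelease = release38.plusMonths(4);
        return kraftFeatureComplete.isBefore(fourMonthsAfterRelease)
                ? kraftFeatureComplete
                : fourMonthsAfterRelease;
    }

    public static void main(String[] args) {
        // Hypothetical dates for illustration only; they are not from the thread.
        LocalDate release38 = LocalDate.of(2024, 7, 15);
        LocalDate featureComplete = LocalDate.of(2024, 9, 1);
        System.out.println(branchCutDate(release38, featureComplete)); // prints 2024-09-01
    }
}
```

If the KIPs slipped past mid-November in this hypothetical, the four-month cap would win instead and the branch would be cut on 2024-11-15.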

Also, what does that mean for v4.0?
Release 4.0.0: 3 - 4 months after the 3.9 branch is created?

Thanks.
Luke



On Fri, Jun 14, 2024 at 4:32 AM Greg Harris 
wrote:

> Hi Mickael,
>
> I agree, +1 to proceeding with the 3.8 release on time.
>
> I'm fine with cutting the 3.9 branch immediately after the KRaft KIPs are
> feature-complete, or 4 months after the 3.8 release, whichever comes first.
>
> Thanks,
> Greg
>
> On Thu, Jun 13, 2024 at 1:29 PM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > We follow a time-based release process precisely to avoid these types of
> > issues.
> > Rushing to complete a feature and merging it just before the release
> > puts pressure on the contributors and leaves little time to properly
> > test it, especially for a complex feature like KIP-853.
> >
> > I'd be +1 on releasing 3.8 now and doing a 3.9 release to reach
> > feature parity. If we really want, as Sophie suggested, we could do a
> > shorter cycle for 3.9 before moving onto 4.0.
> >
> > Thanks,
> > Mickael
> >
> > On Thu, Jun 13, 2024 at 10:23 PM Christopher X Bogan
> >  wrote:
> > >
> > > is this where I ask
> > >  to join?
> > >
> > > On Thu, Jun 13, 2024 at 1:20 PM Greg Harris wrote:
> > >
> > > > Hi Sophie and Justine,
> > > >
> > > > I share your concerns about delaying 3.8 in order to give the KRaft KIPs
> > > > more time for implementation. I raised them in the discussion for KIP-1012
> > > > [1]:
> > > >
> > > > > I think there is a
> > > > > risk that features that are on-time and eligible for a 3.8 release
> > > > > could be delayed by some KIPs which are given special treatment.
> > > >
> > > > This situation is exactly why Kafka has standardized on time-based releases
> > > > [2], and it is not exceptional for features to slip from releases in order
> > > > to keep the releases on time; it's a very intentional choice of priorities.
> > > >
> > > > I don't think the situation we're in warrants parallel development, and
> > > > I'm uncomfortable with incurring the additional risk to users by doing a
> > > > nonstandard release.
> > > > For example, two risks are that 3.9 never happens, or happens much later
> > > > than expected (after 4.1, 4.2, etc.). There is a remote chance of these
> > > > happening, but we should be prepared if these features get delayed further.
> > > > Users could be left behind waiting for a 3.9 release without an upgrade
> > > > path for new features or security updates. I found this to be the most
> > > > compelling motivation for KIP-1012, and parallel development doesn't
> > > > address it.
> > > > If the 3.9 release comes out much later, users may be unsafe upgrading from
> > > > 3.9 to some 4.x versions, and we would need special notices to explain this
> > > > non-linearity in our versions.
> > > >
> > > > Thanks all,
> > > > Greg
> > > >
> > > > [1] https://lists.apache.org/thread/kvdp2gmq5gd9txkvxh5vk3z2n55b04s5
> > > > [2] https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
> > > >
> > > > On Thu, Jun 13, 2024 at 1:17 PM Josep Prat wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > I know we discarded parallel branching, but that was in the scope of
> > > > > 3.8.0 and with the KIPs not yet approved.
> > > > > We could also not do a parallel release, but rather a "quick" 3.9 and then
> > > > > start with 4.0.
> > > > >
> > > > > Best
> > > > > -
> > > > > Josep Prat
> > > > >
> > > > > On Thu, Jun 13, 2024, 22:08 Josep Prat 
> wrote:
> > > > >
> > > > > > Hi Sophie,
> > > > > >
> > > > > > I have a call tomorrow with José to clarify the estimates for KIP-853.
> > > > > > I also wouldn't like to delay the release for a month or more.
> > > > > >
> > > > > > Regarding your proposal, I find it would be a good way forward, +1 from
> > > > > > my side.
> > > > > >
> > > > > >
> > > > > > I also find that this release, and what it should include, is a hot topic.
> > > > > > What do others think?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > 
> > > > > > Josep Prat

Re: [VOTE] 3.7.1 RC1

2024-06-14 Thread Luke Chen
Hi all,

After running system tests, here are failing tests:

1. kafkatest.tests.core.upgrade_test failed at the lz4 and snappy tests:
KAFKA-16962

2. StreamsUpgradeTest failed at "fromVersion"=2.2 ~ 2.5, but passed for
the rest of the versions: KAFKA-16960

3. TestKRaftUpgrade failed at "fromVersion"=3.3 ~ 3.5 under combined KRaft:
KAFKA-16961

I need help taking a look at these, or perhaps someone could re-run them in another environment.

Thanks.
Luke

On Fri, Jun 14, 2024 at 5:03 PM Edoardo Comar  wrote:

> Well, if you still need to build RC2, I'd cherry-pick this one, which I
> missed from the other PR:
> https://github.com/apache/kafka/pull/16326
>
> On Thu, 13 Jun 2024 at 13:20, Igor Soarez  wrote:
> >
> > Hi Edoardo,
> >
> > It is late, but not too late. I have cherry-picked your change
> > to the 3.7 branch and I'll build a second release candidate.
> >
> > If you could have a look at the first RC, please let me know if
> > you spot any issues with it that can be avoided in the next RC.
> >
> > Thanks,
> >
> > --
> > Igor
>


Re: [VOTE] 3.7.1 RC1

2024-06-15 Thread Luke Chen
Update to the system test investigation:

1. kafkatest.tests.core.upgrade_test failed at lz4 and snappy tests:
KAFKA-16962
<https://issues.apache.org/jira/browse/KAFKA-16962>
--> This test covers upgrades to v3.7 from versions 0.9 through 3.6; only
v0.9 failed, and all the others passed. Since v0.9 is a very old version,
this issue should not block this release.

2. StreamsUpgradeTest failed at "fromVersion"=2.2 ~ 2.5, but passed for
the rest of the versions: KAFKA-16960
<https://issues.apache.org/jira/browse/KAFKA-16960>
--> Thanks to Matthias's investigation, it looks like it's an environmental
issue. Closed.

3. TestKRaftUpgrade failed at "fromVersion"=3.3 ~ 3.5 under combined KRaft:
KAFKA-16961 <https://issues.apache.org/jira/browse/KAFKA-16961>
--> This is a side effect of a fix for another issue. I've created
KAFKA-16969 <https://issues.apache.org/jira/browse/KAFKA-16969> for the
issue. This is a blocker for v3.7.1 because it prevents users running KRaft
combined mode with multiple log.dirs from upgrading from older versions to
v3.7.1. We're planning to revert the previous change for now.

Thank you.
Luke



On Sat, Jun 15, 2024 at 10:06 AM Justine Olshan
 wrote:

> The import fix is in (as well as the integration tag).
> I did notice a large number of failing tests on the PR build, though. I'm not
> sure whether some of these were fixed in trunk and whether we want to pick up
> those fixes.
>
> Justine
>
> On Fri, Jun 14, 2024 at 6:19 PM Matthias J. Sax  wrote:
>
> > Replied on https://issues.apache.org/jira/browse/KAFKA-16960
> >
> > On 6/14/24 6:05 AM, Luke Chen wrote:
> > > Hi all,
> > >
> > > After running system tests, here are failing tests:
> > >
> > > 1. kafkatest.tests.core.upgrade_test failed at lz4 and snappy tests:
> > > KAFKA-16962 <https://issues.apache.org/jira/browse/KAFKA-16962>
> > > 2.  StreamsUpgradeTest failed at "fromVersion"=2.2 ~ 2.5, but passed
> for
> > > the rest of the versions: KAFKA-16960
> > > <https://issues.apache.org/jira/browse/KAFKA-16960>
> > > 3. TestKRaftUpgrade failed at "fromVersion"=3.3 ~ 3.5 under combined
> > KRaft:
> > > KAFKA-16961 <https://issues.apache.org/jira/browse/KAFKA-16961>
> > >
> > > Need help to take a look, or maybe re-run in another environment.
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Fri, Jun 14, 2024 at 5:03 PM Edoardo Comar 
> > wrote:
> > >
> > >> well if you still need to build RC2, I'd cherry pick this that I
> > >> missed from the other PR
> > >> https://github.com/apache/kafka/pull/16326
> > >>
> > >> On Thu, 13 Jun 2024 at 13:20, Igor Soarez  wrote:
> > >>>
> > >>> Hi Edoardo,
> > >>>
> > >>> It is late, but not too late. I have cherry-picked your change
> > >>> to the 3.7 branch and I'll build a second release candidate.
> > >>>
> > >>> If you could have a look at the first RC, please let me know if
> > >>> you spot any issues with it that can be avoided in the next RC.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> --
> > >>> Igor
> > >>
> > >
> >
>


Re: [DISCUSS] KIP-1057: Add remote log metadata flag to the dump log tool

2024-06-19 Thread Luke Chen
Hi Federico,

Thanks for the KIP!
It's helpful for debugging tiered storage issues.
+1 from me.

Thanks.
Luke

On Tue, Jun 18, 2024 at 12:18 AM Satish Duggana 
wrote:

> Thanks Federico for the KIP.
>
> This feature is helpful for developers while debugging tiered storage
> related issues.
>
> Even though RLMM is a pluggable interface, it is still useful to have
> a utility that is meant for the default/inbuilt implementation based
> on the internal topic. We can clarify that in the help notes and user
> docs.
>
> Users can still use alternatives like the ones others suggested if they need
> to dump in a different format:
> - Running the dump-log tool with a custom decoder
> - Running kafka-console-consumer.sh on the topic.
>
> ~Satish.
>
>
>
>
> On Mon, 17 Jun 2024 at 15:55, Federico Valeri 
> wrote:
> >
> > Hi Kamal,
> >
> > On Mon, Jun 17, 2024 at 11:44 AM Kamal Chandraprakash
> >  wrote:
> > >
> > > We can use the console-consumer to read the contents of the
> > > `__remote_log_metadata` topic. Why are we proposing a new tool?
> > >
> > > sh kafka-console-consumer.sh --bootstrap-server localhost:9092 \
> > >   --topic __remote_log_metadata \
> > >   --consumer-property exclude.internal.topics=false \
> > >   --formatter org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde\$RemoteLogMetadataFormatter \
> > >   --from-beginning
> > >
> >
> > Thanks for bringing this up. It works fine, but a running broker is
> > required, which makes it inconvenient for a remote support
> > engineer. You may also have to deal with client security
> > configuration, and it would be complicated to dump only specific
> > segments. I'm adding it to the rejected alternatives for now, but I'm open
> > to changes.
> >
> > >
> > >
> > > On Mon, Jun 17, 2024 at 12:53 PM Federico Valeri wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > On Sun, Jun 16, 2024 at 7:38 PM Divij Vaidya <
> divijvaidy...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hello Federico
> > > > >
> > > > > Please note that the topic-based RLMM is one of the possible
> > > > > implementations of RLMM. Hence, whatever solution we design here should:
> > > > > 1\ be explicit that this tooling only works for topic-based RLMM, and
> > > > > 2\ specify the handling of the failure mode when topic-based RLMM is not
> > > > > being used.
> > > > >
> > > >
> > > > That's true, thanks for pointing it out.
> > > >
> > > > > I would argue that topic-based RLMM cannot be treated the same as other
> > > > > internal topics. The topic-based RLMM topic is an optional topic which
> > > > > can have any possible schema (depending on the plugin implementation),
> > > > > whereas other internal topics are always guaranteed to be present with a
> > > > > fixed schema.
> > > > >
> > > >
> > > > Right, I updated the KIP with an improved option description.
> > > >
> > > > > In light of the above statements, the rejected alternative sounds better
> > > > > to me because:
> > > > > 1\ it provides the ability to dump logs for "any" RLMM implementation and
> > > > > not just topic-based RLMM.
> > > > > 2\ we don't have to deal with schema evolution of topic-based RLMM in
> > > > > this tool. That responsibility will be delegated to the decoder class
> > > > > which the operator can define using the flag "--value-decoder-class".
> > > > >
> > > > > Is there a reason that you are unable to use the rejected solution (which
> > > > > requires no changes) for debugging purposes?
> > > > >
> > > >
> > > > The rejected alternative will still be available, but I thought that
> > > > having a dedicated flag would make debugging easier, as I guess most
> > > > people will use the default RLMM implementation. I would be happy to
> > > > hear other opinions on this.
> > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Jun 15, 2024 at 4:43 PM Federico Valeri <
> fedeval...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to kick off a discussion for KIP-1057, which proposes to add
> > > > > > a remote log metadata flag to the dump log tool; this is useful when
> > > > > > debugging.
> > > > > >
> > > > > >
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1057%3A+Add+remote+log+metadata+flag+to+the+dump+log+tool
> > > > > >
> > > > > > Thanks,
> > > > > > Fede
> > > > > >
> > > >
>


Re: [VOTE] 3.7.1 RC2

2024-06-20 Thread Luke Chen
Hi Igor,

I ran system tests against RC2, and all passed except:
kafkatest.tests.core.upgrade_test, which failed at the lz4 and snappy tests:
KAFKA-16962

--> This test covers upgrades to v3.7 from versions 0.9 through 3.6; only
v0.9 failed, and all the others passed. Since v0.9 is a very old version,
this issue should not block this release.

So, the system tests look good.

I also did:
1. Ran the quickstart with the Scala 2.12 binaries
2. Checked checksums
3. Browsed the javadoc

+1 from me.

Thank you.
Luke



On Wed, Jun 19, 2024 at 11:39 PM Mickael Maison 
wrote:

> Hi Krishna,
>
> Thanks for the clarification!
> In that case +1 (binding)
>
> Thanks,
> Mickael
>
> On Wed, Jun 19, 2024 at 5:30 PM Krishna Agarwal
>  wrote:
> >
> > Hi Mickael,
> >
> > The Docker image CVE scan report can be found in the Docker Build Test
> > Pipeline [0] shared in the initial vote email. I see no High or
> > Critical CVEs.
> >
> > [0]: https://github.com/apache/kafka/actions/runs/9572915509
> >
> > Regards,
> > Krishna
> >
> > On Wed, Jun 19, 2024 at 8:14 PM Mickael Maison wrote:
> >
> > > Hi Igor,
> > >
> > > I did the following:
> > > - Checked signatures and checksums
> > > - Ran the quickstart with the 2.13 binaries and the Docker image
> > > - Built and ran the tests from source
> > > - Quickly browsed the javadoc
> > >
> > > It all looks good, but before voting, could you run the Docker Image
> > > CVE Scanner GitHub action [0] on the new image to check it's CVE free?
> > > It looks like we don't have this action in the 3.7 branch so it'll
> > > probably involve some manual steps.
> > >
> > > 0: https://github.com/apache/kafka/actions/workflows/docker_scan.yml
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Wed, Jun 19, 2024 at 10:55 AM Igor Soarez wrote:
> > > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the second candidate for release of Apache Kafka 3.7.1.
> > > >
> > > > This is a bugfix release with several fixes.
> > > >
> > > > Release notes for the 3.7.1 release:
> > > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Friday June 28, 11am UTC.
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/
> > > >
> > > > * Docker release artifact to be voted upon:
> > > > apache/kafka:3.7.1-rc2
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/javadoc/
> > > >
> > > > * Tag to be voted upon (off 3.7 branch) is the 3.7.1 tag:
> > > > https://github.com/apache/kafka/releases/tag/3.7.1-rc2
> > > >
> > > > * Documentation:
> > > > https://kafka.apache.org/37/documentation.html
> > > >
> > > > * Protocol:
> > > > https://kafka.apache.org/37/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 3.7 branch:
> > > > Unit/integration tests:
> > > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/184/
> > > > The latest test run includes some flaky tests, all of which were
> > > > confirmed to pass locally.
> > > >
> > > > System tests:
> > > > I don't have access to the Jenkins instance used for system tests in
> > > > https://jenkins.confluent.io/job/system-test-kafka/job/3.7
> > > > Luke has kindly shared results in the previous RC (thank you Luke!),
> > > > and all issues have been addressed.
> > > > If anyone with access is able to confirm the latest test results, please
> > > > reply with details.
> > > >
> > > > * Successful Docker Image Github Actions Pipeline for 3.7 branch:
> > > > Docker Build Test Pipeline:
> > > > https://github.com/apache/kafka/actions/runs/9572915509
> > > >
> > > > Thanks,
> > > >
> > > > --
> > > > Igor Soarez
> > >
>


Re: [DISCUSS] KIP-1052: Enable warmup in producer performance test

2024-06-21 Thread Luke Chen
Hi Matt,

Thanks for the KIP!
I agree that warm-up records could help analyze the performance more
accurately.

Some questions:
1. It looks like we will add 2 more options to producer perf tool:
 - --warmup-records
 - --combined-summary

Is this correct?
In the "Public Interfaces" section, only one is mentioned. Could you update it?
Also, the KIP says "An option such as "--warmup-records"
should be added...", which sounds like it has not been decided yet.
I suggest updating it to say that we will add the "--warmup-records" option for
" to make it clear.

2. What will the output look like when it contains both warm-up and
steady-state results? Could you give an example directly?

For better understanding, I'd suggest referring to KIP-1057 and
adding some examples using `kafka-producer-perf-test.sh` with the new
option to show what it will output.

Thank you.
Luke

On Fri, Jun 21, 2024 at 10:39 AM Welch, Matt  wrote:

> Hi Divij,
>
> Thanks for your response.  You raise some very important points.
> I've updated the KIP to clarify the changes discussed here.
>
> 1. I agree that warmup stats should be printed separately.  I see two
> cases here, both of which would have two summary lines printed at the end
> of the producer perf test.  In the first case, warmup-separate, the warmup
> stats are printed first as warmup-only, followed by a second print of the
> steady state performance. In the second case, warmup-combined, the first
> print would look identical to the summary line that's currently used and
> would reflect the "whole test", with a second summary print of the
> steady-state performance. This second case would allow users to
> compare what the test would have looked like without a warmup with the results of
> the test with a warmup. Although I've been looking at the second case
> lately, I can see merits of both cases and would be happy to support the
> warmup-separate case if that's the preference of the community.  Regarding
> the JMX metrics accumulated by Kafka, we need to decide if we should reset
> the JMX metrics between the warmup and steady state. While I like the idea
> of having separate JMX buckets for warmup and steady state, these
> statistics are usually heavily windowed, so they should naturally settle toward
> steady-state values after a warmup.
>
> 2. The total number of records sent by the producer and defined by
> '--num-records' MUST be strictly greater than the '--warmup-records' or an
> error will be thrown. Negative warmup records should similarly throw an
> error.  Specifying warmup-records of "0" should have behavior identical to
> the current implementation.
>
> 3.  You're correct that choosing the warmup duration can have a
> significant impact on the test output if care is not taken.  I've updated
> the proposed change to describe a simplistic process to choose how many
> warmup records to use.  Without understanding all the factors that go into
> a warmup, a user could run a test and watch the time series output of the
> producer test to determine when steady state has been reached and warmup
> has completed.  The number of records at which the producer hits steady
> state could then be used in subsequent tests. In practice, we find that 1
> minute is a good warmup for most cases, since aside from networking and
> storage initialization, even the JVM should be warmed up by then and using
> compiled code rather than interpreted byte code. This is more a heuristic,
> however, and measured latency and throughput of the system should be used
> to determine steady state.
>
> 4.  The current design has the user specifying the warmup records like
> they would specify the number of records for the test. While this is
> related to the throughput, it seemed a better option to have the user
> specify the number of records in the warmup, rather than some kind of
> duration which would be more complex to track. I completely agree with your
> concern about warmup affecting steady state, however, especially in short
> tests. With a warmup "removing" some of the high latency from steady state
> results, it could be tempting for users to run very short tests since they
> no longer need to wait long to achieve a repeatable steady-state result. I
> would consider this a case of insufficient warmup since Kafka could still
> be processing the warmup records as you mention. Best practice for warmup
> duration would be to hit steady state during the warmup and only then
> consider it a successful warmup. Our preferred process is to monitor
> producer latency until it hits steady state in a first test, then double
> that duration for the warmup in subsequent testing. One minute is usually
> sufficient. A problem does occur when using unlimited throughput since the
> user does not yet know how fast the producers will send and so can't estimate
> warmup records. If the iterative testing described above is not 
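Point 2 of Matt's reply states the validation rules for `--warmup-records`, and point 1 describes two summary layouts ("warmup-separate" and "warmup-combined"). A minimal Java sketch of both, assuming a hypothetical `WarmupSummarySketch` class, an in-memory array of per-record latencies, and a made-up summary line format; none of this is the actual `ProducerPerformance` code or its real output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WarmupSummarySketch {

    // Hypothetical checks mirroring point 2: negative warm-up counts are rejected,
    // and --num-records must be strictly greater than --warmup-records.
    static void validate(long numRecords, long warmupRecords) {
        if (warmupRecords < 0) {
            throw new IllegalArgumentException("--warmup-records must not be negative");
        }
        if (numRecords <= warmupRecords) {
            throw new IllegalArgumentException(
                    "--num-records must be strictly greater than --warmup-records");
        }
    }

    // Record count, average and max latency over latenciesMs[from, to).
    static String summarize(String label, long[] latenciesMs, int from, int to) {
        long[] slice = Arrays.copyOfRange(latenciesMs, from, to);
        double avg = Arrays.stream(slice).average().orElse(0.0);
        long max = Arrays.stream(slice).max().orElse(0L);
        return String.format(Locale.ROOT, "%s: %d records, %.1f ms avg latency, %d ms max latency",
                label, slice.length, avg, max);
    }

    // combined == true  -> "warmup-combined": whole-test line, then steady state.
    // combined == false -> "warmup-separate": warm-up-only line, then steady state.
    static List<String> summaries(long[] latenciesMs, int warmupRecords, boolean combined) {
        validate(latenciesMs.length, warmupRecords);
        List<String> lines = new ArrayList<>();
        if (warmupRecords == 0) {
            // No warm-up: one whole-test line, matching the rule that 0 keeps today's behaviour.
            lines.add(summarize("whole test", latenciesMs, 0, latenciesMs.length));
            return lines;
        }
        if (combined) {
            lines.add(summarize("whole test", latenciesMs, 0, latenciesMs.length));
        } else {
            lines.add(summarize("warmup", latenciesMs, 0, warmupRecords));
        }
        lines.add(summarize("steady state", latenciesMs, warmupRecords, latenciesMs.length));
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical per-record latencies: the first three are slow while the producer warms up.
        long[] latencies = {120, 80, 40, 10, 12, 11, 9, 10};
        summaries(latencies, 3, false).forEach(System.out::println);
        // warmup: 3 records, 80.0 ms avg latency, 120 ms max latency
        // steady state: 5 records, 10.4 ms avg latency, 12 ms max latency
    }
}
```

Slicing by record index (rather than by elapsed time) follows the design choice in point 4 of counting warm-up in records; a duration-based cut-off would need timestamps per record instead.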

Re: [VOTE] KIP-1057: Add remote log metadata flag to the dump log tool

2024-06-21 Thread Luke Chen
Hi Fede,

Thanks for the KIP!
+1 from me.

Luke

On Fri, Jun 21, 2024 at 6:44 PM Federico Valeri 
wrote:

> Hi all, I'd like to kick off a vote on KIP-1057.
>
> Design doc:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1057%3A+Add+remote+log+metadata+flag+to+the+dump+log+tool
>
> Discussion thread:
> https://lists.apache.org/thread/kxx1h4qwshgcjh4d5xzqltkx5mx9qopm
>
> Thanks,
> Fede
>


Re: [VOTE] 3.7.1 RC2

2024-06-28 Thread Luke Chen
Hi Igor,

I think we've passed the vote for the release.
Are we waiting for anything before releasing v3.7.1?

Please let me know if you need help.

Thanks.
Luke

On Wed, Jun 26, 2024 at 5:17 AM Justine Olshan 
wrote:

> Hey folks,
>
> Sorry for the delay. I have done the following:
>
>
> 1. Checked signatures
> 2. Built from source
> 3. Inspected unit/integration test failures
> 4. Scanned documentation and other artifacts
> 5. Ran ZK and Kraft quick starts and simple workloads on staged binary for
> 2.13
>
> +1 (binding) from me.
> Thanks,
>
> Justine
>
> On Fri, Jun 21, 2024 at 11:03 AM Jakub Scholz  wrote:
>
> > +1 (non-binding). I used the staged binaries (based on Scala 2.13) and
> > Maven artifacts to run my tests. All seems to work fine.
> >
> > Thanks & Regards
> > Jakub
> >
>


Re: [ANNOUNCE] Apache Kafka 3.7.1

2024-07-01 Thread Luke Chen
Thanks Igor for running this release!

Luke

On Mon, Jul 1, 2024 at 5:45 PM Mickael Maison 
wrote:

> Congratulations!
>
> Thanks Igor for running the release.
>
> Mickael
>
> On Mon, Jul 1, 2024 at 11:26 AM Igor Soarez  wrote:
> >
> > The Apache Kafka community is pleased to announce the release for Apache Kafka 3.7.1
> >
> > This is a bug fix release and it includes fixes and improvements.
> >
> > All of the changes in this release can be found in the release notes:
> > https://www.apache.org/dist/kafka/3.7.1/RELEASE_NOTES.html
> >
> >
> > You can download the source and binary release (Scala 2.12 and 2.13) from:
> > https://kafka.apache.org/downloads#3.7.1
> >
> >
> ---
> >
> >
> > Apache Kafka is a distributed streaming platform with four core APIs:
> >
> >
> > ** The Producer API allows an application to publish a stream of records to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output stream to one or more output topics, effectively transforming the
> > input streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might
> > capture every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react
> > to the streams of data.
> >
> >
> > Apache Kafka is in use at large and small companies worldwide, including
> > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> >
> > A big thank you for the following 61 contributors to this release!
> > (Please report an unintended omission)
> >
> > Adrian Preston, Anatoly Popov, Andras Katona, Andrew Schofield,
> > Anna Sophie Blee-Goldman, Anton Liauchuk, Apoorv Mittal, Ayoub Omari,
> > Bill Bejeck, Bruno Cadonna, Calvin Liu, Cameron Redpath, Cheng-Kai Zhang,
> > Chia-Ping Tsai, Chris Egerton, Colin Patrick McCabe, David Arthur,
> > David Jacot, Divij Vaidya, Dmitry Werner, Edoardo Comar, flashmouse,
> > Florin Akermann, Gantigmaa Selenge, Gaurav Narula, Greg Harris, Igor Soarez,
> > ilyazr, Ismael Juma, Jason Gustafson, Jeff Kim, jiangyuan, Joel Hamill,
> > John Yu, Johnny Hsu, José Armando García Sancio, Josep Prat, Jun Rao,
> > Justine Olshan, Kamal Chandraprakash, Ken Huang, Kuan-Po (Cooper) Tseng,
> > Lokesh Kumar, Luke Chen, Manikumar Reddy, Mario Pareja, Matthias J. Sax,
> > Mayank Shekhar Narula, Mickael Maison, Murali Basani, Omnia Ibrahim,
> > Paolo Patierno, PoAn Yang, Sagar Rao, sanepal, Sean Quah, Sebastian Marsching,
> > Stanislav Kozlovski, Vedarth Sharma, Walker Carlson, Yash Mayya
> >
> > We welcome your help and feedback. For more information on how to
> > report problems, and to get involved, visit the project website at
> > https://kafka.apache.org/
> >
> > Thank you!
> >
> >
> > Regards,
> >
> > --
> > Igor Soarez
> > Release Manager for Apache Kafka 3.7.1
>


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Luke Chen
> >>> >>> >>> > > >>>>
> >>> >>> >>> > > >>>> Best,
> >>> >>> >>> > > >>>> David
> >>> >>> >>> > > >>>>
> >>> >>> >>> > > >>>> On Fri, Jun 14, 2024 at 21:57, José Armando García Sancio wrote:
> >>> >>> >>> > > >>>>
> >>> >>> >>> > > >>>>> +1 on the proposed release plan for 3.8.
> >>> >>> >>> > > >>>>>
> >>> >>> >>> > > >>>>> Thanks!
> >>> >>> >>> > > >>>>>
> >>> >>> >>> > > >>>>> On Fri, Jun 14, 2024 at 3:33 PM Ismael Juma <m...@ismaeljuma.com> wrote:
> >>> >>> >>> > > >>>>>>
> >>> >>> >>> > > >>>>>> +1 to the plan we converged on in this thread.
> >>> >>> >>> > > >>>>>>
> >>> >>> >>> > > >>>>>> Ismael
> >>> >>> >>> > > >>>>>>
> >>> >>> >>> > > >>>>>> On Fri, Jun 14, 2024 at 10:46 AM Josep Prat wrote:
> >>> >>> >>> > > >>>>>>
> >>> >>> >>> > > >>>>>>> Hi all,
> >>> >>> >>> > > >>>>>>>
> >>> >>> >>> > > >>>>>>> Thanks Colin, yes go ahead.
> >>> >>> >>> > > >>>>>>>
> >>> >>> >>> > > >>>>>>> As we are now past code freeze, I would like to ask everyone involved
> >>> >>> >>> > > >>>>>>> in a KIP that is not yet complete to verify whether what landed on the
> >>> >>> >>> > > >>>>>>> 3.8 branch needs to be reverted or can stay. Additionally, I'll be
> >>> >>> >>> > > >>>>>>> pinging KIP owners and Jira reporters asking for their status, as some
> >>> >>> >>> > > >>>>>>> Jiras seem to have all related GitHub PRs merged but their status is
> >>> >>> >>> > > >>>>>>> still Open or In Progress.
> >>> >>> >>> > > >>>>>>> I'll be checking all the open blockers to see whether they are really
> >>> >>> >>> > > >>>>>>> blockers or can be pushed.
> >>> >>> >>> > > >>>>>>>
> >>> >>> >>> > > >>>>>>>
> >>> >>> >>> > > >>>>>>> Regarding timeline, I'll attempt to generate the
> >>> first
> >>> >>> RC on
> >>> >>> >>> > > >> Wednesday
> >>> >>> >>> > > >>>>> or
> >>> >>

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Luke Chen
ite-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/service.py",
> line 345, in run
> self.wait()
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/background_thread.py",
> line 72, in wait
> super(BackgroundThreadService, self).wait(timeout_sec)
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/service.py",
> line 293, in wait
> raise TimeoutError("Timed out waiting %s seconds for service nodes
> to finish. " % str(timeout_sec)
> ducktape.errors.TimeoutError: Timed out waiting 600 seconds for
> service nodes to finish. These nodes are still alive:
> ['ProducerPerformanceService-0-140496695824336 node 1 on worker3']
>
>
> On Thu, Jul 4, 2024 at 11:57 AM Luke Chen  wrote:
>
> > Hi Josep,
> >
> > For this:
> > - QuotaTest --> speaking with Bruno we suspect there is a problem with the
> > test setup, failed with "ValueError: max() arg is an empty sequence"
> >
> > It's a known issue: KAFKA-16138
> > <https://issues.apache.org/jira/browse/KAFKA-16138>.
> > It should pass when the specific tests are run locally.
> > Do you want me to help verify it by running it in my environment?
> >
> > Thanks.
> > Luke
> >
> >
> >
> > On Thu, Jul 4, 2024 at 4:03 PM Josep Prat 
> > wrote:
> >
> > > Hi all,
> > >
> > > We have had 2[1][2] runs of the system tests since the last blocker was
> > > merged on 3.8. So far we have 19 tests that failed on both runs. I've
> > > compiled them in this list[3].
> > >
> > > There seem to be these different categories of failing tests:
> > > - QuotaTest --> speaking with Bruno we suspect there is a problem with the
> > > test setup, failed with "ValueError: max() arg is an empty sequence"
> > > - Streams cooperative rebalance upgrade --> fails on versions 2.3.1 or
> > > older, failed with Timeout
> > > - KRaft Upgrade --> from dev with isolated and combined KRaft, failed with
> > > RemoteCommandError
> > > - Network degrade test --> failed with RemoteCommandError
> > > - Replica verification tool test --> timed out for KRaft; ZK failed on
> > > the first run but passed on the second
> > >
> > > If someone has further ideas on what could be causing these failures,
> > > please let me know. Given the holidays in the US, the possible test setup
> > > problem might not be fixed today.
> > >
> > > [1]: https://confluent-open-source-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.8/2024-07-02--001.05d6b151-356a-47e5-b724-6fcd79493422--1719991984--confluentinc--3.8--49d2ee3db9/report.html
> > > [2]: https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-03--001.4803d99b-52df-4f6d-82c2-3f050a6207fa--1720038529--apache--3.8--2fbe32ecb9/report.html
> > > [3]: https://docs.google.com/document/d/1wbcyzO6GM2SYQaqTMITBTBjHgZgM7mmiAt7TUfh1xt8/edit
> > >
> > > Best,
> > >
> > > On Tue, Jul 2, 2024 at 7:29 PM Josep Prat  wrote:
> > >
> > > > Hi all,
> > > > Thanks for reviewing and merging the latest blockers for 3.8.0. Tomorrow,
> > > > I will start with the process to get the first RC out.
> > > >
> > > > Best!
> > > >
> > > > On Sat, Jun 29, 2024 at 9:04 PM Josep Prat 
> > wrote:
> > > >
> > > >> Hi Justine,
> > > >>
> > > >> Marking MV 3.8-IV0 as the latest production MV is done in this PR
> > > >> (I did both together):
> > > >> https://github.com/apache/kafka/pull/16400
> > > >>
> > > >> Best,
> > > >>
> > > >> --
> > > >> Josep Prat
> > > >> Open Source Engineering Director, Aiven
> > > >> josep.p...@aiven.io   |   +491715557497 | aiven.io
> > > >> Aiven Deutschland GmbH
> > > >> Alexanderufer 3-7, 10117 Berlin
> > > >> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > >> Amtsgericht Charlottenburg, HRB 209739 B
> > > >>
> > > >> On Sat, Jun 29, 2024, 00:52 Justine Olshan

Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-09 Thread Luke Chen
Hi Mickael,

Thanks for the KIP!
This is a long-awaited feature for many users!

Questions:
1. I think piggybacking on the "BrokerHeartbeatRequest" to forward the cordoned
log dirs to the controller makes sense.
We already do similar things for fencing, controlled shutdown, failed log
dirs, etc.

2. In the Admin API, what parameters will the newly added isCordoned() method
take?

3. In the KIP, we said:
"defaultDir(): This method will not return the Uuid of a log directory that
is not cordoned."
--> This is hard to understand. Does that mean we will only return a cordoned
log dir?
Based on the current Javadoc of the interface, that doesn't look right:
"Get the default directory for new partitions placed in a given broker."

4. Currently, if a broker is registered and then goes offline, the
controller will still assign partitions to this broker.
So, if the broker then starts up with "cordoned.log.dirs" set, what will
happen?
Will the newly assigned partitions be created successfully or not?

5. I think after a log dir gets cordoned, we can always uncordon it, right?
We should mention that in the KIP.

6. If a broker is started with "cordoned.log.dirs" set, does that mean
the internal topic partitions (e.g. __consumer_offsets) cannot be created
either?
Also, if this log dir happens to be the metadata log dir, what will
happen to the metadata topic creation?
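
The expectation behind questions 3 and 6 can be illustrated with a minimal, hypothetical Python sketch: a default directory must never be a cordoned one, and there is simply no valid answer once every directory is cordoned. BrokerLogDirs, the "first non-cordoned directory" rule and the None return value are assumptions made for illustration only; they are not the KIP's design or Kafka's actual defaultDir() implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class BrokerLogDirs:
    """Hypothetical model of one broker's log directories (not Kafka's classes)."""

    dirs: List[uuid.UUID]  # log directory IDs, in configured order
    cordoned: Set[uuid.UUID] = field(default_factory=set)  # from cordoned.log.dirs

    def default_dir(self) -> Optional[uuid.UUID]:
        """Return the first directory that is NOT cordoned, or None if all are."""
        for d in self.dirs:
            if d not in self.cordoned:
                return d
        # Every directory is cordoned: there is nowhere to place new partitions.
        return None
```

Returning None for the all-cordoned case is one way to surface the situation in question 6 explicitly instead of silently picking a cordoned directory.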

Thanks.
Luke


On Tue, Jul 9, 2024 at 12:12 AM Mickael Maison 
wrote:

> Hi,
>
> Thanks for taking a look.
>
> - Yes you're right, I meant AlterPartitionReassignments. Fixed.
> - That's a good idea. I was expecting users to discover cordoned log
> directories by describing broker configurations. But being able to
> also get this information when describing log directories makes sense.
> I've added that to the KIP.
>
> Thanks,
> Mickael
>
>
> On Fri, Jul 5, 2024 at 8:05 AM Haruki Okada  wrote:
> >
> > Hi,
> >
> > Thank you for the KIP.
> > The motivation makes sense to me.
> >
> > I have a few questions:
> >
> > - [nits] "AlterPartitions request" in the Error handling section is actually
> > "AlterPartitionReassignments request", right?
> > - Don't we need to include cordoned information in the DescribeLogDirs
> > response too? Some tools (e.g. Cruise Control) need a way to know which
> > brokers/log dirs are cordoned to generate partition reassignment proposals.
> >
> > Thanks,
> >
> > On Thu, Jul 4, 2024 at 10:57 PM Mickael Maison wrote:
> >
> > > Hi,
> > >
> > > I'd like to start a discussion on KIP-1066, which introduces a mechanism
> > > to cordon log directories and brokers.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories
> > >
> > > Thanks,
> > > Mickael
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
>


Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-10 Thread Luke Chen
Hi Mickael,

Thanks for the response.

> 4. Cordoned log directories are persisted to the metadata log via the
> RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> broker is offline, the controller will use the latest known state of
> the broker to determine the broker's cordoned log directories. I've
> added a sentence clarifying this point.

OK, so suppose broker A goes offline and the controller still has it in the
"fenced" state without any cordoned log dirs, and then a topic is created and
assigned to broker A. Later, broker A starts up with all its log dirs
configured as cordoned. In this situation, will broker A create the partitions?

> 6. I'm leaning towards considering that scenario a configuration
> error. If all log directories are cordoned before the internal topics
> are created, then the broker will not be able to create them. This
> seems like a pretty strange scenario, where it's the first time you
> start a broker, you've cordoned all its log directory, and the
> internal topics (offsets and transactions) have not yet been created
> in the rest of the cluster.

Yes, I agree that this should be treated as a configuration error.
So the follow-up question is: if users encounter this issue, how can they
resolve it?
By uncordoning the log dir dynamically using kafka-configs.sh? Will that
config change create the partitions that weren't created earlier because
the log dir was cordoned?

> The metadata log is different (not managed by LogManager), so I think
> it should always be created regardless if its log directory is
> cordoned or not.

I agree we should treat "__cluster_metadata" differently.
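
One possible answer to the recovery question above can be sketched in Python under loudly stated assumptions: partitions that cannot be placed while every log dir is cordoned are remembered as pending and retried when cordoned.log.dirs is changed dynamically. HypotheticalBroker and its methods are illustrative only; the KIP does not specify this retry behaviour, and the metadata log is deliberately left out because, as noted above, it is not managed by LogManager.

```python
from typing import Dict, List, Optional, Set


class HypotheticalBroker:
    """Toy model (not Kafka code): partitions that cannot be placed while every
    log dir is cordoned are kept as pending and retried after a config update."""

    def __init__(self, log_dirs: List[str], cordoned: Set[str]) -> None:
        self.log_dirs = list(log_dirs)
        self.cordoned = set(cordoned)
        self.placement: Dict[str, str] = {}  # partition name -> chosen log dir
        self.pending: List[str] = []  # partitions waiting for a usable log dir

    def _pick_dir(self) -> Optional[str]:
        # Only directories that are not cordoned may receive new partitions.
        for d in self.log_dirs:
            if d not in self.cordoned:
                return d
        return None

    def create_partition(self, name: str) -> bool:
        """Place the partition if possible; otherwise remember it as pending."""
        d = self._pick_dir()
        if d is None:
            self.pending.append(name)
            return False
        self.placement[name] = d
        return True

    def update_cordoned(self, cordoned: Set[str]) -> None:
        """Simulate a dynamic change of cordoned.log.dirs, then retry pending work."""
        self.cordoned = set(cordoned)
        retry, self.pending = self.pending, []
        for name in retry:
            self.create_partition(name)
```

The alternative, failing the creation permanently and requiring an explicit reassignment, is simpler to reason about; the pending-list variant is only one option.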


Thanks.
Luke


On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison 
wrote:

> Hi Luke,
>
> 2. isCordoned() is a new method on LogDirDescription. It does not take
> any arguments. It just returns true if this log directory the
> LogDirDescription represents is cordoned.
>
> 3. Sorry that was a typo. This method will only return a log directory
> that is not cordoned. Fixed
>
> 4. Cordoned log directories are persisted to the metadata log via the
> RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> broker is offline, the controller will use the latest known state of
> the broker to determine the broker's cordoned log directories. I've
> added a sentence clarifying this point.
>
> 5. Yes a log directory can be uncordoned. You can either update the
> properties file and restart the broker or dynamically change the value
> at runtime using kafka-configs. I've added a paragraph about it in the
> KIP.
>
> 6. I'm leaning towards considering that scenario a configuration
> error. If all log directories are cordoned before the internal topics
> are created, then the broker will not be able to create them. This
> seems like a pretty strange scenario, where it's the first time you
> start a broker, you've cordoned all its log directory, and the
> internal topics (offsets and transactions) have not yet been created
> in the rest of the cluster.
> The metadata log is different (not managed by LogManager), so I think
> it should always be created regardless if its log directory is
> cordoned or not.
>
> Thanks,
> Mickael
>
> On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai  wrote:
> >
> > hi Mickael
> >
> > That is totally a good idea, but I have a question about the implementation.
> >
> > Should we consider making ReplicaPlacer pluggable (KIP-660) first, and then
> > add another implementation of ReplicaPlacer to offer the cordon mechanism?
> > Note that `ReplicaPlacer` can implement Reconfigurable to get updated at
> > runtime. That is similar to how KIP-1066 changes cordoned.log.dirs through
> > config updates.
> >
> > The benefit is to let users have their own optimized policy for specific
> > scenarios. It also avoids adding more and more mechanisms to our code base.
> > Of course, we can merge a mechanism that would be used by 99% of users :smile
> >
> > Best,
> > Chia-Ping
> >
> >
> > On Tue, Jul 9, 2024 at 9:07 PM Luke Chen wrote:
> >
> > > Hi Mickael,
> > >
> > > Thanks for the KIP!
> > > This is a long waiting feature for many users!
> > >
> > > Questions:
> > > 1. I think piggyback the "BrokerHeartbeatRequest" to forward the
> corden log
> > > dir to controller makes sense to me.
> > > We already did similar things for fence, controller shutdown, failed
> log
> > > dir...etc.
> > >
> > > 2. In the admin API, what parameters will the new added isCordoned()
> method
> > > take?
> > >
> > > 3. In the KIP, we said:
> > > "defaultDir(): This method w

Re: [DISCUSS] KAFKA-17094: How should unregistered broker nodes be handled in KRaft quorum state?

2024-07-11 Thread Luke Chen
Hi Tina,

Thanks for starting the discussion thread. This is indeed a problem if
users don't know which node ID needs to be unregistered.
I'm +1 for adding another field in the response to include registered
observer nodes that are inactive. That will make it clear and won't confuse
existing users.
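
The two-list idea can be sketched as follows. This is a minimal Python illustration, assuming the leader tracks the last time it heard from each observer and compares that against the observer session timeout; ObserverState, last_seen_ms and split_observers are hypothetical names, not fields of the actual DescribeQuorum response.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ObserverState:
    node_id: int
    last_seen_ms: int  # hypothetical: last time the leader heard from this observer


def split_observers(
    observers: Iterable[ObserverState], now_ms: int, session_timeout_ms: int
) -> Tuple[List[int], List[int]]:
    """Partition observers into (active, inactive) node IDs by session timeout."""
    active: List[int] = []
    inactive: List[int] = []
    for obs in observers:
        if now_ms - obs.last_seen_ms <= session_timeout_ms:
            active.append(obs.node_id)
        else:
            # Per the description below, such nodes are currently dropped from
            # the quorum state; in this sketch they are reported separately.
            inactive.append(obs.node_id)
    return active, inactive
```

Keeping the existing list's meaning ("active observers") and adding a second list is what avoids surprising current users.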

Thanks.
Luke

On Wed, Jul 10, 2024 at 11:43 PM Gantigmaa Selenge 
wrote:

> Hi all,
>
> As reported in KAFKA-17094 [1], to scale down KRaft-based broker nodes,
> they must first be unregistered via the Kafka Admin API. If a node is
> removed before being unregistered, it can't be listed for unregistration
> because the describeQuorum won't show inactive observer nodes. This happens
> because the quorum state excludes nodes that haven't heartbeated within the
> observer session timeout [2].
>
> To address this issue, we could stop clearing the observers list, changing
> its meaning from "active observer nodes" to "all registered observer
> nodes". While the current code implies the list should only include active
> nodes, there's no documentation explicitly stating this. Moreover, the
> voters list already includes all registered/configured voter nodes,
> inactive or not. Making this change would align the behavior of the
> observers and voters lists.
>
> Alternatively, we could add another field in the response (requiring a KIP)
> to include registered observer nodes that are offline. This would result in
> two separate lists: one for active observer nodes and one for inactive
> observer nodes.
>
> What are your thoughts on this issue?
> [1] https://issues.apache.org/jira/browse/KAFKA-17094
> [2] https://github.com/apache/kafka/blob/trunk/raft/src/main/java/org/apache/kafka/raft/LeaderState.java#L469
>
>
> Thanks!
> Regards,
> Gantigmaa Selenge
>


Re: [VOTE] KIP-1067: Remove ReplicaVerificationTool in 4.0 (deprecate in 3.9)

2024-07-11 Thread Luke Chen
+1 (binding) from me.

Thanks Dongjin.
Luke

On Thu, Jul 11, 2024 at 12:23 AM Justine Olshan
 wrote:

> +1 (binding)
>
> Thanks,
> Justine
>
> On Mon, Jul 8, 2024 at 1:59 AM Chia-Ping Tsai  wrote:
>
> > >
> > > Note that we already have this tracker for tools deprecations, but I'm
> > > fine to have a dedicated one for this tool (maybe we can link them).
> > > https://issues.apache.org/jira/browse/KAFKA-14705.
> >
> >
> > happy to know it. I have added the link to
> > https://issues.apache.org/jira/browse/KAFKA-17073
> >
> > On Mon, Jul 8, 2024 at 3:45 PM Federico Valeri wrote:
> >
> > > +1
> > >
> > > Note that we already have this tracker for tools deprecations, but I'm
> > > fine to have a dedicated one for this tool (maybe we can link them).
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-14705.
> > >
> > > On Sun, Jul 7, 2024 at 3:58 PM Chia-Ping Tsai 
> > wrote:
> > > >
> > > > +1
> > > >
> > > > On Sun, Jul 7, 2024 at 9:22 PM Dongjin Lee wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to call for a vote on KIP-1067: Remove ReplicaVerificationTool in
> > > > > 4.0 (deprecate in 3.9):
> > > > >
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627623
> > > > >
> > > > > Thanks,
> > > > > Dongjin
> > > > >
> > > > > --
> > > > > Dongjin Lee
> > > > >
> > > > > A hitchhiker in the mathematical world.
> > > > >
> > > > > github: github.com/dongjinleekr
> > > > > keybase: https://keybase.io/dongjinleekr
> > > > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > > > speakerdeck: speakerdeck.com/dongjin
> > > > >
> > >
> >
>


Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Luke Chen
Thanks Mickael!
+1 for increasing the test coverage for admin clients.
But I don't think this should be a blocker for v3.8.0, given the delay of
v3.8.0 and that we have already shipped many releases in this state.
What do you think?

Thanks.
Luke

On Mon, Jul 15, 2024 at 4:57 PM Mickael Maison 
wrote:

> Hi,
>
> I'm concerned we did not have tests to catch that issue earlier. Such
> an essential API like describeTopics() should be properly tested.
> Taking a quick look, it seems a bunch of other Admin APIs also don't
> have integration tests. I created
> https://issues.apache.org/jira/browse/KAFKA-17137 to address that.
>
> Thanks,
> Mickael
>
> On Mon, Jul 15, 2024 at 9:53 AM Josep Prat 
> wrote:
> >
> > Hi all,
> >
> > I'm cancelling the VOTE thread for 3.8.0-RC0. I submitted a PR with the
> > backport https://github.com/apache/kafka/pull/16593 and I'll generate a new
> > RC as soon as it's merged.
> >
> > Best,
> >
> > On Sat, Jul 13, 2024 at 7:09 PM Josep Prat  wrote:
> >
> > > Thanks for reviewing the RC Jakub,
> > >
> > > If you can open a PR with this fix pointing to the 3.8 branch I could cut
> > > another RC.
> > >
> > > Best!
> > >
> > >
> > > On Sat, Jul 13, 2024, 16:13 Jakub Scholz  wrote:
> > >
> > >> Hi Josep,
> > >>
> > >> Thanks for the RC.
> > >>
> > >> I gave it a quick try and ran into issues with an application using the
> > >> Kafka Admin API that looks like this issue:
> > >> https://issues.apache.org/jira/browse/KAFKA-16905 ... given that this
> > >> breaks what was working fine with Kafka 3.7, can the fix be backported to
> > >> 3.8.0 as well? If needed, I tried to create a simple reproducer for the
> > >> issue: https://github.com/scholzj/kafka-admin-api-async-issue-reproducer.
> > >>
> > >> Thanks & Regards
> > >> Jakub
> > >>
> > >> On Fri, Jul 12, 2024 at 11:46 AM Josep Prat wrote:
> > >>
> > >> > Hello Kafka users, developers and client-developers,
> > >> >
> > >> > This is the first candidate for release of Apache Kafka 3.8.0.
> > >> > Some of the major features included in this release are:
> > >> > * KIP-1028: Docker Official Image for Apache Kafka
> > >> > * KIP-974: Docker Image for GraalVM based Native Kafka Broker
> > >> > * KIP-1036: Extend RecordDeserializationException exception
> > >> > * KIP-1019: Expose method to determine Metric Measurability
> > >> > * KIP-1004: Enforce tasks.max property in Kafka Connect
> > >> > * KIP-989: Improved StateStore Iterator metrics for detecting leaks
> > >> > * KIP-993: Allow restricting files accessed by File and Directory
> > >> > ConfigProviders
> > >> > * KIP-924: customizable task assignment for Streams
> > >> > * KIP-813: Shareable State Stores
> > >> > * KIP-719: Deprecate Log4J Appender
> > >> > * KIP-390: Support Compression Level
> > >> > * KIP-1018: Introduce max remote fetch timeout config for
> > >> > DelayedRemoteFetch requests
> > >> > * KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission
> > >> > * KIP-1047: Introduce new org.apache.kafka.tools.api.Decoder to replace
> > >> > kafka.serializer.Decoder
> > >> > * KIP-899: Allow producer and consumer clients to rebootstrap
> > >> >
> > >> > Release notes for the 3.8.0 release:
> > >> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/RELEASE_NOTES.html
> > >> >
> > >> > *** Please download, test and vote by Monday, July 15, 12pm PT
> > >> >
> > >> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > >> > https://kafka.apache.org/KEYS
> > >> >
> > >> > * Release artifacts to be voted upon (source and binary):
> > >> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/
> > >> >
> > >> > * Docker release artifacts to be voted upon:
> > >> > apache/kafka:3.8.0-rc0
> > >> > apache/kafka-native:3.8.0-rc0
> > >> >
> > >> > * Maven artifacts to be voted upon:
> > >> >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >> >
> > >> > * Javadoc:
> > >> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/javadoc/
> > >> >
> > >> > * Tag to be voted upon (off 3.8 branch) is the 3.8.0 tag:
> > >> > https://github.com/apache/kafka/releases/tag/3.8.0-rc0
> > >> >
> > >> >
> > >> > Once https://github.com/apache/kafka-site/pull/608 is merged, you will
> > >> > be able to find the proper documentation under kafka.apache.org.
> > >> > * Documentation:
> > >> > https://kafka.apache.org/38/documentation.html
> > >> >
> > >> > * Protocol:
> > >> > https://kafka.apache.org/38/protocol.html
> > >> >
> > >> >
> > >> > * Successful Jenkins builds for the 3.8 branch:
> > >> > Unit/integration tests:
> > >> > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%252Fkafka/detail/3.8/67/

Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Luke Chen
Sounds good!
Thank you.

Luke

On Mon, Jul 15, 2024 at 5:11 PM Mickael Maison 
wrote:

> Yeah I've not marked it as a blocker for 3.8.0. It's just something we
> need to do in the background.
>
> On Mon, Jul 15, 2024 at 11:05 AM Luke Chen  wrote:
> >
> > Thanks Mickael!
> > +1 for increasing the test coverage for admin clients.
> > But I don't think this should be the blocker for v3.8.0, given the delay of
> > v3.8.0 and we already have many releases with this state.
> > What do you think?
> >
> > Thanks.
> > Luke
> >
> > On Mon, Jul 15, 2024 at 4:57 PM Mickael Maison wrote:
> >
> > > Hi,
> > >
> > > I'm concerned we did not have tests to catch that issue earlier. Such
> > > an essential API like describeTopics() should be properly tested.
> > > Taking a quick look, it seems a bunch of other Admin APIs also don't
> > > have integration tests. I created
> > > https://issues.apache.org/jira/browse/KAFKA-17137 to address that.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Mon, Jul 15, 2024 at 9:53 AM Josep Prat wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'm cancelling the VOTE thread for 3.8.0-RC0. I submitted a PR with the
> > > > backport https://github.com/apache/kafka/pull/16593 and I'll generate a new
> > > > RC as soon as it's merged.
> > > >
> > > > Best,
> > > >
> > > > On Sat, Jul 13, 2024 at 7:09 PM Josep Prat wrote:
> > > >
> > > > > Thanks for reviewing the RC Jakub,
> > > > >
> > > > > If you can open a PR with this fix pointing to the 3.8 branch I could cut
> > > > > another RC.
> > > > >
> > > > > Best!
> > > > >
> > > > >
> > > > > On Sat, Jul 13, 2024, 16:13 Jakub Scholz  wrote:
> > > > >
> > > > >> Hi Josep,
> > > > >>
> > > > >> Thanks for the RC.
> > > > >>
> > > > >> I gave it a quick try and ran into issues with an application using the
> > > > >> Kafka Admin API that looks like this issue:
> > > > >> https://issues.apache.org/jira/browse/KAFKA-16905 ... given that this
> > > > >> breaks what was working fine with Kafka 3.7, can the fix be backported to
> > > > >> 3.8.0 as well? If needed, I tried to create a simple reproducer for the
> > > > >> issue: https://github.com/scholzj/kafka-admin-api-async-issue-reproducer.
> > > > >>
> > > > >> Thanks & Regards
> > > > >> Jakub
> > > > >>
> > > > >> On Fri, Jul 12, 2024 at 11:46 AM Josep Prat wrote:
> > > > >>
> > > > >> > Hello Kafka users, developers and client-developers,
> > > > >> >
> > > > >> > This is the first candidate for release of Apache Kafka 3.8.0.
> > > > >> > Some of the major features included in this release are:
> > > > >> > * KIP-1028: Docker Official Image for Apache Kafka
> > > > >> > * KIP-974: Docker Image for GraalVM based Native Kafka Broker
> > > > >> > * KIP-1036: Extend RecordDeserializationException exception
> > > > >> > * KIP-1019: Expose method to determine Metric Measurability
> > > > >> > * KIP-1004: Enforce tasks.max property in Kafka Connect
> > > > >> > * KIP-989: Improved StateStore Iterator metrics for detecting leaks
> > > > >> > * KIP-993: Allow restricting files accessed by File and Directory
> > > > >> > ConfigProviders
> > > > >> > * KIP-924: customizable task assignment for Streams
> > > > >> > * KIP-813: Shareable State St

Re: Getting Started: requesting permissions to contribute to Apache Kafka

2024-07-21 Thread Luke Chen
Hi Rich,

Your accounts are all set.

Thanks.
Luke

On Mon, Jul 22, 2024 at 12:37 AM Rich C.  wrote:

> Hi team,
>
> Following Getting Started #4
> <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals>,
> I would like to request permission to contribute to Apache Kafka. I guess
> that includes edit permission for the wiki (KIP).
>
> Jira: jychen7
> wiki: juchen7 (note: wiki id is different from Jira id, due to a typo
> during signup)
>
> Thanks,
> Rich
>


Re: Wiki Kafka Kip access

2024-07-21 Thread Luke Chen
Hi Mason,

Your accounts are all set.

Thanks.
Luke

On Sun, Jul 21, 2024 at 6:25 PM Mason Chen  wrote:

> Hi there,
>
> I would like to have the access permissions needed to create a KIP. My wiki ID
> and Jira ID are both masonc. Thank you for the help!
>
> Best regards,
> Mason
>


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-07-25 Thread Luke Chen
Hi all,

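While implementing the feature in KRaft mode, I found some things in the
original proposal that we need to change:

(1) In the KIP section "Disablement - KRaft backed Cluster
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>",
we said: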
Controller persists configuration change and completes disablement:

   1. The controller creates a ConfigRecord and persists it in the metadata
   topic.
   2. The controller creates a TopicRecord to increment the tiered_epoch
   and update the tiered_state to DISABLED state.
   3. This update marks the completion of the disablement process,
   indicating that tiered storage has been successfully disabled for the
   KRaft-backed clusters. Similar to topic deletion all replicas will
   eventually pick up the changes from the cluster metadata topic and apply
   them to their own state. Any deletion failures will be picked up by the
   expiration threads which should be deleting data before the log start
   offset. If the retention policy is delete, a new expiration thread will be
   started on leadership change on any historical tiered topic to confirm that
   there aren't any leftover segments in remote which need deletion. After a
   cycle in which it didn't delete anything, it will die.

For step 2, I don't think the controller needs to create a TopicRecord
because:
1. The broker can get the "tiered_state" from the ConfigRecord.
2. The "tiered_epoch" is not necessary because the Raft protocol keeps the
order for us. The broker can rely on it and apply the records in order to
get the expected results.
3. Marking the completion of the disablement process is not necessary in
KRaft, because once the ConfigRecord is accepted by the controller, it
must be applied "in order" by all the observers.

So, I'd like to propose removing step 2 in KRaft mode.
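
The ordering argument in point 2 can be illustrated with a minimal Python sketch: if every broker applies the same committed records in offset order, the final per-topic state is determined by the last ConfigRecord for that topic, with no separate epoch needed. ConfigRecord here is a simplified stand-in with hypothetical fields, not Kafka's generated metadata record class, and the "ENABLED"/"DISABLED" strings only loosely mirror the KIP's tiered_state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ConfigRecord:
    """Simplified stand-in for a committed metadata record (hypothetical fields)."""

    offset: int
    topic: str
    remote_storage_enable: bool


def replay(records: Iterable[ConfigRecord]) -> Dict[str, str]:
    """Apply records in log order and return each topic's tiered state."""
    state: Dict[str, str] = {}
    last_offset = -1
    for rec in records:
        # KRaft hands committed records to every broker in offset order;
        # fail loudly if that assumption is ever violated.
        if rec.offset <= last_offset:
            raise ValueError(f"out-of-order record at offset {rec.offset}")
        last_offset = rec.offset
        state[rec.topic] = "ENABLED" if rec.remote_storage_enable else "DISABLED"
    return state
```

Because every replica replays the same ordered log, they all converge on the same state, which is the property the extra TopicRecord and tiered_epoch were meant to provide.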

(2) The current configuration is confusing for both users and the
implementation. This is what we originally proposed in KIP-950:

remote.storage.enable | remote.log.disable.policy (new) | remote storage data
----------------------+---------------------------------+------------------------------------------
true                  | null/retain/delete              | uploadable + readable
false                 | retain (default)                | readable, but remote storage is disabled?
false                 | delete                          | All remote data are deleted

Note on the "retain" row: users would also be surprised that this topic is
still reading data from remote storage. It also makes development difficult,
because it's impossible to distinguish between:

(1) a topic that never enabled remote storage

(2) a topic that enabled and then disabled remote storage

One problem we have is at broker startup, when setting the log start offset.
Since remote storage is disabled, we would normally set it to the "local log
start offset", but in case (2) we expect it to be treated as "remote storage
enabled", which is confusing.


Therefore, Kamal and I would like to propose a new version of the
configuration:

remote.storage.enable | remote.copy.disabled (new) | remote storage data
----------------------+----------------------------+----------------------------
true                  | false (default)            | uploadable + readable
true                  | true                       | readable
false                 | true/false                 | All remote data are deleted

The advantage is that this config makes it clear to users what they are
configuring, and the result is what they expect.
Also, on the implementation side, we can still rely on
"remote.storage.enable" to identify whether this feature is on or off.
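
The proposed configuration maps directly to a small decision function. This Python sketch assumes only the two boolean settings named in the proposal; RemoteDataState and remote_data_state are hypothetical names, and the real broker logic (segment copying, expiration threads, log start offsets) is considerably more involved.

```python
from enum import Enum


class RemoteDataState(Enum):
    UPLOADABLE_AND_READABLE = "uploadable + readable"
    READABLE_ONLY = "readable"
    DELETED = "all remote data deleted"


def remote_data_state(
    remote_storage_enable: bool, remote_copy_disabled: bool = False
) -> RemoteDataState:
    """Map the two proposed topic configs to the expected remote-data behaviour."""
    if not remote_storage_enable:
        # Tiered storage is off: remote.copy.disabled no longer matters.
        return RemoteDataState.DELETED
    if remote_copy_disabled:
        # Stop uploading new segments, but keep serving existing remote data.
        return RemoteDataState.READABLE_ONLY
    return RemoteDataState.UPLOADABLE_AND_READABLE
```

Note that the copy setting is only consulted when remote.storage.enable is true, which matches the point that the on/off check can still rely on remote.storage.enable alone.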

Any thoughts about it?

Thank you.
Luke



On Thu, May 30, 2024 at 6:50 PM David Jacot 
wrote:

> Hi all,
>
> Thanks for the KIP. This is definitely a worthwhile feature. However, I am
> a bit sceptical on the ZK part of the story. The 3.8 release is supposed to
> be the last one supporting ZK so I don't really see how we could bring it
> to ZK, knowing that we don't plan to do a 3.9 release (current plan). I
> strongly suggest clarifying this before implementing the ZK part in order
> to avoid having new code [1] being deleted right after 3.8 is released
> :). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK
> part.
>
> [1] https://github.com/apache/kafka/pull/16131
>
> Best,
> David
>
> On Tue, May 28, 2024 at 1:31 PM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > I agree with Chia-Ping, I think we could drop the ZK variant
> > altogether, especially if this is not going to make it in 3.8.0.
> > Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
> > new ZooKeeper-related code in that release to delete it all right
> > after in 4.0.
> >
> > Thanks,
> > Mickael
> >
> > On Fri, May 24, 2024 at 5:03 PM Christo Lolov 
> > wrote:
> > >
> > > Hello!
> > >
> > > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping
> and
> > > Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
> > >
> > > Realistically, I don't think I have the bandwidth to get this in 3.8.0.
> > > Due to this, I will mark tentatively the Zookeeper part for 3.9 if the
> > > community decides that they do in fact want one more 3.x release.
> > > I will mark the KRaft part as ready to be started and aiming for either
> > 4.0
> > > or 3.9.
>

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-07-25 Thread Luke Chen
Hi all,

I just noticed that the tables don't display correctly in the email.
I've put the table content in a Google doc here:
<https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit?usp=sharing>

Thanks.
Luke

On Thu, Jul 25, 2024 at 6:30 PM Luke Chen  wrote:

> Hi all,
>
> While implementing the feature in KRaft mode, I found something we need to
> change the original proposal:
>
> (1) In the KIP of "Disablement - KRaft backed Cluster
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>",
> we said:
> Controller persists configuration change and completes disablement:
>
>1. The controller creates a ConfigRecord and persists it in the
>metadata topic.
>2. The controller creates a TopicRecord to increment the tiered_epoch
>and update the tiered_state to DISABLED state.
>3. This update marks the completion of the disablement process,
>indicating that tiered storage has been successfully disabled for the
>KRaft-backed clusters. Similar to topic deletion all replicas will
>eventually pick up the changes from the cluster metadata topic and apply
>them to their own state. Any deletion failures will be picked up by the
>expiration threads which should be deleting data before the log start
>offset. If the retention policy is delete, a new expiration thread will be
>started on leadership change on any historical tiered topic to confirm that
>there aren't any leftover segments in remote which need deletion. After a
>cycle in which it didn't delete anything, it will die.
>
> For step 2, I don't think the controller needs to create a TopicRecord,
> because:
> 1. The broker can fetch the "tiered_state" from the ConfigRecord.
> 2. The "tiered_epoch" is not necessary because the Raft protocol preserves
> ordering for us. The broker can rely on Raft and apply the records in
> order to get the expected result.
> 3. Marking the completion of the disablement process is not necessary in
> KRaft, because once the ConfigRecord is accepted by the controller, it
> must be applied "in order" by all the observers.
>
> So, I'd like to propose removing step 2 in KRaft mode.
>
> (2) The current configuration is confusing for both users and the
> implementation. This is what we originally proposed in KIP-950:
>
> remote.storage.enable | remote.log.disable.policy (new) | remote storage data
> ----------------------+---------------------------------+----------------------------
> true                  | null/retain/delete              | uploadable + readable
> false                 | retain (default)                | readable (see note below)
> false                 | delete                          | All remote data are deleted
>
> Note on "false + retain (default)": the data stays readable, but remote
> storage is disabled? Users would also be surprised that this topic is
> still reading data from remote storage.
>
> This also makes development difficult, because we are unable to
> distinguish between:
>
> (1) a topic that never enabled remote storage
>
> (2) a topic that enabled and then disabled remote storage
>
> One problem arises when the broker starts up and tries to set the log
> start offset. Since remote storage is disabled, we would normally set it
> to the "local log start offset", but in case (2) we expect to treat the
> topic as "remote storage enabled", which is confusing.
>
>
> Therefore, Kamal and I would like to propose a new version of the
> configuration:
>
> remote.storage.enable | remote.copy.disabled (new) | remote storage data
> ----------------------+----------------------------+----------------------------
> true                  | false (default)            | uploadable + readable
> true                  | true                       | readable
> false                 | true/false                 | All remote data are deleted
>
> The advantage is that this config makes it clear to users what they are
> configuring, and the result is what they expect.
> Also, on the implementation side, we can still rely on
> "remote.storage.enable" to identify whether this feature is on or off.
>
> Any thoughts about it?
>
> Thank you.
> Luke
>
>
>
> On Thu, May 30, 2024 at 6:50 PM David Jacot 
> wrote:
>
>> Hi all,
>>
>> Thanks for the KIP. This is definitely a worthwhile feature. However, I am
>> a bit sceptical on the ZK part of the story. The 3.8 release is supposed
>> to
>> be the last one supporting ZK so I don't really see how we could bring it
>> to ZK, knowing that we don't plan to do a 3.9 release (current plan). I
>> strongly suggest clarifying this before implementing the ZK part in order
>> to avoid having new code [1] being deleted right after 3.8 is released
>> :). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK
>> part.
>>

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-07-25 Thread Luke Chen
Hi Christo,

Thanks for your reply.

> keep the remote.log.disable.policy, but only allow it to take a value of "delete".

I agree, or maybe make it a boolean value, and rename it to
`remote.log.delete.on.disable`, which is clearer.
However, with this new config, a topic could end up configured like
this:

remote.storage.enable=false
remote.log.delete.on.disable=false (default)

In this case, we would keep all remote storage data but close all remote
log tasks and set "log start offset = local log start offset". This leaves
the remote storage metadata in an unknown state, because the data in remote
storage is no longer accessible (the log start offset has moved to the
local log start offset, LLSO). And if `remote.storage.enable` is later
re-enabled for this topic, the old remote log metadata will be included,
but the log start offset will no longer be what we expect.

So, I'd like to propose that we don't allow this configuration:

remote.storage.enable=false
remote.log.delete.on.disable=false (default)

If the topic config is set to this, or changed to this, we'll return a
ConfigException during validation.

To make it clear, this is the new proposed solution:
https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit

Let me know what you think.

Thanks.
Luke



On Thu, Jul 25, 2024 at 8:07 PM Christo Lolov 
wrote:

> Hello!
>
> Thank you for raising this!
>
> Up to now KIP-950 took the stance that you can disable tiering whenever you
> wish as long as you specify what you would like to do with the data in
> remote. Amongst other things it also made the promise that it will not
> delete data without a user explicitly saying that they want their data
> deleted. In other words there is a 2-step verification that the user truly
> wants their data deleted.
>
> From the table of the new proposal, I am left with the impression that the
> moment a user tries to disable tiering, their data will be deleted. In other
> words, there is no 2-step verification that they want their data deleted.
>
> On a first read, I wouldn't be opposed to this proposal since it provides a
> neat alternative to the tiered epoch as long as there is still a 2-step
> verification that the user is aware their data will be deleted. I think
> that a reasonable way to achieve this is to keep the
> remote.log.disable.policy, but only allow it to take a value of "delete".
>
> What are your thoughts?
>
> Best,
> Christo
>
>
> On Thu, 25 Jul 2024 at 12:10, Luke Chen  wrote:
>
> > Hi all,
> >
> > I just found the table is not able to be displayed correctly in the
> email.
> > I've put the table content in google doc here
> > <
> >
> https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit?usp=sharing
> > >
> > .
> >
> > Thanks.
> > Luke
> >
> > On Thu, Jul 25, 2024 at 6:30 PM Luke Chen  wrote:
> >
> > > Hi all,
> > >
> > > While implementing the feature in KRaft mode, I found something we need
> > to
> > > change the original proposal:
> > >
> > > (1) In the KIP of "Disablement - KRaft backed Cluster
> > > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster
> > >",
> > > we said:
> > > Controller persists configuration change and completes disablement:
> > >
> > >1. The controller creates a ConfigRecord and persists it in the
> > >metadata topic.
> > >2. The controller creates a TopicRecord to increment the
> tiered_epoch
> > >and update the tiered_state to DISABLED state.
> > >3. This update marks the completion of the disablement process,
> > >indicating that tiered storage has been successfully disabled for
> the
> > >KRaft-backed clusters. Similar to topic deletion all replicas will
> > >eventually pick up the changes from the cluster metadata topic and
> > apply
> > >them to their own state. Any deletion failures will be picked up by
> > the
> > >expiration threads which should be deleting data before the log
> start
> > >offset. If the retention policy is delete, a new expiration thread
> > will be
> > >started on leadership change on any historical tiered topic to
> > confirm that
> > >there aren't any leftover segments in remote which need deletion.
> > After a
> > >cycle in which it didn't delete anything, it will die.
> > >
> > > For the (b) step, I don't think the controller needs to cr

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-07-26 Thread Luke Chen
Hi Kamal,

Thanks for the comments.

For this:
> If we throw an exception from the server for invalid config, then there
> will be inconsistency between the CLI tools and the actual state of the
> topic in the cluster. This can cause some confusion to the users whether
> tiered storage is disabled or not. I don't know how the Kraft topic config
> propagation/validation works.

I've confirmed that we can validate the topic configuration change at the
controller level by comparing the existing configuration with the new one.
In my local POC, the configuration change fails if it's invalid, like
this:

# Disable with remote.log.delete.on.disable=false (default)
bin/kafka-configs.sh --bootstrap-server {bootstrap-string} \
   --alter --entity-type topics --entity-name {topic-name} \
   --add-config 'remote.storage.enable=false'

Error while executing config command with args '--bootstrap-server
{bootstrap-string} --entity-type topics --entity-name {topic-name} --alter
--add-config remote.storage.enable=false'
java.util.concurrent.ExecutionException:
org.apache.kafka.common.errors.InvalidConfigurationException: It is invalid
to disable remote storage without deleting remote data. If you want to keep
the remote data, but turn to read only, please set `remote.copy.disabled=
true`. If you want to disable remote storage and delete all remote data,
please set `remote.storage.enable=false,remote.log.delete.on.disable=true`.
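The controller-side check described above can be sketched as follows. This is a minimal, hypothetical Python sketch, not Kafka's controller code: it assumes each topic config is a plain dict of string values (`new` being the full config after the alter is applied), and the names `validate_remote_storage_change`, `is_true` and `InvalidConfigurationError` are illustrative. The point it shows is that comparing the existing configuration with the new one lets topics that never enabled remote storage keep the default values.

```python
# Hypothetical sketch of the KIP-950 controller-side check; not Kafka's actual code.


class InvalidConfigurationError(Exception):
    """Stand-in for org.apache.kafka.common.errors.InvalidConfigurationException."""


def is_true(configs: dict, name: str) -> bool:
    # Assumption: an absent key means the config is at its default of "false".
    return configs.get(name, "false").strip().lower() == "true"


def validate_remote_storage_change(existing: dict, new: dict) -> None:
    """Reject turning tiered storage off unless remote data deletion is confirmed.

    Only the true -> false transition of remote.storage.enable is checked, so a
    topic that never enabled remote storage can keep the default
    remote.storage.enable=false / remote.log.delete.on.disable=false pair.
    """
    disabling = is_true(existing, "remote.storage.enable") and not is_true(
        new, "remote.storage.enable"
    )
    if disabling and not is_true(new, "remote.log.delete.on.disable"):
        raise InvalidConfigurationError(
            "disabling remote storage requires remote.log.delete.on.disable=true; "
            "to keep remote data read-only, leave remote.storage.enable=true and "
            "disable remote copy instead"
        )


if __name__ == "__main__":
    tiered = {"remote.storage.enable": "true"}
    # Graceful path: keep remote data, stop copying (read-only remote storage).
    validate_remote_storage_change(
        tiered, {"remote.storage.enable": "true", "remote.copy.disabled": "true"}
    )
    # Ungraceful path: disable and explicitly confirm deletion in one change.
    validate_remote_storage_change(
        tiered, {"remote.storage.enable": "false", "remote.log.delete.on.disable": "true"}
    )
    # Accidental disable without confirmation is rejected.
    try:
        validate_remote_storage_change(tiered, {"remote.storage.enable": "false"})
    except InvalidConfigurationError as err:
        print("rejected:", err)
```

A purely static check on the new config alone would also reject every topic that simply keeps both defaults, which is why the sketch keys on the transition.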

I've updated the KIP. Please take a look when available.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement

Thank you.
Luke


On Fri, Jul 26, 2024 at 2:05 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Correction:
>
> (2): Wait for all the remote segments to be deleted async due to breach by
> retention time (or) size,
>then set the `remote.storage.enable = false` and
> `remote.log.delete.on.disable = true`. This step is optional.
>
> On Thu, Jul 25, 2024 at 11:13 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Chia-Ping,
> >
> > Thanks for the review!
> >
> > >  If so, what is the purpose of `remote.log.delete.on.disable=false`?
> >
> > IIUC, the purpose of `remote.log.delete.on.disable` is to get explicit
> > confirmation from the user
> > before deleting the remote log segments. The concern raised in the thread
> > is that if the user
> > accidentally changes the value of `remote.storage.enable` from true to
> > false, then remote segments
> > get lost.
> >
> > For ungraceful disablement (i.e. disabling the remote storage for the
> > topic and deleting all the remote segments), the user should set both
> > configs at once:
> >
> > (1) remote.storage.enable = false and remote.log.delete.on.disable = true
> >
> > If the user accidentally sets only remote.storage.enable = false and
> > leaves `remote.log.delete.on.disable`
> > at its default value of `false`, then we will throw ConfigException to
> > prevent the deletion of remote logs.
> >
> > For graceful disablement, the user should set:
> >
> > (1): remote.copy.disabled = true.
> > (2): Wait for all the remote segments to be deleted async due to breach
> by
> > retention time (or) size,
> >then set the `remote.storage.enable = false`. This step is
> > optional.
> >
> > Luke,
> >
> > In ZK mode, once the topic config value gets updated, then it gets saved
> > in the /configs/topics/ znode.
> > If we throw an exception from the server for invalid config, then there
> > will be inconsistency between the CLI tools
> > and the actual state of the topic in the cluster. This can cause some
> > confusion to the users whether tiered storage
> > is disabled or not. I don't know how the Kraft topic config
> > propagation/validation works.
> >
> > --
> > Kamal
> >
> > On Thu, Jul 25, 2024 at 7:10 PM Chia-Ping Tsai 
> wrote:
> >
> >> remote.storage.enable=false
> >> remote.log.delete.on.disable=false (default)
> >> If the topic config is set to this, or changed to this, we'll return
> >> ConfigException during validation.
> >>
> >> Pardon me, I'm a bit confused.
> >>
> >> when `remote.storage.enable=true`, `remote.log.delete.on.disable=false`
> is
> >> no-op
> >> when `remote.storage.enable=false`, `remote.log.delete.on.disable=false`
> >> is
> >> error
> >>
> >> If `remote.log.delete.on.disable` must be true when setting
> >> `remote.storage.enable`
> >> to false, does it mean changing `remote.storage.en

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-07-26 Thread Luke Chen
Thanks Kamal for the comments.
KIP updated.

Thanks.
Luke

On Fri, Jul 26, 2024 at 6:56 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Luke,
>
> Thanks for confirming the topic config change validation on the controller
> and updating the KIP.
> The updated KIP LGTM.
>
> 1. Can we update the below sentence in the KIP to clarify that
> remote.storage.enable should be true during graceful disablement?
>
> > Users set the configuration
> "remote.storage.enable=false,remote.log.delete.on.disable=true", or
> "remote.copy.disabled=true" for the desired topic, indicating the
> disablement of tiered storage.
> to
> > Users set the configuration
> "remote.storage.enable=false,remote.log.delete.on.disable=true", or
> "remote.storage.enable=true,remote.copy.disabled=true" for the desired
> topic, indicating the disablement of tiered storage.
>
> 2. Can we clarify in the public interface that the StopReplica v5,
> tiered_epoch, and tiered_state changes are required only for ZK mode and
> won't be implemented?
>
> Thanks,
> Kamal
>
> On Fri, Jul 26, 2024 at 1:40 PM Luke Chen  wrote:
>
> > Hi Kamal,
> >
> > Thanks for the comments.
> >
> > For this:
> > > If we throw an exception from the server for invalid config, then there
> > will be inconsistency between the CLI tools and the actual state of the
> > topic in the cluster. This can cause some confusion to the users whether
> > tiered storage is disabled or not. I don't know how the Kraft topic
> config
> > propagation/validation works.
> >
> > I've confirmed we can validate the topic configuration change on the
> > controller level, by comparing existing configuration and new changed
> > configuration.
> > In my local POC, we can fail the configuration change if it's invalid
> like
> > this:
> >
> > # Disable with remote.log.delete.on.disable=false (default)
> > bin/kafka-configs.sh --bootstrap-server {bootstrap-string} \
> >--alter --entity-type topics --entity-name {topic-name} \
> >--add-config 'remote.storage.enable=false'
> >
> > Error while executing config command with args '--bootstrap-server
> > {bootstrap-string} --entity-type topics --entity-name {topic-name}
> --alter
> > --add-config remote.storage.enable=false'
> > java.util.concurrent.ExecutionException:
> > org.apache.kafka.common.errors.InvalidConfigurationException: It is
> invalid
> > to disable remote storage without deleting remote data. If you want to
> keep
> > the remote data, but turn to read only, please set `remote.copy.disabled=
> > true`. If you want to disable remote storage and delete all remote data,
> > please set
> `remote.storage.enable=false,remote.log.delete.on.disable=true`.
> >
> > I've updated the KIP. Please take a look when available.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
> >
> > Thank you.
> > Luke
> >
> >
> > On Fri, Jul 26, 2024 at 2:05 AM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Correction:
> > >
> > > (2): Wait for all the remote segments to be deleted async due to breach
> > by
> > > retention time (or) size,
> > >then set the `remote.storage.enable = false` and
> > > `remote.log.delete.on.disable = true`. This step is optional.
> > >
> > > On Thu, Jul 25, 2024 at 11:13 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Chia-Ping,
> > > >
> > > > Thanks for the review!
> > > >
> > > > >  If so, what is the purpose of
> `remote.log.delete.on.disable=false`?
> > > >
> > > > IIUC, the purpose of `remote.log.delete.on.disable` is to get
> explicit
> > > > confirmation from the user
> > > > before deleting the remote log segments. The concern raised in the
> > thread
> > > > is that if the user
> > > > accidentally changes the value of `remote.storage.enable` from true
> to
> > > > false, then remote segments
> > > > get lost.
> > > >
> > > > For ungraceful disablement, (ie) disabling the remote storage for the
> > > > topic and deleting all the
> > > > remote segments, the user should set both the configs at once:
> > > >
> > > > (1) remote.storage.enable = false and remote.log.delete.on.disable =
> 

Re: New release branch 3.9

2024-07-30 Thread Luke Chen
Hi Colin and all,

If KIP-853 can be completed for v3.9.0 in time (or with a small delay), I
agree we should try to keep v3.9.0 as the last release before v4.0.
This way, all Kafka ecosystem projects will have a clear (and certain)
picture of what will happen in Apache Kafka.

Hi Colin,
For KIP-950 (KAFKA-15132), which allows disabling tiered storage at the
topic level, the PR is under review and we should be able to merge it
this week.
For KIP-1005 (KAFKA-15857), which exposes remote-storage-related offsets in
kafka-get-offsets.sh, the change was reverted in v3.8.0 because of an MV
issue. We'd like to add it back, and it can be completed this week.

These 2 KIPs are important features for tiered storage, and we hope they
can be added to v3.9.0.

Thank you.
Luke



On Wed, Jul 31, 2024 at 7:31 AM Colin McCabe  wrote:

> Yeah, please go ahead. I know a lot of people are waiting for 4.0.
>
> best,
> Colin
>
>
> On Tue, Jul 30, 2024, at 16:05, Matthias J. Sax wrote:
> > Thanks for clarifying Colin. So my assumptions were actually correct.
> >
> > We have a lot of contributors waiting to pick up 4.0 tickets, and I'll
> > go ahead and tell them that we are ready and they can start to pick them
> > up.
> >
> > Thanks.
> >
> >
> > -Matthias
> >
> > On 7/30/24 3:51 PM, Colin McCabe wrote:
> >> Hi Chia-Ping Tsai,
> >>
> >> If you can get them done this week then I think we can merge them in to
> 3.9. If not, then let's wait until 4.0, please.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Tue, Jul 30, 2024, at 09:07, Chia-Ping Tsai wrote:
> >>> hi Colin,
> >>>
> >>> Could you please consider adding
> >>> https://issues.apache.org/jira/browse/KAFKA-1 to 3.9.0
> >>>
> >>> The issue is used to deprecate the formatters in core module. Also, it
> >>> implements the replacements for them.
> >>>
> >>> In order to follow the deprecation rules, it would be nice to have
> >>> KAFKA-1 in 3.9.0
> >>>
> >>> If you agree to have them in 3.9.0, I will cherry-pick them into 3.9.0
> when
> >>> they get merged to trunk.
> >>>
> >>> Best,
> >>> Chia-Ping
> >>>
> >>>
> >>> On Tue, Jul 30, 2024 at 11:59 PM José Armando García Sancio wrote:
> >>>
>  Thanks Colin.
> 
>  For KIP-853 (KRaft Controller Membership Changes), we still have the
>  following features that are in progress.
> 
>  1. UpdateVoter RPC and request handling
>  
>  2. Storage tool changes for KIP-853
>  
>  3. kafka-metadata-quorum describe changes for KIP-853
>  
>  4. kafka-metadata-quorum add voter and remove voter changes
>  
>  5. Sending UpdateVoter request and response handling
>  
> 
>  Can we cherry-pick them to the release branch 3.9.0 when they get merged
>  to trunk? They have a small impact, as they shouldn't affect the rest of
>  Kafka and only affect the KRaft controller membership change feature. I
>  expect them to get merged to the trunk branch in the coming days.
> 
>  Thanks,
> 
>  On Mon, Jul 29, 2024 at 7:02 PM Colin McCabe 
> wrote:
> 
> > Hi Kafka developers and friends,
> >
> > As promised, we now have a release branch for the upcoming 3.9.0
> release.
> > Trunk has been bumped to 4.0.0-SNAPSHOT.
> >
> > I'll be going over the JIRAs to move every non-blocker from this
> release
>  to
> > the next release.
> >
> >  From this point, most changes should go to trunk.
> > *Blockers (existing and new that we discover while testing the
> release)
> > will be double-committed. *Please discuss with your reviewer whether
> your
> > PR should go to trunk or to trunk+release so they can merge
> accordingly.
> >
> > *Please help us test the release! *
> >
> > best,
> > Colin
> >
> 
> 
>  --
>  -José
> 
>


Re: [kafka-clients] [ANNOUNCE] Apache Kafka 3.8.0

2024-07-30 Thread Luke Chen
> > > > >> (Please report an unintended omission)
> > > > >>
> > > > >> Aadithya Chandra, Abhijeet Kumar, Abhinav Dixit, Adrian Preston,
> > > Afshin
> > > > >> Moazami, Ahmed Najiub, Ahmed Sobeh, Akhilesh Chaganti, Almog
> Gavra,
> > > > Alok
> > > > >> Thatikunta, Alyssa Huang, Anatoly Popov, Andras Katona, Andrew
> > > > >> Schofield, Anna Sophie Blee-Goldman, Antoine Pourchet, Anton
> > Agestam,
> > > > >> Anton Liauchuk, Anuj Sharma, Apoorv Mittal, Arnout Engelen, Arpit
> > > > Goyal,
> > > > >> Artem Livshits, Ashwin Pankaj, Ayoub Omari, Bruno Cadonna, Calvin
> > Liu,
> > > > >> Cameron Redpath, charliecheng630, Cheng-Kai, Zhang, Cheryl
> Simmons,
> > > > Chia
> > > > >> Chuan Yu, Chia-Ping Tsai, ChickenchickenLove, Chris Egerton, Chris
> > > > >> Holland, Christo Lolov, Christopher Webb, Colin P. McCabe, Colt
> > > > McNealy,
> > > > >> cooper.ts...@suse.com, Vedarth Sharma, Crispin Bernier, Daan
> > Gerits,
> > > > >> David Arthur, David Jacot, David Mao, dengziming, Divij Vaidya,
> > > DL1231,
> > > > >> Dmitry Werner, Dongnuo Lyu, Drawxy, Dung Ha, Edoardo Comar, Eduwer
> > > > >> Camacaro, Emanuele Sabellico, Erik van Oosten, Eugene Mitskevich,
> > Fan
> > > > >> Yang, Federico Valeri, Fiore Mario Vitale, flashmouse, Florin
> > > Akermann,
> > > > >> Frederik Rouleau, Gantigmaa Selenge, Gaurav Narula, ghostspiders,
> > > > >> gongxuanzhang, Greg Harris, Gyeongwon Do, Hailey Ni, Hao Li,
> Hector
> > > > >> Geraldino, highluck, hudeqi, Hy (하이), IBeyondy, Iblis Lin, Igor
> > > Soarez,
> > > > >> ilyazr, Ismael Juma, Ivan Vaskevych, Ivan Yurchenko, James
> Faulkner,
> > > > >> Jamie Holmes, Jason Gustafson, Jeff Kim, jiangyuan, Jim Galasyn,
> > > > Jinyong
> > > > >> Choi, Joel Hamill, John Doe zh2725284...@gmail.com, John Roesler,
> > > John
> > > > >> Yu, Johnny Hsu, Jorge Esteban Quilcate Otoya, Josep Prat, José
> > Armando
> > > > >> García Sancio, Jun Rao, Justine Olshan, Kalpesh Patel, Kamal
> > > > >> Chandraprakash, Ken Huang, Kirk True, Kohei Nozaki, Krishna
> Agarwal,
> > > > >> KrishVora01, Kuan-Po (Cooper) Tseng, Kvicii, Lee Dongjin, Leonardo
> > > > >> Silva, Lianet Magrans, LiangliangSui, Linu Shibu, lixinyang,
> Lokesh
> > > > >> Kumar, Loïc GREFFIER, Lucas Brutschy, Lucia Cerchie, Luke Chen,
> > > > >> Manikumar Reddy, mannoopj, Manyanda Chitimbo, Mario Pareja,
> Matthew
> > de
> > > > >> Detrich, Matthias Berndt, Matthias J. Sax, Matthias Sax, Max
> Riedel,
> > > > >> Mayank Shekhar Narula, Michael Edgar, Michael Westerby, Mickael
> > > Maison,
> > > > >> Mike Lloyd, Minha, Jeong, Murali Basani, n.izhikov, Nick Telford,
> > > > Nikhil
> > > > >> Ramakrishnan, Nikolay, Octavian Ciubotaru, Okada Haruki, Omnia G.H
> > > > >> Ibrahim, Ori Hoch, Owen Leung, Paolo Patierno, Philip Nee,
> > > > >> Phuc-Hong-Tran, PoAn Yang, Proven Provenzano, Qichao Chu, Ramin
> > > Gharib,
> > > > >> Ritika Reddy, Rittika Adhikari, Rohan, Ron Dagostino, runom,
> > rykovsi,
> > > > >> Sagar Rao, Said Boudjelda, sanepal, Sanskar Jhajharia, Satish
> > Duggana,
> > > > >> Sean Quah, Sebastian Marsching, Sebastien Viale, Sergio Troiano,
> Sid
> > > > >> Yagnik, Stanislav Kozlovski, Stig Døssing, Sudesh Wasnik, TaiJuWu,
> > > > >> TapDang, testn, TingIāu "Ting" Kì, vamossagar12, Vedarth
> > > > >> Sharma, Victor van den Hoven, Vikas Balani, Viktor Somogyi-Vass,
> > > > Vincent
> > > > >> Rose, Walker Carlson, wernerdv, Yang Yu, Yash Mayya, yicheny,
> > Yu-Chen
> > > > >> Lai, yuz10, Zhifeng Chen, Zihao Lin, Ziming Deng, 谭九鼎
> > > > >>
> > > > >> We welcome your help and feedback. For more information on how to
> > > > >> report problems, and to get involved, visit the project website at
> > > > >> https://kafka.apache.org/
> > > > >>
> > > > >> Thank you!
> > > > >>
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Josep Prat
> > > > >> Release Manager for Apache Kafka 3.8.0
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> You received this message because you are subscribed to the Google
> > > > Groups "kafka-clients" group.
> > > > >> To unsubscribe from this group and stop receiving emails from it,
> > send
> > > > an email to kafka-clients+unsubscr...@googlegroups.com.
> > > > >> To view this discussion on the web visit
> > > >
> > >
> >
> https://groups.google.com/d/msgid/kafka-clients/CAOJ18G5D-jOLuyPjR6Qq0msoC8wFHG_1XQPvbn-34_u%2BYHYnhw%40mail.gmail.com
> > > > <
> > > >
> > >
> >
> https://groups.google.com/d/msgid/kafka-clients/CAOJ18G5D-jOLuyPjR6Qq0msoC8wFHG_1XQPvbn-34_u%2BYHYnhw%40mail.gmail.com?utm_medium=email&utm_source=footer
> > > > >.
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-08-07 Thread Luke Chen
Hi all,

Based on the original design:
When tiered storage is disabled or becomes read-only on a topic, the local
retention configuration becomes irrelevant, and all data expiration follows
the topic-wide retention configuration exclusively.

That works well. But we are afraid users will not check the documentation
and will assume the local log is still bounded by local.retention.ms/bytes
after `remote.log.copy.disable=true` (i.e. read-only remote storage). This
confusion might cause the local disk to fill up and bring down the broker.
To avoid this "surprise" for users, we'd like to add one more validation
when `remote.log.copy.disable` is set to true:
  - validation: when `remote.log.copy.disable=true`,
-- `local.retention.ms` must equal `retention.ms` or -2 (which
means the `retention.ms` value is used)
-- `local.retention.bytes` must equal `retention.bytes` or -2 (which
means the `retention.bytes` value is used)
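The proposed validation can be sketched as a small function. This is a hypothetical Python sketch, not Kafka's implementation: it assumes `configs` holds the effective (fully resolved) topic configuration as strings with every key present, and `retention_errors`, `USE_TOPIC_RETENTION` and `RETENTION_PAIRS` are illustrative names. The -2 sentinel follows the rule stated above.

```python
# Hypothetical sketch of the extra KIP-950 validation; not Kafka's actual code.

USE_TOPIC_RETENTION = -2  # sentinel: the local.retention.* value falls back to retention.*

# (local retention key, topic-wide retention key) pairs that must agree.
RETENTION_PAIRS = (
    ("local.retention.ms", "retention.ms"),
    ("local.retention.bytes", "retention.bytes"),
)


def retention_errors(configs: dict) -> list:
    """Return human-readable violations; an empty list means the check passes.

    Assumes `configs` is the effective topic config with every key present.
    """
    if configs.get("remote.log.copy.disable", "false").strip().lower() != "true":
        return []  # The check only applies once remote log copy is disabled.
    errors = []
    for local_key, topic_key in RETENTION_PAIRS:
        local_value = int(configs[local_key])
        topic_value = int(configs[topic_key])
        if local_value not in (USE_TOPIC_RETENTION, topic_value):
            errors.append(
                f"{local_key}={local_value} must equal {topic_key} "
                f"({topic_value}) or {USE_TOPIC_RETENTION}"
            )
    return errors
```

Returning all violations at once, rather than stopping at the first, lets a caller report both retention settings in a single ConfigException-style message.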

So, basically, we are not changing the original design; we just want to
make sure users are aware of the retention policy change after disabling
remote log copy.

Let me know if you have any comments.

Thank you.
Luke

On Fri, Jul 26, 2024 at 8:17 PM Luke Chen  wrote:

> Thanks Kamal for the comments.
> KIP updated.
>
> Thanks.
> Luke
>
> On Fri, Jul 26, 2024 at 6:56 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
>> Luke,
>>
>> Thanks for confirming the topic config change validation on the controller
>> and updating the KIP.
>> The updated KIP LGTM.
>>
>> 1. Can we update the below sentence in the KIP to clarify that
>> remote.storage.enable should be true during graceful disablement?
>>
>> > Users set the configuration
>> "remote.storage.enable=false,remote.log.delete.on.disable=true", or
>> "remote.copy.disabled=true" for the desired topic, indicating the
>> disablement of tiered storage.
>> to
>> > Users set the configuration
>> "remote.storage.enable=false,remote.log.delete.on.disable=true", or
>> "remote.storage.enable=true,remote.copy.disabled=true" for the desired
>> topic, indicating the disablement of tiered storage.
>>
>> 2. Can we clarify in the public interface that the StopReplica v5,
>> tiered_epoch, and tiered_state changes are required only for ZK mode and
>> won't be implemented?
>>
>> Thanks,
>> Kamal
>>
>> On Fri, Jul 26, 2024 at 1:40 PM Luke Chen  wrote:
>>
>> > Hi Kamal,
>> >
>> > Thanks for the comments.
>> >
>> > For this:
>> > > If we throw an exception from the server for invalid config, then
>> there
>> > will be inconsistency between the CLI tools and the actual state of the
>> > topic in the cluster. This can cause some confusion to the users whether
>> > tiered storage is disabled or not. I don't know how the Kraft topic
>> config
>> > propagation/validation works.
>> >
>> > I've confirmed we can validate the topic configuration change on the
>> > controller level, by comparing existing configuration and new changed
>> > configuration.
>> > In my local POC, we can fail the configuration change if it's invalid
>> like
>> > this:
>> >
>> > # Disable with remote.log.delete.on.disable=false (default)
>> > bin/kafka-configs.sh --bootstrap-server {bootstrap-string} \
>> >--alter --entity-type topics --entity-name {topic-name} \
>> >--add-config 'remote.storage.enable=false'
>> >
>> > Error while executing config command with args '--bootstrap-server
>> > {bootstrap-string} --entity-type topics --entity-name {topic-name}
>> --alter
>> > --add-config remote.storage.enable=false'
>> > java.util.concurrent.ExecutionException:
>> > org.apache.kafka.common.errors.InvalidConfigurationException: It is
>> invalid
>> > to disable remote storage without deleting remote data. If you want to
>> keep
>> > the remote data, but turn to read only, please set
>> `remote.copy.disabled=
>> > true`. If you want to disable remote storage and delete all remote data,
>> > please set
>> `remote.storage.enable=false,remote.log.delete.on.disable=true`.
>> >
>> > I've updated the KIP. Please take a look when available.
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
>> >
>> > Thank you.
>> > Luke
>> >
>> >
>> > On Fri, Jul 26, 2024 at 2:05 AM Kamal Chandraprakash <
>> > kamal.c

Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-08-07 Thread Luke Chen
Hi Satish,

Thanks for the review.
I've updated the KIP to move the ZK-based solution to the appendix section.

> One minor comment I have is to change the config name from
> "remote.log.copy.disable" to "remote.log.copy.enable" with the default
> value being true.

I also think the original name "remote.log.copy.disable" makes more sense
because users are "disabling" the remote copy feature.

Thank you.
Luke

On Thu, Aug 8, 2024 at 12:21 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Satish,
>
> Thanks for the review!
>
> > One minor comment I have is to change the config name from
> "remote.log.copy.disable" to "remote.log.copy.enable" with the default
> value being true.
>
> I'm inclined towards maintaining the config as "remote.log.copy.disable"
> to keep the default value as "false".
>
> --
> Kamal
>
> On Thu, Aug 8, 2024 at 9:23 AM Satish Duggana 
> wrote:
>
> > Thanks Kamal, and Luke for improving the earlier solution for KRaft.
> >
> > One minor comment I have is to change the config name from
> > "remote.log.copy.disable" to "remote.log.copy.enable" with the default
> > value being true.
> >
> > The solution summary to disable tiered storage on a topic:
> >
> > - When a user wants to disable tiered storage on a topic, we should
> > make sure that local.log and log.retention are same. This is to make
> > sure the user understands the implications of the local storage
> > requirements while disabling tiered storage and set them
> > appropriately.
> >
> > - Stop copying the log segments to remote storage as broker needs to
> > accumulate the required data locally to serve the required data from
> > local storage before we disable in remote storage. This will be done
> > by updating the config "remote.log.copy.enable" as false.
> >
> > - We added a guardrail to make sure the user understands that disabling
> > tiered storage will delete the remote storage data:
> > "remote.log.delete.on.disable" must be set to true before setting
> > "remote.storage.enable" to false.
> >
> >
> > I think it is better to refactor the KIP to have only the updated
> > KRaft based solution and move the ZK based solution to the appendix
> > for reference. wdyt?
> >
> > ~Satish.
> >
> >
> >
> > On Wed, 7 Aug 2024 at 17:38, Luke Chen  wrote:
> > >
> > > Hi all,
> > >
> > > Based on the original design:
> > > When tiered storage is disabled or becomes read-only on a topic, the
> > local
> > > retention configuration becomes irrelevant, and all data expiration
> > follows
> > > the topic-wide retention configuration exclusively.
> > >
> > > That works well. But we are afraid users will not check the document
> and
> > > "thought" the local log is bound to the local.retention.ms/bytes after
> > > `remote.log.copy.disable=true` (i.e. read-only remote storage). The
> > > confusion might cause the local disk to be full and bring down the
> > broker.
> > > To avoid this "surprise" for users, we'd like to add one more validation
> > > when `remote.log.copy.disable` is set to true:
> > >   - validation: when `remote.log.copy.disable=true`,
> > > -- `local.retention.ms` must equal `retention.ms` or -2 (which
> > > means the `retention.ms` value is used)
> > > -- `local.retention.bytes` must equal `retention.bytes` or -2 (which
> > > means the `retention.bytes` value is used)
> > >
> > > So, basically, we don't change the original design; we just want to make
> > > sure users are aware of the retention policy change after disabling
> > > remote log copy.
> > >
> > > Let me know if you have any comments.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, Jul 26, 2024 at 8:17 PM Luke Chen  wrote:
> > >
> > > > Thanks Kamal for the comments.
> > > > KIP updated.
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > > On Fri, Jul 26, 2024 at 6:56 PM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > >> Luke,
> > > >>
> > > > >> Thanks for confirming the topic config change validation on the
> > > > >> controller and updating the KIP.
> >

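The two checks discussed in the thread above (Satish's `remote.log.delete.on.disable` guardrail and Luke's retention validation when `remote.log.copy.disable=true`) can be sketched in Java as a pure function over a topic's configs before and after an update. This is a minimal, hypothetical sketch, not Kafka's controller code: the class and method names are made up, it uses the `remote.log.copy.disable` name from Luke's message (the thread also discusses renaming it to `remote.log.copy.enable`), and the fallback defaults are assumptions rather than values taken from the KIP.

```java
import java.util.Map;

// Hypothetical sketch of the two checks discussed in this thread.
// NOT Kafka's controller code; class and method names are illustrative only.
class TieredStorageDisableChecks {

    // Sentinel meaning "use the topic-wide retention value".
    static final long USE_TOPIC_RETENTION = -2L;

    // Fallbacks for absent keys; ASSUMED to mirror Kafka's documented topic defaults.
    private static final Map<String, String> DEFAULTS = Map.of(
        "remote.storage.enable", "false",
        "remote.log.copy.disable", "false",
        "remote.log.delete.on.disable", "false",
        "retention.ms", "604800000",   // 7 days
        "retention.bytes", "-1",       // no size limit
        "local.retention.ms", "-2",
        "local.retention.bytes", "-2");

    private static String get(Map<String, String> configs, String key) {
        return configs.getOrDefault(key, DEFAULTS.get(key));
    }

    /**
     * Validates a topic config update.
     *
     * @param before configs currently in effect
     * @param after  configs as they would be after the update
     * @throws IllegalArgumentException if either check fails
     */
    static void validate(Map<String, String> before, Map<String, String> after) {
        // Check 1 (retention validation): once remote copy is disabled, local
        // retention must match topic-wide retention or be -2.
        if (Boolean.parseBoolean(get(after, "remote.log.copy.disable"))) {
            requireSameRetention(after, "local.retention.ms", "retention.ms");
            requireSameRetention(after, "local.retention.bytes", "retention.bytes");
        }
        // Check 2 (guardrail): turning tiered storage off requires an explicit
        // acknowledgement that remote data will be deleted.
        boolean wasEnabled = Boolean.parseBoolean(get(before, "remote.storage.enable"));
        boolean willBeEnabled = Boolean.parseBoolean(get(after, "remote.storage.enable"));
        boolean deleteOnDisable = Boolean.parseBoolean(get(after, "remote.log.delete.on.disable"));
        if (wasEnabled && !willBeEnabled && !deleteOnDisable) {
            throw new IllegalArgumentException(
                "remote.log.delete.on.disable must be true before remote.storage.enable is set to false");
        }
    }

    private static void requireSameRetention(Map<String, String> configs, String localKey, String topicKey) {
        long local = Long.parseLong(get(configs, localKey));
        long topicWide = Long.parseLong(get(configs, topicKey));
        if (local != USE_TOPIC_RETENTION && local != topicWide) {
            throw new IllegalArgumentException(localKey + " (" + local + ") must equal "
                + topicKey + " (" + topicWide + ") or be -2 when remote.log.copy.disable=true");
        }
    }
}
```

Keeping the checks as a function of before/after config maps makes them easy to run wherever topic config changes are validated (the thread mentions the controller); where they actually live is up to the KIP.
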
Re: [DISCUSS] KIP-1073 Return inactive observer nodes in DescribeQuorum response

2024-08-08 Thread Luke Chen
Hi Gantigmaa,

Thanks for the KIP!
The motivation and change looks good to me.

Some comments:
1. typo: When a KRaft broker node shuts down, it is in "fenced" state, not
"unfenced" state
2. Will the "--include-inactive-observers" option also apply to
"kafka-metadata-quorum.sh describe --replication"?
I don't think it's a must-have, but it might be useful to let users see the
offset lag of the inactive observers.

Thank you.
Luke

On Thu, Jul 25, 2024 at 9:21 PM Gantigmaa Selenge 
wrote:

> Hi everyone,
>
> I would like to start a discussion on KIP-1073 that includes inactive
> observer nodes in the response for describeQuorum request.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1073%3A+Return+inactive+observer+nodes+in+DescribeQuorum+response
>
> The initial discussion on this issue is here, if you would like to see what
> was previously discussed:
> https://lists.apache.org/list.html?dev@kafka.apache.org
>
> Any feedback and suggestions for the KIP are welcome in this email thread.
>
> Thank you.
> Regards,
> Gantigmaa Selenge
>


Re: [VOTE] KIP-877: Mechanism for plugins and connectors to register metrics

2024-08-08 Thread Luke Chen
Hi Mickael,

Thanks for the KIP.
+1 (binding) from me.

Thanks.
Luke

On Fri, Aug 2, 2024 at 4:57 AM Tom Bentley  wrote:

> +1 (binding).
>
> Thanks Mickael!
>
> On Thu, 1 Aug 2024 at 05:12, Mickael Maison 
> wrote:
>
> > Hi,
> >
> > Bumping this thread to get some more votes and/or feedback.
> >
> > As I restarted the vote on June 10 after major changes, I'm only
> > counting votes since then.
> > So we have 1 binding (Chris) and 1 non-binding (Hector) votes.
> >
> > Thanks,
> > Mickael
> >
> > On Mon, Jul 8, 2024 at 8:40 PM Hector Geraldino (BLOOMBERG/ 919 3RD A)
> >  wrote:
> > >
> > > This will help eliminate some boilerplate code we have for our
> > connectors.
> > >
> > > +1 (non-binding)
> > >
> > > From: dev@kafka.apache.org At: 06/25/24 04:30:27 UTC-4:00 To:
> > > dev@kafka.apache.org
> > > Subject: Re: [VOTE] KIP-877: Mechanism for plugins and connectors to
> > > register metrics
> > >
> > > Bumping this thread.
> > >
> > > Let me know if you have any feedback.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > > On Mon, Jun 10, 2024 at 1:44 PM Chris Egerton wrote:
> > > >
> > > > +1 (binding), thanks Mickael!
> > > >
> > > > On Mon, Jun 10, 2024, 04:24 Mickael Maison wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Following the feedback in the DISCUSS thread, I made significant
> > > > > changes to the proposal. So I'd like to restart a vote for KIP-877:
> > > > >
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-877%3A+Mechanism+for+plugins+and+connectors+to+register+metrics
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > > > >
> > > > > On Thu, Jan 25, 2024 at 2:59 AM Tom Bentley wrote:
> > > > > >
> > > > > > Hi Mickael,
> > > > > >
> > > > > > You'll have seen that I left some comments on the discussion
> > > > > > thread, but they're minor enough that I'm happy to vote +1 here.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Thu, 11 Jan 2024 at 06:14, Mickael Maison <
> > > > > > mickael.mai...@gmail.com> wrote:
> > > > > >
> > > > > > > Bumping this thread since I've not seen any feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mickael
> > > > > > >
> > > > > > > On Tue, Dec 19, 2023 at 10:03 AM Mickael Maison
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'd like to start a vote on KIP-877:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-877%3A+Mechanism+for+plugins+and+connectors+to+register+metrics
> > > > > > > >
> > > > > > > > Let me know if you have any feedback.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Mickael
> > > > > > >
> > > > > > >
> > > > >
> > >
> > >
> >
> >
>


Re: [DISCUSS] KIP-1075: Introduce delayed remote list offsets purgatory to make LIST_OFFSETS async

2024-08-08 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
I think it is a good improvement: it prevents a large number of list_offset
requests from users from starving other, higher-priority requests.

Questions:
1. When a consumer polls a partition for the first time, it will also call
list_offset to get the offset to fetch from.
If this offset is located in remote storage, the consumer will have to wait
longer before it starts fetching real data.
Doesn't that conflict with what we are trying to achieve?

2. Since we will add a new timeout attribute to `ListOffsetsRequest`,
please write it out clearly, as other KIPs have done (e.g. KIP-1073).
3. The KIP adds a new timeout attribute to `ListOffsetsRequest`,
as well as a remote.list.offsets.request.timeout.ms broker config.
Could you explain why we need two timeout configurations? I guess the broker
one is for the purgatory?
What will happen if the client timeout > broker timeout? And if the client
timeout < broker timeout?

Thank you.
Luke



On Mon, Aug 5, 2024 at 8:59 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Bumping this thread. Please take a look.
>
> On Fri, Aug 2, 2024 at 12:32 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to start a discussion thread on KIP-1075 to
> > make the remote LIST_OFFSETS an async operation.
> >
> > The KIP is here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1075%3A+Introduce+delayed+remote+list+offsets+purgatory+to+make+LIST_OFFSETS+async
> >
> > Draft PR: https://github.com/apache/kafka/pull/16602
> >
> > Please take a look. Feedback and suggestions are welcome.
> >
> > Thanks,
> > Kamal
> >
>

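KIP-1075's idea of parking remote LIST_OFFSETS lookups in a purgatory instead of blocking a request handler thread can be illustrated with a toy model built on `CompletableFuture`. This is an assumption-laden sketch, not Kafka's `DelayedOperationPurgatory` and not the KIP's design: the class name, the simulated remote latency, and the single broker-side timeout (applied with `CompletableFuture.orTimeout`, Java 9+) are all illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: NOT Kafka's DelayedOperationPurgatory or the KIP-1075 design.
class AsyncRemoteListOffsets implements AutoCloseable {

    // Stands in for the threads that read remote-storage indexes.
    private final ExecutorService remoteReader = Executors.newFixedThreadPool(2);

    /**
     * Starts a simulated remote offset lookup and returns immediately, so the
     * calling (request handler) thread is not blocked.
     *
     * @param simulatedLatencyMs how long the fake remote lookup takes
     * @param offset             the offset the lookup eventually returns
     * @param brokerTimeoutMs    deadline after which the future fails with TimeoutException
     */
    CompletableFuture<Long> listOffset(long simulatedLatencyMs, long offset, long brokerTimeoutMs) {
        return CompletableFuture
            .supplyAsync(() -> {
                sleepQuietly(simulatedLatencyMs); // pretend to read remote indexes
                return offset;
            }, remoteReader)
            .orTimeout(brokerTimeoutMs, TimeUnit.MILLISECONDS);
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() {
        remoteReader.shutdownNow();
    }
}
```

Note that `orTimeout` only completes the returned future exceptionally; the simulated remote read keeps running on its executor. Whether a timed-out remote lookup should also be cancelled, and how a client-side timeout interacts with the broker-side one, are the open questions raised in the review above.
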
