Re: [VOTE] KIP-872: Add Serializer#serializeToByteBuffer() to reduce memory copying

2023-05-30 Thread ziming deng
Hello ShunKang,

+1(binding) from me

--
Thanks,
Ziming

> On May 30, 2023, at 20:07, ShunKang Lin  wrote:
> 
> Hi all,
> 
> Bump this thread again and see if we could get a few more votes. Currently
> we have +2 non-binding and +1 binding.
> Hoping we can get this approved, reviewed, and merged in time for 3.6.0.
> 
> Best,
> ShunKang
> 
> ShunKang Lin  wrote on Sun, May 7, 2023 at 15:24:
> 
>> Hi everyone,
>> 
>> I'd like to open the vote for KIP-872, which proposes to add
>> Serializer#serializeToByteBuffer() to reduce memory copying.
>> 
>> The proposal is here:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
>> 
>> The pull request is here:
>> https://github.com/apache/kafka/pull/12685
>> 
>> Thanks to all who reviewed the proposal, and thanks in advance for taking
>> the time to vote!
>> 
>> Best,
>> ShunKang
>> 
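
For readers skimming the thread, the rough shape of the proposed change is
sketched below (signatures paraphrased from the KIP title and discussion;
the linked proposal is authoritative):

// Simplified stand-in for org.apache.kafka.common.serialization.Serializer,
// showing only the shape of the ByteBuffer-returning variant. The default
// implementation wraps the existing byte[] path, so existing serializers
// keep working while ByteBuffer-aware ones can avoid the extra copy.
import java.nio.ByteBuffer;

public interface SerializerSketch<T> {

    byte[] serialize(String topic, T data);

    // Hypothetical default based on the KIP title; see the KIP for the
    // authoritative name and signature.
    default ByteBuffer serializeToByteBuffer(String topic, T data) {
        byte[] bytes = serialize(topic, data);
        return bytes == null ? null : ByteBuffer.wrap(bytes);
    }
}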



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1878

2023-05-30 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-05-30 Thread Walker Carlson
Thanks for all the additional comments. I will either address them here or
update the KIP accordingly.


I mentioned a follow-up KIP to add extra features, both earlier and in the
responses below. I will briefly summarize the options and optimizations I
plan to include. If a concern is not covered in this list, I address it
below.

* Allowing non-versioned tables to still use the stream buffer
* Automatically materializing tables instead of forcing the user to do it
* A configuration for an in-memory buffer
* Ordering the records by offset or by time
* A buffer that does not hold records in memory (offset order, delayed pull
from the stream)
* Syncing time between the stream and table sides (maybe)
* Not dropping late records, and processing them as they come in instead


First, Victoria.

1) (One of your nits covers this, but you are correct that it doesn't make
sense, so I removed that part of the example.)
For the examples with the "bad" join results, I said that without buffering
the stream the output would look like that, but that was incomplete. If the
lookup simply used the latest version of the table when the stream records
came in, then those results were possible. If we use the point-in-time
lookup that versioned tables give us, then you are correct that those future
results are not possible.
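(For clarity, the point-in-time lookup I mean is the KIP-889 style query
sketched below, available since 3.5; the store and variable names are
placeholders.)

// Sketch: querying a versioned store for the value that was current as of
// the stream record's timestamp (KIP-889 API).
import org.apache.kafka.streams.state.VersionedKeyValueStore;
import org.apache.kafka.streams.state.VersionedRecord;

public class PointInTimeLookupSketch {
    public static String lookup(VersionedKeyValueStore<String, String> store,
                                String key, long streamRecordTimestamp) {
        // Returns the version valid as of the given timestamp, or null if
        // the key is absent or the timestamp is older than the history
        // retention.
        VersionedRecord<String> record = store.get(key, streamRecordTimestamp);
        return record == null ? null : record.value();
    }
}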

2) I'll get to this later as Matthias brought up something related.

To your additional thoughts, I agree that we need to call those things out
in the documentation. I'm writing up a follow-up KIP with a lot of the
ideas we have discussed so that we can improve this feature beyond the base
implementation if needed.

I addressed the nits in the KIP. I somehow missed the stream-table join
processor improvement; it makes your first question make a lot more
sense. Table history retention is a much cleaner way to describe it.

As to your mention of syncing the time between the table and the stream:
Matthias mentioned that as well, so I will address both here. I plan to
bring it up in the future, but for now we will leave it out. I suppose it
will be more useful once the table history retention is separable from the
table grace period.


To address Matthias's comments:

You are correct in saying the in-memory store shouldn't cause any semantic
concerns. My concern is more with what we would do if we limited the number
of records in the buffer and then hit that limit (emitting those records
might be an issue; throwing an error and halting would not). I think we can
leave this discussion to the follow-up KIP along with a few other options.

I will go through your proposals now.

  - don't support non-versioned KTables

Sure, we can always expand this later on. I will include it as part of the
improvement KIP.

  - if grace period is added, users need to explicitly materialize the
table as version (either directly, or upstream. Upstream only works if
downstream tables "inherit" versioned semantics -- cf KIP-914)

Again, that works for me for now; if we find a use case we can always add
it later.

  - the table's history retention time must be larger than the grace
period (should be easy to check at runtime, when we build the topology)

agreed

  - because switching from non-versioned to version stores is not
backward compatibly (cf KIP-914), users need to take care of this
themselves, and this also implies that adding grace period is not a
backward compatible change (even only if via indirect means)

sure, this works

As to the dropping of late records, I'm not sure. On one hand I like not
dropping things. But on the other, I struggle to see how a user could filter
out late records that might have incomplete join results. The point-in-time
lookup aggressively expires old data, and if the data has been replaced it
will return null for lookups outside of the retention. This seems like it
could corrupt the integrity of the join output. Seeing that we drop late
records on the table side as well, I think it makes sense to drop late
records from the stream buffer. I could be convinced otherwise, and I could
see adding this as an option in a follow-up KIP; it would be very easy to
implement either way. For now, unless someone objects, I'm going to stick
with dropping the records for the sake of getting this KIP passed. It is
functionally a small change to make and we can update it later if you feel
strongly about it.

For the ordering: I have to say that it would be more complicated to emit
in offset order if the goal is to get as many of the records validly joined
as possible. Because we process records as they leave the buffer, a
sufficiently early record could hold up records that would otherwise be
valid until they fall outside of the table history retention. To fix this we
could process by timestamp, store the results in a second queue, and emit by
offset, but that would be a lot more complicated. If we didn't care about
missing some valid joins, we could have no store at all and pull from the
topic at a delay, caring only about the timestamp of the next offset.
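
For anyone trying to picture the end state, here is a rough usage sketch.
The versioned store API already exists (KIP-889); the grace-period option on
the join is hypothetical here, and the KIP defines the actual surface.

// Sketch only: a stream-table join whose table is materialized as a
// versioned store, with a hypothetical grace period on the join. The
// withGracePeriod() call is illustrative; the table history retention must
// be at least as large as the grace period.
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.Stores;

public class GracePeriodJoinSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> orders =
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

        // Versioned table so the join can do point-in-time lookups (KIP-889).
        KTable<String, String> customers = builder.table("customers",
            Materialized.<String, String>as(Stores.persistentVersionedKeyValueStore(
                    "customers-versioned", Duration.ofMinutes(10)))
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        KStream<String, String> enriched = orders.join(
            customers,
            (order, customer) -> order + " -> " + customer,
            Joined.<String, String, String>with(Serdes.String(), Serdes.String(), Serdes.String())
                .withGracePeriod(Duration.ofMinutes(5))); // hypothetical option

        enriched.to("enriched-orders", Produced.with(Serdes.String(), Serdes.String()));
    }
}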

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

2023-05-30 Thread Beyene, Mehari
Thank you for the feedback, Justine. Yes, I can expand in the KIP on what the
behavior is when we return INVALID_TIMESTAMP.
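
For illustration only, this is roughly how a producer application observes
the rejection today, assuming it surfaces as the existing
InvalidTimestampException (the topic name and record contents below are just
examples):

// Sketch: a non-retriable timestamp rejection shows up in the send callback.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.InvalidTimestampException;

public class TimestampErrorHandlingSketch {
    public static void send(KafkaProducer<String, String> producer) {
        ProducerRecord<String, String> record =
            new ProducerRecord<>("events", null, System.currentTimeMillis(), "key", "value");

        producer.send(record, (metadata, exception) -> {
            if (exception instanceof InvalidTimestampException) {
                // The batch failed validation; retrying with the same timestamp will not help.
                System.err.println("Record rejected for invalid timestamp: " + exception.getMessage());
            } else if (exception != null) {
                System.err.println("Send failed: " + exception.getMessage());
            }
        });
    }
}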


On 5/30/23, 1:34 PM, "Justine Olshan"  wrote:


Hi Mehari,


This is an interesting KIP! I've seen my fair share of issues due to future
timestamps, so this is definitely an area that could be improved.


I noticed in the compatibility section it says:
> There are no changes to public interfaces that will impact clients.
However, this change is considered breaking, as messages with future
timestamps that were previously accepted by the broker will now be rejected.


It seems that INVALID_TIMESTAMP is not retriable so it will fail the batch.
I think that if we expect to handle this case just as we do for the already
existing invalid_timestamp error, we should include some information on how
that is handled in the KIP.


Thanks,
Justine




On Tue, May 30, 2023 at 1:07 PM Beyene, Mehari 
wrote:


> Hi Everyone,
>
> I would like to start a discussion on KIP-937: Improve Message Timestamp
> Validation (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation
> ).
> This is a small KIP that aims to tighten the current validation logic of a
> message timestamp.
>
> Thanks,
> Mehari
>





Re: [DISCUSS] KIP-925: rack aware task assignment in Kafka Streams

2023-05-30 Thread Sophie Blee-Goldman
Hey Hao, thanks for the KIP!

1. There's a typo in the "internal.rack.aware.assignment.strategry" config,
this
should be internal.rack.aware.assignment.strategy.

2.

>  For O(E^2 * (CU)) complexity, C and U can be viewed as constant. Number of
> edges E is T * N where T is the number of clients and N is the number of
> Tasks. This is because a task can be assigned to any client so there will
> be an edge between every task and every client. The total complexity would
> be O(T * N) if we want to be more specific.

I feel like I'm missing something here, but if E = T * N and the complexity
is ~O(E^2), doesn't
this make the total complexity order of O(T^2 * N^2)?

3.

> Since 3.C.I and 3.C.II have different tradeoffs and work better in
> different workloads etc, we could add an internal configuration to choose
> one of them at runtime.

Why only an internal configuration? Same goes for
internal.rack.aware.assignment.standby.strategry (which also has the typo)

4.

>  There are no changes in public interfaces.

I think it would be good to explicitly call out that users can utilize this
new feature by setting the
ConsumerConfig's CLIENT_RACK_CONFIG, possibly with a brief example
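
For instance, something along these lines (just a sketch, with placeholder
values):

// Sketch: tag each Streams instance with its rack/zone by passing the
// consumer's client.rack config through the Streams consumer prefix.
// The application id, bootstrap server and rack id are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class RackAwareConfigSketch {
    public static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.CLIENT_RACK_CONFIG), "us-east-1a");
        return props;
    }
}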

5.

> The idea is that if we always try to make it overlap as much with
> HAAssignor’s target assignment, at least there’s a higher chance that tasks
> won’t be shuffled a lot if the clients remain the same across rebalances.

This line definitely gave me some pause -- if there was one major takeaway
I had after KIP-441,
one thing that most limited the feature's success, it was our assumption
that clients are relatively
stable across rebalances. This was mostly true at limited scale or for
on-prem setups, but
unsurprisingly broke down in cloud environments or larger clusters. Not
only do clients naturally
fall in and out of the group, autoscaling is becoming more and more of a
thing.

Lastly, and this is more easily solved but still worth calling out, an
assignment is only deterministic
as long as the client.id is persisted. Currently in Streams, we only write
the process UUID to the
state directory if there is one, ie if at least one persistent stateful
task exists in the topology. This
made sense in the context of KIP-441, which targeted heavily stateful
deployments, but this KIP
presumably intends to target more than just the persistent & stateful
subset of applications. To
make matters even worse,  "persistent" is defined in a semantically
inconsistent way throughout
Streams.

All this is to say, it may sound more complicated to remember the previous
assignment, but (a)
imo it only introduces a lot more complexity and shaky assumptions to
continue down this
path, and (b) we actually already do persist some amount of state, like the
process UUID, and
(c) it seems like this is the perfect opportunity to finally rid ourselves
of the determinism constraint
which has frankly caused more trouble and time lost in sum than it would
have taken us to just
write the HighAvailabilityTaskAssignor to consider the previous assignment
from the start in KIP-441

6.

> StickyTaskAssignor users who would like to use rack aware assignment
> should upgrade their Kafka Streams version to the version in which
> HighAvailabilityTaskAssignor and rack awareness assignment are available.

Building off of the above: the HAAssignor hasn't worked out perfectly for
everybody up until now, and given that we are only adding complexity to it,
on the flip side I would hesitate to try and force everyone to use it if
they want to upgrade. We added a "secret" backdoor internal config back in
KIP-441 for this reason, to allow users to set the task assignor. WDYT about
bumping this to a public config on the side in this KIP?


On Tue, May 23, 2023 at 11:46 AM Hao Li  wrote:

> Thanks John! Yeah. The ConvergenceTest looks very helpful. I will add it to
> the test plan. I will also add tests to verify the new optimizer will
> produce a balanced assignment which has no worse cross AZ cost than the
> existing assignor.
>
> Hao
>
> On Mon, May 22, 2023 at 3:39 PM John Roesler  wrote:
>
> > Hi Hao,
> >
> > Thanks for the KIP!
> >
> > Overall, I think this is a great idea. I always wanted to circle back
> > after the Smooth Scaling KIP to put a proper optimization algorithm into
> > place. I think this has the promise to really improve the quality of the
> > balanced assignments we produce.
> >
> > Thanks for providing the details about the MaxCut/MinFlow algorithm. It
> > seems like a good choice for me, assuming we choose the right scaling
> > factors for the weights we add to the graph. Unfortunately, I don't think
> > that there's a good way to see how easy or hard this is going to be until
> > we actually implement it and test it.
> >
> > That leads to the only real piece of feedback I had on the KIP, which is
> > the testing portion. You mentioned system/integration/unit tests, but
> > there's not too much information about what those tests will do. I'd like
> > to 

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

2023-05-30 Thread Justine Olshan
Hi Mehari,

This is an interesting KIP! I've seen my fair share of issues due to future
timestamps, so this is definitely an area that could be improved.

I noticed in the compatibility section it says:
> There are no changes to public interfaces that will impact clients.
However, this change is considered breaking, as messages with future
timestamps that were previously accepted by the broker will now be rejected.

It seems that INVALID_TIMESTAMP is not retriable so it will fail the batch.
I think that if we expect to handle this case just as we do for the already
existing invalid_timestamp error, we should include some information on how
that is handled in the KIP.

Thanks,
Justine


On Tue, May 30, 2023 at 1:07 PM Beyene, Mehari 
wrote:

> Hi Everyone,
>
> I would like to start a discussion on KIP-937: Improve Message Timestamp
> Validation (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation
> ).
> This is a small KIP that aims to tighten the current validation logic of a
> message timestamp.
>
> Thanks,
> Mehari
>


[DISCUSS] KIP-937 Improve Message Timestamp Validation

2023-05-30 Thread Beyene, Mehari
Hi Everyone,

I would like to start a discussion on KIP-937: Improve Message Timestamp 
Validation 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation).
This is a small KIP that aims to tighten the current validation logic of a 
message timestamp.

Thanks,
Mehari


Re: [VOTE] KIP-925: rack aware task assignment in Kafka Streams

2023-05-30 Thread Colt McNealy
+1 (non-binding)

Thank you Hao!

Colt McNealy

*Founder, LittleHorse.dev*


On Tue, May 30, 2023 at 9:50 AM Hao Li  wrote:

> Hi all,
>
> I'd like to open the vote for KIP-925: rack aware task assignment in Kafka
> Streams. The link for the KIP is
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
> .
>
> --
> Thanks,
> Hao
>


Re: [VOTE] 3.4.1 RC3

2023-05-30 Thread Chris Egerton
Hi Luke,

Many thanks for your continued work on this release!

To verify, I:
- Built from source using Java 11 with both:
- - the 3.4.1-rc3 tag on GitHub
- - the kafka-3.4.1-src.tgz artifact from
https://home.apache.org/~showuon/kafka-3.4.1-rc3/
- Checked signatures and checksums
- Ran the quickstart using the kafka_2.13-3.4.1.tgz artifact from
https://home.apache.org/~showuon/kafka-3.4.1-rc3/ with Java 11 and Scala 2.13
in KRaft mode
- Ran all unit tests
- Ran all integration tests for Connect and MM2

+1 (binding)

Cheers,

Chris

On Tue, May 30, 2023 at 11:16 AM Mickael Maison 
wrote:

> Hi Luke,
>
> I built from source with Java 11 and Scala 2.13 and ran the unit and
> integration tests. It took a few retries to get some of them to pass.
> I verified signatures and hashes and also ran the zookeeper quickstart.
>
> +1 (binding)
>
> Thanks,
> Mickael
>
> On Sat, May 27, 2023 at 12:58 PM Jakub Scholz  wrote:
> >
> > +1 (non-binding) ... I used the staged binaries and Maven artifacts to
> run
> > my tests and all seems to work fine.
> >
> > Thanks for running the release.
> >
> > Jakub
> >
> > On Fri, May 26, 2023 at 9:34 AM Luke Chen  wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the 4th candidate for release of Apache Kafka 3.4.1.
> > >
> > > This is a bugfix release with several fixes since the release of
> 3.4.0. A
> > > few of the major issues include:
> > > - core
> > > KAFKA-14644  Process should stop after failure in raft IO thread
> > > KAFKA-14946  KRaft controller node shutting down while renouncing
> > > leadership
> > > KAFKA-14887  ZK session timeout can cause broker to shutdown
> > > - client
> > > KAFKA-14639  Kafka CooperativeStickyAssignor revokes/assigns partition
> > > in one rebalance cycle
> > > - connect
> > > KAFKA-12558  MM2 may not sync partition offsets correctly
> > > KAFKA-14666  MM2 should translate consumer group offsets behind
> > > replication flow
> > > - stream
> > > KAFKA-14172  bug: State stores lose state when tasks are reassigned
> > > under EOS
> > >
> > >
> > > Release notes for the 3.4.1 release:
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Jun 2, 2023
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/javadoc/
> > >
> > > * Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
> > > https://github.com/apache/kafka/releases/tag/3.4.1-rc3
> > >
> > > * Documentation: (will be updated after released)
> > > https://kafka.apache.org/34/documentation.html
> > >
> > > * Protocol: (will be updated after released)
> > > https://kafka.apache.org/34/protocol.html
> > >
> > > The most recent build has had test failures. These all appear to be
> due to
> > > flakiness, but it would be nice if someone more familiar with the
> failed
> > > tests could confirm this. I may update this thread with passing build
> links
> > > if I can get one, or start a new release vote thread if test failures
> must
> > > be addressed beyond re-running builds until they pass.
> > >
> > > Unit/integration tests:
> > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/141/
> > >
> > > System tests:
> > > Will update the results later
> > >
> > > Thank you
> > > Luke
> > >
>


Requesting permission to contribute

2023-05-30 Thread Igor Buzatovic
Hi,
I'm kindly asking for permission to contribute to the Apache Kafka project,
as described on
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
page.

My Apache Wiki ID: beegor
My Apache JIRA ID: beegor

Thanks in advance,
Igor


[VOTE] KIP-925: rack aware task assignment in Kafka Streams

2023-05-30 Thread Hao Li
Hi all,

I'd like to open the vote for KIP-925: rack aware task assignment in Kafka
Streams. The link for the KIP is
https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
.

-- 
Thanks,
Hao


Re: [DISCUSS] KIP-932: Queues for Kafka

2023-05-30 Thread Andrew Schofield
Yes, that’s it. I imagine something similar to KIP-848 for managing the share 
group
membership, and consumers that fetch records from their assigned partitions and
acknowledge when delivery completes.
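
Purely as a flavour of that consumption pattern, and not as an API proposal,
a hypothetical poll/acknowledge loop might look like the sketch below; every
name in it (ShareConsumer, Record, AcknowledgeType, acknowledge) is invented
for illustration, since the KIP has not defined a client API yet.

// Hypothetical sketch only: the types below are invented names used purely
// to illustrate fetch-then-acknowledge semantics on a share group.
import java.time.Duration;

public class ShareGroupLoopSketch {
    interface ShareConsumer<K, V> {
        Iterable<Record<K, V>> poll(Duration timeout);
        void acknowledge(Record<K, V> record, AcknowledgeType type);
    }
    interface Record<K, V> { K key(); V value(); }
    enum AcknowledgeType { ACCEPT, RELEASE, REJECT }

    static void run(ShareConsumer<String, String> consumer) {
        while (true) {
            for (Record<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                try {
                    process(record);
                    consumer.acknowledge(record, AcknowledgeType.ACCEPT);  // delivery complete
                } catch (Exception e) {
                    consumer.acknowledge(record, AcknowledgeType.RELEASE); // make it available again
                }
            }
        }
    }

    static void process(Record<String, String> record) { /* application logic */ }
}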

Thanks,
Andrew

> On 30 May 2023, at 16:52, Adam Warski  wrote:
>
> Thanks for the explanation!
>
> So effectively, a share group is subscribed to each partition - but the data 
> is not pushed to the consumer, but only sent on demand. And when demand is 
> signalled, a batch of messages is sent?
> Hence it would be up to the consumer to prefetch a sufficient number of 
> batches to ensure, that it will never be "bored"?
>
> Adam
>
>> On 30 May 2023, at 15:25, Andrew Schofield  wrote:
>>
>> Hi Adam,
>> Thanks for your question.
>>
>> With a share group, each fetch is able to grab available records from any 
>> partition. So, it alleviates
>> the “head-of-line” blocking problem where a slow consumer gets in the way. 
>> There’s no actual
>> stealing from a slow consumer, but it can be overtaken and must complete its 
>> processing within
>> the timeout.
>>
>> The way I see this working is that when a consumer joins a share group, it 
>> receives a set of
>> assigned share-partitions. To start with, every consumer will be assigned 
>> all partitions. We
>> can be smarter than that, but I think that’s really a question of writing a 
>> smarter assignor
>> just as has occurred over the years with consumer groups.
>>
>> Only a small proportion of Kafka workloads are super high throughput. Share 
>> groups would
>> struggle with those I’m sure. Share groups do not diminish the value of 
>> consumer groups
>> for streaming. They just give another option for situations where a 
>> different style of
>> consumption is more appropriate.
>>
>> Thanks,
>> Andrew
>>
>>> On 29 May 2023, at 17:18, Adam Warski  wrote:
>>>
>>> Hello,
>>>
>>> thank you for the proposal! A very interesting read.
>>>
>>> I do have one question, though. When you subscribe to a topic using 
>>> consumer groups, it might happen that one consumer has processed all 
>>> messages from its partitions, while another one still has a lot of work to 
>>> do (this might be due to unbalanced partitioning, long processing times 
>>> etc.). In a message-queue approach, it would be great to solve this problem 
>>> - so that a consumer that is free can steal work from other consumers. Is 
>>> this somehow covered by share groups?
>>>
>>> Maybe this is planned as "further work", as indicated here:
>>>
>>> "
>>> It manages the topic-partition assignments for the share-group members. An 
>>> initial, trivial implementation would be to give each member the list of 
>>> all topic-partitions which matches its subscriptions and then use the 
>>> pull-based protocol to fetch records from all partitions. A more 
>>> sophisticated implementation could use topic-partition load and lag metrics 
>>> to distribute partitions among the consumers as a kind of autonomous, 
>>> self-balancing partition assignment, steering more consumers to busier 
>>> partitions, for example. Alternatively, a push-based fetching scheme could 
>>> be used. Protocol details will follow later.
>>> "
>>>
>>> but I’m not sure if I understand this correctly. A fully-connected graph 
>>> seems like a lot of connections, and I’m not sure if this would play well 
>>> with streaming.
>>>
>>> This also seems as one of the central problems - a key differentiator 
>>> between share and consumer groups (the other one being persisting state of 
>>> messages). And maybe the exact way we’d want to approach this would, to a 
>>> certain degree, dictate the design of the queueing system?
>>>
>>> Best,
>>> Adam Warski
>>>
>>> On 2023/05/15 11:55:14 Andrew Schofield wrote:
 Hi,
 I would like to start a discussion thread on KIP-932: Queues for Kafka. 
 This KIP proposes an alternative to consumer groups to enable cooperative 
 consumption by consumers without partition assignment. You end up with 
 queue semantics on top of regular Kafka topics, with per-message 
 acknowledgement and automatic handling of messages which repeatedly fail 
 to be processed.

 https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka

 Please take a look and let me know what you think.

 Thanks.
 Andrew
>>>
>>
>
> --
> Adam Warski
>
> https://www.softwaremill.com/
> https://twitter.com/adamwarski




Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-30 Thread Greg Harris
Hey all,

I received an invitation to collaborate on apache/kafka, but let the
invitation expire after 7 days.
Is there a workflow for refreshing the invite, or is an admin able to
manually re-invite me?
I'm gharris1727 on github.

Thanks!
Greg

On Wed, May 24, 2023 at 9:32 AM Justine Olshan
 wrote:
>
> Hey Yash,
> I'm not sure how it used to be for sure, but I do remember we used to have
> a different build system. I wonder if this used to work with the old build
> system and not any more.
> I'd be curious if other projects have something similar and how it works.
>
> Thanks,
> Justine
>
> On Wed, May 24, 2023 at 9:22 AM Yash Mayya  wrote:
>
> > Hi Justine,
> >
> > Thanks for the response. Non-committers don't have Apache accounts; are you
> > suggesting that there wasn't a need to sign in earlier and a change in this
> > policy is restricting collaborators from triggering Jenkins builds?
> >
> > Thanks,
> > Yash
> >
> > On Wed, May 24, 2023 at 9:30 PM Justine Olshan
> > 
> > wrote:
> >
> > > Yash,
> > >
> > > When I rebuild, I go to the CloudBees CI page and I have to log in with
> > my
> > > apache account.
> > > Not sure if the change in the build system or the need to sign in is part
> > > of the problem.
> > >
> > >
> > > On Wed, May 24, 2023 at 4:54 AM Federico Valeri 
> > > wrote:
> > >
> > > > +1 on Divij suggestions
> > > >
> > > >
> > > > On Wed, May 24, 2023 at 12:04 PM Divij Vaidya  > >
> > > > wrote:
> > > > >
> > > > > Hey folks
> > > > >
> > > > > A week into this experiment, I am finding the ability to add labels,
> > > > > request for reviewers and ability to close PRs very useful.
> > > > >
> > > > > 1. May I suggest an improvement to the process by requesting for some
> > > > > guidance on the interest areas for various committers. This would
> > help
> > > us
> > > > > request for reviews from the right set of individuals.
> > > > > As a reference, we have tried something similar with Apache TinkerPop
> > > > (see
> > > > > TinkerPop Contributors section at the end) [1], where the committers
> > > self
> > > > > identify their preferred area of interest.
> > > > >
> > > > > 2. I would also request creation of the following new labels:
> > > > > tiered-storage, transactions, security, refactor, zk-migration,
> > > > > first-contribution (so that we can prioritize reviews for first time
> > > > > contributors as an encouragement), build, metrics
> > > > >
> > > > > [1] https://tinkerpop.apache.org/
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Mon, May 15, 2023 at 11:07 PM John Roesler 
> > > > wrote:
> > > > >
> > > > > > Hello again, all,
> > > > > >
> > > > > > Just a quick update: after merging the changes to asf.yaml, I
> > > received
> > > > a
> > > > > > notification that the list is limited to only 10 people, not 20 as
> > > the
> > > > > > documentation states.
> > > > > >
> > > > > > Here is the list of folks who will now be able to triage PRs and
> > > > trigger
> > > > > > builds: Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy,
> > Yash
> > > > > > Mayya, Philip Nee, vamossagar12, Christo Lolov, Federico Valeri,
> > and
> > > > andymg3
> > > > > >
> > > > > > Thanks all,
> > > > > > -John
> > > > > >
> > > > > > On 2023/05/12 15:53:40 John Roesler wrote:
> > > > > > > Thanks again for bringing this up, David!
> > > > > > >
> > > > > > > As an update to the community, the PMC has approved a process to
> > > make
> > > > > > use of this feature.
> > > > > > >
> > > > > > > Here are the relevant updates:
> > > > > > >
> > > > > > > PR to add the policy:
> > > https://github.com/apache/kafka-site/pull/510
> > > > > > >
> > > > > > > PR to update the list:
> > https://github.com/apache/kafka/pull/13713
> > > > > > >
> > > > > > > Ticket to automate this process.. Contributions welcome :)
> > > > > > https://issues.apache.org/jira/browse/KAFKA-14995
> > > > > > >
> > > > > > > And to make sure it doesn't fall through the cracks in the mean
> > > time,
> > > > > > here's the release process step:
> > > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
> > > > > > >
> > > > > > > Unfortunately, the "collaborator" feature only allows 20
> > usernames,
> > > > so
> > > > > > we have decided to simply take the top 20 non-committer authors
> > from
> > > > the
> > > > > > past year (according to git shortlog). Congratulations to our new
> > > > > > collaborators!
> > > > > > >
> > > > > > > Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash
> > > Mayya,
> > > > > > Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and
> > andymg3
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -John
> > > > > > >
> > > > > > > On 2023/04/27 18:45:09 David Arthur wrote:
> > > > > > > > Hey folks,
> > > > > > > >
> > > > > > > > I stumbled across this wiki page from the infra team that
> > > > describes the
> > > > > > > > various features supported in 

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-05-30 Thread Igor Soarez
Hi Alexandre,

Thank you for having a look at this KIP, and thank you for pointing this out.

I like the idea of expanding the health status of a log directory beyond
just online/offline status.

This KIP currently proposes a single logdir state transition, from
online to offline, conveyed in a list of logdir UUIDs sent in the new
field `LogDirsOfflined` as part of the broker heartbeat request.

It's nice that the request schema itself doesn't allow a heartbeat
request to convey a state transition for a logdir from offline to online,
as that transition is not (currently) valid: brokers need to be
restarted for logdirs to be allowed to come back online.

We could make changes now to accommodate further logdir states in
the future, by instead conveying state and logdir pairs in the
heartbeat request for any logdir whose state changed.
But right now, that would look a bit strange since there's only one
state we'd allow to be represented in the request – offline.

Since creating new request versions/schemas is relatively easy now, and
since this logdir QoS feature would merit a KIP anyway, I'm a bit more
inclined to keep things simple for now.

Best,

--
Igor



Re: [DISCUSS] KIP-932: Queues for Kafka

2023-05-30 Thread Adam Warski
Thanks for the explanation!

So effectively, a share group is subscribed to each partition - but the data is 
not pushed to the consumer, but only sent on demand. And when demand is 
signalled, a batch of messages is sent?
Hence it would be up to the consumer to prefetch a sufficient number of batches 
to ensure, that it will never be "bored"?

Adam

> On 30 May 2023, at 15:25, Andrew Schofield  wrote:
> 
> Hi Adam,
> Thanks for your question.
> 
> With a share group, each fetch is able to grab available records from any 
> partition. So, it alleviates
> the “head-of-line” blocking problem where a slow consumer gets in the way. 
> There’s no actual
> stealing from a slow consumer, but it can be overtaken and must complete its 
> processing within
> the timeout.
> 
> The way I see this working is that when a consumer joins a share group, it 
> receives a set of
> assigned share-partitions. To start with, every consumer will be assigned all 
> partitions. We
> can be smarter than that, but I think that’s really a question of writing a 
> smarter assignor
> just as has occurred over the years with consumer groups.
> 
> Only a small proportion of Kafka workloads are super high throughput. Share 
> groups would
> struggle with those I’m sure. Share groups do not diminish the value of 
> consumer groups
> for streaming. They just give another option for situations where a different 
> style of
> consumption is more appropriate.
> 
> Thanks,
> Andrew
> 
>> On 29 May 2023, at 17:18, Adam Warski  wrote:
>> 
>> Hello,
>> 
>> thank you for the proposal! A very interesting read.
>> 
>> I do have one question, though. When you subscribe to a topic using consumer 
>> groups, it might happen that one consumer has processed all messages from 
>> its partitions, while another one still has a lot of work to do (this might 
>> be due to unbalanced partitioning, long processing times etc.). In a 
>> message-queue approach, it would be great to solve this problem - so that a 
>> consumer that is free can steal work from other consumers. Is this somehow 
>> covered by share groups?
>> 
>> Maybe this is planned as "further work", as indicated here:
>> 
>> "
>> It manages the topic-partition assignments for the share-group members. An 
>> initial, trivial implementation would be to give each member the list of all 
>> topic-partitions which matches its subscriptions and then use the pull-based 
>> protocol to fetch records from all partitions. A more sophisticated 
>> implementation could use topic-partition load and lag metrics to distribute 
>> partitions among the consumers as a kind of autonomous, self-balancing 
>> partition assignment, steering more consumers to busier partitions, for 
>> example. Alternatively, a push-based fetching scheme could be used. Protocol 
>> details will follow later.
>> "
>> 
>> but I’m not sure if I understand this correctly. A fully-connected graph 
>> seems like a lot of connections, and I’m not sure if this would play well 
>> with streaming.
>> 
>> This also seems as one of the central problems - a key differentiator 
>> between share and consumer groups (the other one being persisting state of 
>> messages). And maybe the exact way we’d want to approach this would, to a 
>> certain degree, dictate the design of the queueing system?
>> 
>> Best,
>> Adam Warski
>> 
>> On 2023/05/15 11:55:14 Andrew Schofield wrote:
>>> Hi,
>>> I would like to start a discussion thread on KIP-932: Queues for Kafka. 
>>> This KIP proposes an alternative to consumer groups to enable cooperative 
>>> consumption by consumers without partition assignment. You end up with 
>>> queue semantics on top of regular Kafka topics, with per-message 
>>> acknowledgement and automatic handling of messages which repeatedly fail to 
>>> be processed.
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>> 
>>> Please take a look and let me know what you think.
>>> 
>>> Thanks.
>>> Andrew
>> 
> 

-- 
Adam Warski

https://www.softwaremill.com
https://twitter.com/adamwarski



Re: [VOTE] 3.4.1 RC3

2023-05-30 Thread Mickael Maison
Hi Luke,

I built from source with Java 11 and Scala 2.13 and ran the unit and
integration tests. It took a few retries to get some of them to pass.
I verified signatures and hashes and also ran the zookeeper quickstart.

+1 (binding)

Thanks,
Mickael

On Sat, May 27, 2023 at 12:58 PM Jakub Scholz  wrote:
>
> +1 (non-binding) ... I used the staged binaries and Maven artifacts to run
> my tests and all seems to work fine.
>
> Thanks for running the release.
>
> Jakub
>
> On Fri, May 26, 2023 at 9:34 AM Luke Chen  wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the 4th candidate for release of Apache Kafka 3.4.1.
> >
> > This is a bugfix release with several fixes since the release of 3.4.0. A
> > few of the major issues include:
> > - core
> > KAFKA-14644  Process
> > should stop after failure in raft IO thread
> > KAFKA-14946  KRaft
> > controller node shutting down while renouncing leadership
> > KAFKA-14887  ZK session
> > timeout can cause broker to shutdown
> > - client
> > KAFKA-14639  Kafka
> > CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle
> > - connect
> > KAFKA-12558  MM2 may
> > not
> > sync partition offsets correctly
> > KAFKA-14666  MM2 should
> > translate consumer group offsets behind replication flow
> > - stream
> > KAFKA-14172  bug: State
> > stores lose state when tasks are reassigned under EOS
> >
> >
> > Release notes for the 3.4.1 release:
> > https://home.apache.org/~showuon/kafka-3.4.1-rc3/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Jun 2, 2023
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~showuon/kafka-3.4.1-rc3/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~showuon/kafka-3.4.1-rc3/javadoc/
> >
> > * Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.4.1-rc3
> >
> > * Documentation: (will be updated after released)
> > https://kafka.apache.org/34/documentation.html
> >
> > * Protocol: (will be updated after released)
> > https://kafka.apache.org/34/protocol.html
> >
> > The most recent build has had test failures. These all appear to be due to
> > flakiness, but it would be nice if someone more familiar with the failed
> > tests could confirm this. I may update this thread with passing build links
> > if I can get one, or start a new release vote thread if test failures must
> > be addressed beyond re-running builds until they pass.
> >
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/141/
> >
> > System tests:
> > Will update the results later
> >
> > Thank you
> > Luke
> >


[jira] [Created] (KAFKA-15039) Reduce logging level to trace in PartitionChangeBuilder.tryElection()

2023-05-30 Thread Ron Dagostino (Jira)
Ron Dagostino created KAFKA-15039:
-

 Summary: Reduce logging level to trace in 
PartitionChangeBuilder.tryElection()
 Key: KAFKA-15039
 URL: https://issues.apache.org/jira/browse/KAFKA-15039
 Project: Kafka
  Issue Type: Improvement
  Components: kraft
Reporter: Ron Dagostino
Assignee: Ron Dagostino
 Fix For: 3.6.0


A CPU profile in a large cluster showed PartitionChangeBuilder.tryElection() 
taking significant CPU due to logging.  Decrease the logging statements in that 
method from debug level to trace to mitigate the impact of this CPU hog under 
normal operations.
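
An illustrative sketch of the kind of change intended (not the actual patch):

// Illustrative only: move hot-path election logging from debug to trace and
// guard it so argument formatting is skipped when trace is disabled.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class TryElectionLoggingSketch {
    private static final Logger log = LoggerFactory.getLogger(TryElectionLoggingSketch.class);

    static void logElection(String partition, int newLeader) {
        if (log.isTraceEnabled()) {
            log.trace("tryElection for {} chose leader {}", partition, newLeader);
        }
    }
}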



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-921 OpenJDK CRaC support

2023-05-30 Thread Divij Vaidya
Thank you for mentioning the use case.

I will try to summarize what you mentioned above in my own words to ensure
that I have understood it correctly.
There are practical use cases where the user application has a dependency
on the Kafka producer and requires very quick cold starts. In such cases, CRaC
could be used to restore the user application. However, since the Kafka clients
do not support CRaC, they become a bottleneck for snapshotting the user
application. With this change, we will provide the ability for user
applications to be snapshotted, including a clean restore of the Kafka clients.

Now, having understood the motivation, I have the following comments:

#1 I think we need to make it very clear that a snapshot of a kafka
producer will mean:
- Losing all records which are waiting to be sent to the broker (stored in
RecordAccumulator). Perhaps, we need to print logs (warn level) and emit
metrics on suspend wrt how many requests were waiting to be sent.

#2 We need to document all customer facing changes in the KIP. As an
example, the metrics that will be added, changes to any interface that is
public facing (I don't think any public interface change is required) and
changes to customer facing behaviour (we need to add documentation around
this).

#3 We need to document what happens to existing uncommitted transactions. I
think the answer would probably be that the behaviour will be similar to a
client closing an existing connection but if we don't close things properly
(such as in transactionManager), there may be lingering effects.

Question:
#4 Does this JDK feature retain the state of the local data structures such
as a Map/List? If yes, do we want to add validation of recovered data as
part of the startup process?
#5 How do we plan to test the correctness of the recovery/restore?

Overall, I think that simply restoring the Sender might not work. We will
probably need to devise a clean "suspend" mechanism for the entire producer
including components such as TransactionManager, Partitioner, Record
Accumulator etc. Additionally, Justine (github: jolshan) is our expert on
producer side things, perhaps, they have a better idea to create a clean
producer suspension.
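
To make the discussion concrete, here is a rough sketch (an assumption on my
side, not the KIP's design) of how an application could wrap a producer with
an org.crac Resource today, closing it before the checkpoint and recreating
it after restore:

// Sketch only: application-level CRaC handling around a Kafka producer.
// The config passed in must contain the usual bootstrap and serializer
// settings; the KIP may instead move such handling inside the client.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class CheckpointAwareProducer implements Resource {
    private final Properties config;
    private volatile KafkaProducer<String, String> producer;

    public CheckpointAwareProducer(Properties config) {
        this.config = config;
        this.producer = new KafkaProducer<>(config);
        Core.getGlobalContext().register(this); // receive CRaC callbacks
    }

    public KafkaProducer<String, String> get() {
        return producer;
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Flush queued records and close so no sockets or sender threads
        // survive the snapshot.
        producer.flush();
        producer.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Recreate the producer with the original configuration.
        producer = new KafkaProducer<>(config);
    }
}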

--
Divij Vaidya



On Thu, May 25, 2023 at 11:15 AM Radim Vansa 
wrote:

> Hi Divij,
>
> I have prototyped this using Quarkus Superheroes [1], a demo application
> consisting of several microservices that communicate with each other
> using both HTTP and Kafka. I wanted to add the ability to transparently
> checkpoint and restore this application - while the regular startup
> takes seconds, the restore could bring this application online in the
> order of tens of milliseconds.
>
> I agree that the change will not help Kafka itself to get any faster; it
> will enable CRaC for the whole application that, amongst other
> technologies, uses Kafka. You're saying that the clients are not
> supposed to be re-created quickly, I hope that a use case where the app
> is scaled down if it's idle e.g. 60 seconds and then needs to be started
> on a request (to serve it ASAP) would make sense to you. It's really not
> about Kafka per-se - it's about the needs of those who consume it. Of
> course, I'll be glad for any comments pointing out difficulties e.g. if
> the producer is replicated.
>
> An alternative, and less transparent approach, would handle this in the
> integration layer. However from my experience this can be problematic if
> the integration layer provides Kafka API directly, losing control over
> the instance - it's not possible to simply shutdown the client and
> reopen the instance, and some sort of proxy would be needed that
> prevents access to this closed instance. And besides complexity, proxy
> means degraded performance.
>
> Another motivation to push changes as far down the dependency tree is
> the fan-out of these changes: we don't want to target Quarkus
> specifically, but other frameworks (Spring Boot, ...) and stand-alone
> applications as well. By keeping it low level we can concentrate the
> maintenance efforts to once place.
>
> Thank you for spending time reviewing the proposal and let me know if I
> can clarify this further.
>
> Radim
>
>
> [1] https://quarkus.io/quarkus-workshops/super-heroes/
>
> On 24. 05. 23 17:13, Divij Vaidya wrote:
> >
> > Hey Radim
> >
> > After reading the KIP, I am still not sure about the motivation for this
> > change. The bottleneck in starting a producer on Kafka is setup of the
> > network connection with the broker (since it performs SSL + AuthN). From
> > what I understand, checkpoint and restore is not going to help with that.
> > Also, Kafka clients are supposed to be long running clients which aren't
> > supposed to be destroyed and re-created quickly in a short span. I fail
> to
> > understand the benefit of using 

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-05-30 Thread Andrew Schofield
Hi Adam,
Thanks for your question.

With a share group, each fetch is able to grab available records from any 
partition. So, it alleviates
the “head-of-line” blocking problem where a slow consumer gets in the way. 
There’s no actual
stealing from a slow consumer, but it can be overtaken and must complete its 
processing within
the timeout.

The way I see this working is that when a consumer joins a share group, it 
receives a set of
assigned share-partitions. To start with, every consumer will be assigned all 
partitions. We
can be smarter than that, but I think that’s really a question of writing a 
smarter assignor
just as has occurred over the years with consumer groups.

Only a small proportion of Kafka workloads are super high throughput. Share 
groups would
struggle with those I’m sure. Share groups do not diminish the value of 
consumer groups
for streaming. They just give another option for situations where a different 
style of
consumption is more appropriate.

Thanks,
Andrew

> On 29 May 2023, at 17:18, Adam Warski  wrote:
>
> Hello,
>
> thank you for the proposal! A very interesting read.
>
> I do have one question, though. When you subscribe to a topic using consumer 
> groups, it might happen that one consumer has processed all messages from its 
> partitions, while another one still has a lot of work to do (this might be 
> due to unbalanced partitioning, long processing times etc.). In a 
> message-queue approach, it would be great to solve this problem - so that a 
> consumer that is free can steal work from other consumers. Is this somehow 
> covered by share groups?
>
> Maybe this is planned as "further work", as indicated here:
>
> "
> It manages the topic-partition assignments for the share-group members. An 
> initial, trivial implementation would be to give each member the list of all 
> topic-partitions which matches its subscriptions and then use the pull-based 
> protocol to fetch records from all partitions. A more sophisticated 
> implementation could use topic-partition load and lag metrics to distribute 
> partitions among the consumers as a kind of autonomous, self-balancing 
> partition assignment, steering more consumers to busier partitions, for 
> example. Alternatively, a push-based fetching scheme could be used. Protocol 
> details will follow later.
> "
>
> but I’m not sure if I understand this correctly. A fully-connected graph 
> seems like a lot of connections, and I’m not sure if this would play well 
> with streaming.
>
> This also seems as one of the central problems - a key differentiator between 
> share and consumer groups (the other one being persisting state of messages). 
> And maybe the exact way we’d want to approach this would, to a certain 
> degree, dictate the design of the queueing system?
>
> Best,
> Adam Warski
>
> On 2023/05/15 11:55:14 Andrew Schofield wrote:
>> Hi,
>> I would like to start a discussion thread on KIP-932: Queues for Kafka. This 
>> KIP proposes an alternative to consumer groups to enable cooperative 
>> consumption by consumers without partition assignment. You end up with queue 
>> semantics on top of regular Kafka topics, with per-message acknowledgement 
>> and automatic handling of messages which repeatedly fail to be processed.
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>
>> Please take a look and let me know what you think.
>>
>> Thanks.
>> Andrew
>



Re: [DISCUSS] KIP-932: Queues for Kafka

2023-05-30 Thread Andrew Schofield
Hi Luke,
Thanks for your comments.

1) I expect that fetch-from-follower will not be supported for share groups. If 
you think about it,
FFF gives freedom to fetch records from a nearby broker, but it does not also 
give the
ability to commit offsets to a nearby broker. For a share-group, the 
share-partition leader is
intimately involved in which records are fetched and acknowledged.

2) I have two big areas to fill in with the KIP - the RPCs and the storage. I’m 
working on the
RPCs now. The storage will be next.

3) Yes, we need new metrics. I’ll put a placeholder in the next update to the 
KIP. I think it will
be easy to enumerate them once the proposal has stabilised.

Thanks,
Andrew


> On 29 May 2023, at 10:04, Luke Chen  wrote:
>
> Hi Andrew,
>
> Thanks for the KIP.
> Some high level questions:
> 1. How do we handle "fetch from follower" case?
> It looks like in current design, each call needs to go to "shared partition
> leader", where the shared state stored. Is my understanding correct?
>
> 2. Where does the state info stored?
> It looks like we only store them in the memory of "shared partition
> leader". What happened after the leader crashed and move to other ISR
> replica?
>
> 3. New metrics needed
> Since we're introducing a new kind of consumer group, I think there should
> be new metrics added for client and broker to monitor them.
>
> Thank you.
> Luke
>
> On Mon, May 29, 2023 at 1:01 PM Satish Duggana 
> wrote:
>
>> Minor correction on 103, latest instead of earliest for SPSO default value.
>>
>> 103 It talks about SPSO values, latest being the default and user
>> can reset it to a target offset timestamp. What is the maximum value
>> for SPEO? It is good to clarify what could be the maximum value for
>> SPSO and SPEO. It can be HW or LogStableOffset or some other value?
>>
>> Thanks,
>> Satish.
>>
>> On Mon, 29 May 2023 at 10:06, Satish Duggana 
>> wrote:
>>>
>>> Hi Andrew,
>>> Thanks for the nice KIP on a very interesting feature about
>>> introducing some of the traditional MessageQueue semantics to Kafka.
>>> It is good to see that we are extending the existing consumer groups
>>> concepts and related mechanisms for shared subscriptions instead of
>>> bringing any large architectural/protocol changes.
>>>
>>> This KIP talks about introducing a durable subscription feature for
>>> topics with multiple consumers consuming messages parallely from a
>>> single topic partition.
>>>
>>> 101 Are you planning to extend this functionality for queueing
>>> semantics like JMS point to point style in future?
>>>
>>> 102 When a message is rejected by the target consumer, how do users
>>> know what records/offsets are dropped because of the failed records
>>> due to rejection ack or due to timeouts etc before DLQs are
>>> introduced?
>>>
>>> 103 It talks about SPSO values, earliest being the default and user
>>> can reset it to a target offset timestamp. What is the maximum value
>>> for SPEO? It is good to clarify what could be the maximum value for
>>> SPSO and SPEO. It can be HW or LogStableOffset or some other value?
>>>
>>> 104 KIP mentions that "share.delivery.count.limit" as the maximum
>>> number of delivery attempts for a record delivered to a share group.
>>> But the actual delivery count may be more than this number as the
>>> leader may fail updating the delivery count as leader or consumer may
>>> fail and more delivery attempts may be made later. It may be the
>>> minimum number of delivery attempts instead of the maximum delivery
>>> attempts.
>>>
>>> Thanks,
>>> Satish.
>>>
>>>
>>> On Wed, 24 May 2023 at 21:26, Andrew Schofield
>>>  wrote:

 Hi Stanislav,
 Thanks for your email. You bring up some interesting points.

 1) Tiered storage
 I think the situation here for fetching historical data is equivalent
>> to what happens if a user resets the committed offset for a consumer
 group back to an earlier point in time. So, I will mention this in the
>> next update to the KIP document but I think there's nothing
 especially different here.

 2) SSO initialized to the latest offset
 The KIP does mention that it is possible for an administrator to set
>> the SSO using either AdminClient.alterShareGroupOffsets or
 kafka-share-groups.sh. It is entirely intentional that there is no
>> KafkaConsumer config for initializing the SSO. I know that's how it
 can be done for consumer groups, but it suffers from the situation
>> where different consumers have different opinions about
 the initial value (earliest vs latest) and then the first one in wins.
>> Also, KIP-842 digs into some problems with how consumer
 group offset reset works (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms)
>> so
 I've tried to sidestep those problems too.

 Another possibility is to follow KIP-848 which proposes that
>> AdminClient.incrementalAlterConfigs is enhanced to support a new
 

[jira] [Resolved] (KAFKA-14970) Dual write mode testing for SCRAM and Quota

2023-05-30 Thread Proven Provenzano (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Proven Provenzano resolved KAFKA-14970.
---
Resolution: Fixed

Committed and merged into 3.5.

> Dual write mode testing for SCRAM and Quota
> ---
>
> Key: KAFKA-14970
> URL: https://issues.apache.org/jira/browse/KAFKA-14970
> Project: Kafka
>  Issue Type: Test
>  Components: kraft
>Reporter: Proven Provenzano
>Assignee: Proven Provenzano
>Priority: Blocker
>  Labels: 3.5
>
> SCRAM and Quota are stored together in ZK and we need better testing to 
> validate the dual write mode support for them.
> I will add some additional tests for this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-932: Queues for Kafka

2023-05-30 Thread Andrew Schofield
Hi Satish,
Thanks for your comments.

101 I am not planning to extend this functionality for queuing semantics like 
the JMS
point-to-point model in future. While this KIP does make it more viable to 
create a
relatively thin JMS client that talks Kafka protocol directly, there’s a lot 
more to achieve
full JMS support. If there’s a specific JMS concept that you had in mind, I’m 
happy to
discuss more.

102 Before DLQs are introduced, there is no way to know which records were
archived as undeliverable because of reaching the limit of delivery attempts or
explicit rejection. I think DLQ is the most important follow-on to this KIP.

103 I agree that the language around SPSO and SPEO could be improved and stating
what the maximum values are would be helpful. I’ll do that in the next revision.

104 I understand your point but I think I would express it differently. The 
durable record
of the delivery count could perhaps not be incremented in the event of a broker 
crash or
other failure. The point of the delivery count is not to guarantee exactly 5 
delivery attempts.
It’s to make sure that you don’t get an unlimited number of futile attempts for 
an
unprocessable record. If you get very rarely get an additional attempt because 
there was
a failure to update the durable record of the delivery count, that’s still 
meeting the overall
aim.

Thanks,
Andrew

> On 29 May 2023, at 05:36, Satish Duggana  wrote:
>
> Hi Andrew,
> Thanks for the nice KIP on a very interesting feature about
> introducing some of the traditional MessageQueue semantics to Kafka.
> It is good to see that we are extending the existing consumer groups
> concepts and related mechanisms for shared subscriptions instead of
> bringing any large architectural/protocol changes.
>
> This KIP talks about introducing a durable subscription feature for
> topics with multiple consumers consuming messages parallely from a
> single topic partition.
>
> 101 Are you planning to extend this functionality for queueing
> semantics like JMS point to point style in future?
>
> 102 When a message is rejected by the target consumer, how do users
> know what records/offsets are dropped because of the failed records
> due to rejection ack or due to timeouts etc before DLQs are
> introduced?
>
> 103 It talks about SPSO values, earliest being the default and user
> can reset it to a target offset timestamp. What is the maximum value
> for SPEO? It is good to clarify what could be the maximum value for
> SPSO and SPEO. It can be HW or LogStableOffset or some other value?
>
> 104 KIP mentions that "share.delivery.count.limit" as the maximum
> number of delivery attempts for a record delivered to a share group.
> But the actual delivery count may be more than this number as the
> leader may fail updating the delivery count as leader or consumer may
> fail and more delivery attempts may be made later. It may be the
> minimum number of delivery attempts instead of the maximum delivery
> attempts.
>
> Thanks,
> Satish.
>
>
> On Wed, 24 May 2023 at 21:26, Andrew Schofield
>  wrote:
>>
>> Hi Stanislav,
>> Thanks for your email. You bring up some interesting points.
>>
>> 1) Tiered storage
>> I think the situation here for fetching historical data is equivalent to 
>> what happens if a user resets the committed offset for a consumer
>> group back to an earlier point in time. So, I will mention this in the next 
>> update to the KIP document but I think there's nothing
>> especially different here.
>>
>> 2) SSO initialized to the latest offset
>> The KIP does mention that it is possible for an administrator to set the SSO 
>> using either AdminClient.alterShareGroupOffsets or
>> kafka-share-groups.sh. It is entirely intentional that there is no 
>> KafkaConsumer config for initializing the SSO. I know that's how it
>> can be done for consumer groups, but it suffers from the situation where 
>> different consumers have different opinions about
>> the initial value (earliest vs latest) and then the first one in wins. Also, 
>> KIP-842 digs into some problems with how consumer
>> group offset reset works 
>> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms)
>>  so
>> I've tried to sidestep those problems too.
>>
>> Another possibility is to follow KIP-848 which proposes that 
>> AdminClient.incrementalAlterConfigs is enhanced to support a new
>> resource type called GROUP and supporting a dynamic group config in this 
>> manner would give a single point of control.
>>
>> 3) Durable storage
>> The KIP does not yet describe how durable storage works. I have a few ideas 
>> that I want to flesh out before updating the KIP.
>>
>> I will rule out using a compacted topic though. The problem is that each 
>> record on a compacted topic is a key:value pair, and
>> it's not obvious what to use as the key. If it's the share group name, it 
>> needs the entire in-flight record state to be recorded in
>> one hit which is extremely 

Re: [VOTE] KIP-872: Add Serializer#serializeToByteBuffer() to reduce memory copying

2023-05-30 Thread ShunKang Lin
Hi all,

Bump this thread again and see if we could get a few more votes. Currently
we have +2 non-binding and +1 binding.
Hoping we can get this approved, reviewed, and merged in time for 3.6.0.

Best,
ShunKang

ShunKang Lin  wrote on Sun, 7 May 2023 at 15:24:

> Hi everyone,
>
> I'd like to open the vote for KIP-872, which proposes to add
> Serializer#serializeToByteBuffer() to reduce memory copying.
>
> The proposal is here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
>
> The pull request is here:
> https://github.com/apache/kafka/pull/12685
>
> Thanks to all who reviewed the proposal, and thanks in advance for taking
> the time to vote!
>
> Best,
> ShunKang
>


[GitHub] [kafka-site] fvaleri commented on pull request #516: MINOR: Add blog for 3.5.0 release

2023-05-30 Thread via GitHub


fvaleri commented on PR #516:
URL: https://github.com/apache/kafka-site/pull/516#issuecomment-1568312800

   Thanks @mimaison.
   
   In addition to the previous comments, should we also add a link for 
reporting issues with KRaft migration?





Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1877

2023-05-30 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15038) Use topic id/name mapping from the Metadata cache in RLM

2023-05-30 Thread Alexandre Dupriez (Jira)
Alexandre Dupriez created KAFKA-15038:
-

 Summary: Use topic id/name mapping from the Metadata cache in RLM
 Key: KAFKA-15038
 URL: https://issues.apache.org/jira/browse/KAFKA-15038
 Project: Kafka
  Issue Type: Sub-task
Reporter: Alexandre Dupriez
Assignee: Alexandre Dupriez


Currently, the {{RemoteLogManager}} maintains its own cache of topic name to 
topic id 
[[1]|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L138]
 using the information provided during leadership changes, and removes the 
mapping upon receiving the notification that a partition has been stopped.

It should be possible to re-use the mapping in a broker's metadata cache, 
removing the need for the RLM to build and update a local cache that duplicates 
the information already held in the metadata cache. It also preserves a single 
source of authority for the association between topic names and ids.

[1] 
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L138
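
As a rough sketch of the direction (illustrative only; the metadata cache view and its 
method name below are assumptions for the example, not the actual Kafka API):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;

class TopicIdLookupSketch {
    // Today (simplified): a local map maintained and updated by the RLM itself.
    private final Map<String, Uuid> topicIdByName = new ConcurrentHashMap<>();

    Uuid fromLocalCache(TopicPartition tp) {
        return topicIdByName.get(tp.topic());
    }

    // Proposed direction: ask the broker's metadata cache instead, so the
    // name <-> id association has a single source of authority.
    Uuid fromMetadataCache(MetadataCacheView metadataCache, TopicPartition tp) {
        return metadataCache.topicId(tp.topic());
    }

    // Placeholder for whatever lookup the metadata cache actually exposes.
    interface MetadataCacheView {
        Uuid topicId(String topicName);
    }
}
{code}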





[GitHub] [kafka-site] divijvaidya commented on pull request #516: MINOR: Add blog for 3.5.0 release

2023-05-30 Thread via GitHub


divijvaidya commented on PR #516:
URL: https://github.com/apache/kafka-site/pull/516#issuecomment-1568142629

   Suggestion: It would be useful for users if we could add a link to 
the "[upgrade to 
3.5.0](https://kafka.apache.org/35/documentation.html#upgrade_3_5_0)" somewhere 
in the blog post as a call to action.





[jira] [Created] (KAFKA-15037) initialize unifiedLog with remoteStorageSystemEnable correctly

2023-05-30 Thread Luke Chen (Jira)
Luke Chen created KAFKA-15037:
-

 Summary: initialize unifiedLog with remoteStorageSystemEnable 
correctly
 Key: KAFKA-15037
 URL: https://issues.apache.org/jira/browse/KAFKA-15037
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen
Assignee: Luke Chen


UnifiedLog relies on `remoteStorageSystemEnable` to identify whether the broker 
has remote storage enabled, but we never pass this value from the config into 
UnifiedLog, so it will always be false.
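
For context, `remoteStorageSystemEnable` is assumed here to correspond to the tiered 
storage broker setting, i.e. something like:

{code}
# server.properties (example)
remote.log.storage.system.enable=true
{code}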





[GitHub] [kafka-site] jlprat commented on a diff in pull request #516: MINOR: Add blog for 3.5.0 release

2023-05-30 Thread via GitHub


jlprat commented on code in PR #516:
URL: https://github.com/apache/kafka-site/pull/516#discussion_r1210019797


##
blog.html:
##
@@ -0,0 +1,69 @@
+Blog
+Apache Kafka 3.5.0 Release Announcement
+TODO May 2023 - Mickael Maison (https://twitter.com/MickaelMaison @MickaelMaison)
+We are proud to announce the release of Apache Kafka 3.5.0. This release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the release notes (https://downloads.apache.org/kafka/3.5.0/RELEASE_NOTES.html).
+The ability to migrate Kafka clusters from ZK to KRaft mode with no downtime is still an early access feature. It is currently only suitable for testing in non production environments. See KIP-866 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration) for more details.
+Kafka Broker, Controller, Producer, Consumer and Admin Client
+KIP-881: Rack-aware Partition Assignment for Kafka Consumers: Kafka 3.4.0 only contained the protocol changes for KIP-881 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers). The built-in assignors have now been updated to support rack-awareness.
+KIP-887: Add ConfigProvider to make use of environment variables: KIP-887 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-887%3A+Add+ConfigProvider+to+make+use+of+environment+variables) introduces a new ConfigProvider (Javadoc link currently http://localhost/35/javadoc/org/apache/kafka/common/config/provider/ConfigProvider.html) implementation, EnvVarConfigProvider, to retrieve configurations from environment variables.

Review Comment:
   The Javadoc link needs to be updated to the proper kafka.apache.org domain 
(currently it points to localhost)






About My build In Package using KafkaJS

2023-05-30 Thread Md. Muhtasim Fuad Fahim
Greetings,

I have built an NPM package named "*kafka-pub-sub*" using the KafkaJS
library, with support for stream processing, that enables applications to
publish, consume and process high volumes of record streams in a fast and
durable way. Anyone will be able to use the Kafka Pub Sub by calling two
simple classes. For more details please see the package here:
https://www.npmjs.com/package/kafka-pub-sub and the GitHub repo:
https://github.com/mdmuhtasimfuadfahim/kafka-pub-sub.

I would like to ask the community for guidance on how I can improve this
package; any contribution would be appreciated.

Thank you in advance.

-- 
*Md. Muhtasim Fuad Fahim*
*Backend Developer* | *XpeedStudio *

*Linkedin * | *Facebook*
 | Twitter
 | *GitHub*

*a*: Road 2, House 8, Block D/2 | Mirpur-2, Dhaka-1216, Bangladesh
*e*: muhtasim.fahim...@gmail.com | *m*: +880 1744 866688, +880 1637 803454


Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2023-05-30 Thread Mickael Maison
Hi Jorge,

There are a few issues with the current proposal. Once 3.5 is out, I
plan to start looking at this again.

Thanks,
Mickael

On Mon, May 15, 2023 at 3:19 PM Jorge Esteban Quilcate Otoya
 wrote:
>
> Hi Mickael,
>
> Just to check the status of this KIP as it looks very useful. I can see how
> new Tiered Storage interfaces and plugins may benefit from this.
>
> Cheers,
> Jorge.
>
> On Mon, 6 Feb 2023 at 23:00, Chris Egerton  wrote:
>
> > Hi Mickael,
> >
> > I agree that adding a getter method for Monitorable isn't great. A few
> > alternatives come to mind:
> >
> > 1. Introduce a new ConfiguredInstance (name subject to change) interface
> > that wraps an instance of type T, but also contains a getter method for any
> > PluginMetrics instances that the plugin was instantiated with (which may
> > return null either if no PluginMetrics instance could be created for the
> > plugin, or if it did not implement the Monitorable interface). This can be
> > the return type of the new AbstractConfig::getConfiguredInstance variants.
> > It would give us room to move forward with other plugin-for-your-plugin
> > style interfaces without cluttering things up with getter methods. We could
> > even add a close method to this interface which would handle cleanup of all
> > extra resources allocated for the plugin by the runtime, and even possibly
> > the plugin itself.
> >
> > 2. Break out the instantiation logic into two separate steps. The first
> > step, creating a PluginMetrics instance, can be either private or public
> > API. The second step, providing that PluginMetrics instance to a
> > newly-created object, can be achieved with a small tweak of the proposed
> > new methods for the AbstractConfig class; instead of accepting a Metrics
> > instance, they would now accept a PluginMetrics instance. For the first
> > step, we might even introduce a new CloseablePluginMetrics interface which
> > would be the return type of whatever method we use to create the
> > PluginMetrics instance. We can track that CloseablePluginMetrics instance
> > in tandem with the plugin it applies to, and close it when we're done with
> > the plugin.
> >
> > I know that this adds some complexity to the API design and some
> > bookkeeping responsibilities for our implementation, but I can't shake the
> > feeling that if we don't feel comfortable taking on the responsibility to
> > clean up these resources ourselves, it's not really fair to ask users to
> > handle it for us instead. And with the case of Connect, sometimes Connector
> > or Task instances that are scheduled for shutdown block for a while, at
> > which point we abandon them and bring up new instances in their place; it'd
> > be nice to have a way to forcibly clear out all the metrics allocated by
> > that Connector or Task instance before bringing up a new one, in order to
> > prevent issues due to naming conflicts.
> >
> > Regardless, and whether or not it ends up being relevant to this KIP, I'd
> > love to see a new Converter::close method. It's irked me for quite a while
> > that we don't have one already.
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Feb 6, 2023 at 1:50 PM Mickael Maison 
> > wrote:
> >
> > > Hi Chris,
> > >
> > > I envisioned plugins to be responsible for closing the PluginMetrics
> > > instance. This is mostly important for Connect connector plugins as
> > > they can be closed while the runtime keeps running (and keeps its
> > > Metrics instance). As far as I can tell, other plugins should only be
> > > closed when their runtime closes, so we should not be leaking metrics
> > > even if those don't explicitly call close().
> > >
> > > For Connect plugin, as you said, it would be nice to automatically
> > > close their associated PluginMetrics rather than relying on user
> > > logic. The issue is that with the current API there's no way to
> > > retrieve the PluginMetrics instance once it's passed to the plugin.
> > > I'm not super keen on having a getter method on the Monitorable
> > > interface and tracking PluginMetrics instances associated with each
> > > plugin would require a lot of changes. I just noticed Converter does
> > > not have a close() method so it's problematic for that type of plugin.
> > > The other Connect plugins all have close() or stop() methods. I wonder
> > > if the simplest is to make Converter extend Closeable. WDYT?
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Mon, Feb 6, 2023 at 6:39 PM Mickael Maison 
> > > wrote:
> > > >
> > > > Hi Yash,
> > > >
> > > > I added a sentence to the sensor() method mentioning the sensor name
> > > > must only be unique per plugin. Regarding having getters for sensors
> > > > and metrics I considered this not strictly necessary as I expect the
> > > > metrics logic in most plugins to be relatively simple. If you have a
> > > > use case that would benefit from these methods, let me know I will
> > > > reconsider.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > >
> > > > On Fri, Feb 3, 
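
For reference, the "ConfiguredInstance" wrapper sketched in option 1 of the discussion
above would amount to roughly the following shape (a sketch only; the names are
illustrative and not part of KIP-877 as currently proposed):

    public interface ConfiguredInstance<T> extends AutoCloseable {
        T instance();                   // the plugin created via AbstractConfig#getConfiguredInstance
        PluginMetrics pluginMetrics();  // null if the plugin is not Monitorable or no metrics were created
        @Override
        void close();                   // releases the metrics and any other runtime-allocated resources
    }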

Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-05-30 Thread Mickael Maison
Hi,

Traditionally we used to publish an announcement on the Apache blog
for major and minor releases. Unfortunately the Apache blog is going
away so we can't use it anymore.
There have been a few discussions on updating our website to have a
proper blog (and use a markup language instead of raw HTML) but we've
not made much progress yet.

So for 3.5.0, I propose to hard code a blog page with the announcement
post. I've opened a PR: https://github.com/apache/kafka-site/pull/516
Please take a look and let me know what you think.

Thanks,
Mickael

On Mon, May 22, 2023 at 3:56 PM Ismael Juma  wrote:
>
> Makes sense Mickael. Thanks!
>
> Ismael
>
> On Mon, May 22, 2023 at 4:56 AM Mickael Maison 
> wrote:
>
> > Hi Ismael,
> >
> > Yes we're getting a bit behind schedule. KAFKA-14980 was a significant
> > regression that would have broken a lot of MirrorMaker deployments.
> > Chris and I were at Kafka Summit last week, so it took a bit longer to
> > fix.
> >
> > The remaining issues are all about migration and at this point I'm not
> > sure we'll fix all of them in this release. We can keep migration in
> > early access in 3.5.
> >
> > I'll start working on RC0 today. If we decide to include some of the
> > migration fixes, I'll make another RC.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> > On Sat, May 20, 2023 at 5:53 PM Ismael Juma  wrote:
> > >
> > > Hi Mickael,
> > >
> > > What's the plan for the first RC? In my opinion, we should really tighten
> > > the criteria for what's considered a blocker and release RC0 soon (given
> > > where we are timeline wise).
> > >
> > > Ismael
> > >
> > > On Sat, May 20, 2023, 8:15 AM Mickael Maison 
> > > wrote:
> > >
> > > > Hi Matthias,
> > > >
> > > > Yes feel free to backport to 3.5.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > > On Sat, May 20, 2023 at 12:53 AM Akhilesh Chaganti
> > > >  wrote:
> > > > >
> > > > > Hi Mickael,
> > > > >
> > > > > I raised the blockers for AK 3.5 and raised PRs for them. All of
> > them are
> > > > > in the Zk -> KRaft migration path and critical for the Zk -> KRaft
> > > > > migration to be successful.
> > > > > KAFKA-15003  --
> > Topic
> > > > > metadata is not correctly synced with Zookeeper while handling
> > snapshot
> > > > > during migration (dual write). --
> > > > > KAFKA-15004  --
> > > > Config
> > > > > changes are not correctly synced with Zookeeper while handling
> > snapshot
> > > > > during migration (dual write). --
> > > > > KAFKA-15007  --
> > > > > MigrationPropagator may have wrong IBP while sending UMR, LISR
> > requests
> > > > to
> > > > > the Zk Brokers during migration. --
> > > > >
> > > > >
> > > > > Thanks
> > > > > Akhilesh
> > > > >
> > > > >
> > > > > On Fri, May 19, 2023 at 9:28 AM Matthias J. Sax 
> > > > wrote:
> > > > >
> > > > > > Mickael,
> > > > > >
> > > > > > we included a bug-fix into 3.5, and just discovered a critical bug
> > in
> > > > > > the fix itself, that would introduce a regression into 3.5.
> > > > > >
> > > > > > We have already a PR to fix-forward:
> > > > > > https://github.com/apache/kafka/pull/13734
> > > > > >
> > > > > > As we don't have an RC yet, I would like to cherry-pick this back
> > to
> > > > 3.5.
> > > > > >
> > > > > >
> > > > > > -Matthias
> > > > > >
> > > > > > On 5/10/23 1:47 PM, Sophie Blee-Goldman wrote:
> > > > > > > Thanks Mickael -- the fix has been merged to 3.5 now
> > > > > > >
> > > > > > > On Wed, May 10, 2023 at 1:12 AM Mickael Maison <
> > > > mickael.mai...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Sophie,
> > > > > > >>
> > > > > > >> Yes that's fine, thanks for letting me know!
> > > > > > >>
> > > > > > >> Mickael
> > > > > > >>
> > > > > > >> On Tue, May 9, 2023 at 10:54 PM Sophie Blee-Goldman
> > > > > > >>  wrote:
> > > > > > >>>
> > > > > > >>> Hey Mickael, I noticed a bug in the new versioned key-value
> > byte
> > > > store
> > > > > > >>> where it's delegating to the wrong API
> > > > > > >>> (copy/paste error I assume). I extracted this into its own PR
> > > > which I
> > > > > > >> think
> > > > > > >>> should be included in the 3.5 release.
> > > > > > >>>
> > > > > > >>> The tests are still running, but it's just a one-liner so I'll
> > > > merge it
> > > > > > >>> when they're done, and cherrypick to 3.5 if
> > > > > > >>> that's ok with you. See
> > https://github.com/apache/kafka/pull/13695
> > > > > > >>>
> > > > > > >>> Thanks for running the release!
> > > > > > >>>
> > > > > > >>> On Tue, May 9, 2023 at 1:28 PM Randall Hauch  > >
> > > > wrote:
> > > > > > >>>
> > > > > >  Thanks, Mickael.
> > > > > > 
> > > > > >  I've cherry-picked that commit to the `3.5` branch (
> > > > > >  https://issues.apache.org/jira/browse/KAFKA-14974).
> > > > > > 
> > > > > >  Best regards,
> > > > > >  Randall
> > > > > > 
> > > > 

Re: [VOTE] 3.5.0 RC0

2023-05-30 Thread Mickael Maison
Hi David,

Feel free to backport the necessary fixes to 3.5.

Thanks,
Mickael

On Tue, May 30, 2023 at 10:32 AM Mickael Maison
 wrote:
>
> Hi Greg,
>
> Thanks for the heads up, this indeed looks like something we want in
> 3.5. I've replied in the PR.
>
> Mickael
>
> On Sat, May 27, 2023 at 11:44 PM David Arthur
>  wrote:
> >
> > Mickael, after looking more closely, I definitely think KAFKA-15010 is a
> > blocker. It creates the case where the controller can totally miss a
> > metadata update and not write it back to ZK. Since things like dynamic
> > configs and ACLs are only read from ZK by the ZK brokers, we could have
> > significant problems while the brokers are being migrated (when some are
> > KRaft and some are ZK). E.g., ZK brokers could be totally unaware of an ACL
> > change while the KRaft brokers have it. I have a fix ready here
> > https://github.com/apache/kafka/pull/13758. I think we can get it committed
> > soon.
> >
> > Another blocker is KAFKA-15004 which was just merged to trunk. This is
> > another dual-write bug where new topic/broker configs will not be written
> > back to ZK by the controller.
> >
> > The fix for KAFKA-15010 has a few dependencies on fixes we made this past
> > week, so we'll need to cherry-pick a few commits. The changes are totally
> > contained within the migration area of code, so I think the risk in
> > including them is fairly low.
> >
> > -David
> >
> > On Thu, May 25, 2023 at 2:15 PM Greg Harris 
> > wrote:
> >
> > > Hey all,
> > >
> > > A contributor just pointed out a small but noticeable flaw in the
> > > implementation of KIP-581
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-581%3A+Value+of+optional+null+field+which+has+default+value
> > > which is planned for this release.
> > > Impact: the feature works for root values in a record, but does not
> > > work for any fields within structs. Fields within structs will
> > > continue to have their previous, backwards-compatible behavior.
> > > The contributor has submitted a bug-fix PR which reports the problem
> > > and does not yet have a merge-able solution, but they are actively
> > > responding and interested in having this fixed:
> > > https://github.com/apache/kafka/pull/13748
> > > The overall fix should be a one-liner + some unit tests. While this is
> > > not a regression, it does make the feature largely useless, as the
> > > majority of use-cases will be for struct fields.
> > >
> > > Thanks!
> > > Greg Harris
> > >
> > > On Wed, May 24, 2023 at 7:05 PM Ismael Juma  wrote:
> > > >
> > > > I agree the migration should be functional - it wasn't obvious if the
> > > > migration issues are edge cases or not. If they are edge cases, I think
> > > > 3.5.1 would be fine given the preview status.
> > > >
> > > > I understand that a new RC is needed, but that doesn't mean we should 
> > > > let
> > > > everything in. Each change carries some risk. And if we don't agree on
> > > the
> > > > bar for the migration work, we may be having the same discussion next
> > > week.
> > > > :)
> > > >
> > > > Ismael
> > > >
> > > > On Wed, May 24, 2023, 12:00 PM Josep Prat 
> > > > wrote:
> > > >
> > > > > Hi there,
> > > > > Is the plan described in KIP-833[1] still valid? In there it states
> > > that
> > > > > 3.5.0 should aim at deprecation of Zookeeper, so conceptually, the
> > > path to
> > > > > migrate to Kraft should be somewhat functional (in my opinion). If we
> > > don't
> > > > > want to deprecate Zookeeper in 3.5.0, then I share Ismael's opinion
> > > that
> > > > > these could be fixed in subsequent patches of 3.5.x. Just my 5cts.
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-833:+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.5
> > > > > Best,
> > > > >
> > > > > On Wed, May 24, 2023 at 8:51 PM Ismael Juma  wrote:
> > > > >
> > > > > > Are all these blockers? For example, zk to kraft migration is
> > > still
> > > > > in
> > > > > > preview - can we fix some of these in 3.5.1?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, May 24, 2023, 10:22 AM Colin McCabe 
> > > wrote:
> > > > > >
> > > > > > > Hi Mickael,
> > > > > > >
> > > > > > > Thanks for putting together this RC. Unfortunately, we've
> > > identified
> > > > > > > several blocker issues in this release candidate.
> > > > > > >
> > > > > > > KAFKA-15009: New ACLs are not written to ZK during migration
> > > > > > > KAFKA-15007: MV is not set correctly in the MetadataPropagator in
> > > > > > > migration.
> > > > > > > KAFKA-15004: Topic config changes are not synced during zk to 
> > > > > > > kraft
> > > > > > > migration (dual-write)
> > > > > > > KAFKA-15003: TopicIdReplicaAssignment is not updated in migration
> > > > > > > (dual-write) when partitions are changed for topic
> > > > > > > KAFKA-14996: The KRaft controller should properly handle overly
> > > large
> > > > > > user
> > > > > > > operations
> > > > > > >

Re: [VOTE] 3.5.0 RC0

2023-05-30 Thread Mickael Maison
Hi Greg,

Thanks for the heads up, this indeed looks like something we want in
3.5. I've replied in the PR.

Mickael

On Sat, May 27, 2023 at 11:44 PM David Arthur
 wrote:
>
> Mickael, after looking more closely, I definitely think KAFKA-15010 is a
> blocker. It creates the case where the controller can totally miss a
> metadata update and not write it back to ZK. Since things like dynamic
> configs and ACLs are only read from ZK by the ZK brokers, we could have
> significant problems while the brokers are being migrated (when some are
> KRaft and some are ZK). E.g., ZK brokers could be totally unaware of an ACL
> change while the KRaft brokers have it. I have a fix ready here
> https://github.com/apache/kafka/pull/13758. I think we can get it committed
> soon.
>
> Another blocker is KAFKA-15004 which was just merged to trunk. This is
> another dual-write bug where new topic/broker configs will not be written
> back to ZK by the controller.
>
> The fix for KAFKA-15010 has a few dependencies on fixes we made this past
> week, so we'll need to cherry-pick a few commits. The changes are totally
> contained within the migration area of code, so I think the risk in
> including them is fairly low.
>
> -David
>
> On Thu, May 25, 2023 at 2:15 PM Greg Harris 
> wrote:
>
> > Hey all,
> >
> > A contributor just pointed out a small but noticeable flaw in the
> > implementation of KIP-581
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-581%3A+Value+of+optional+null+field+which+has+default+value
> > which is planned for this release.
> > Impact: the feature works for root values in a record, but does not
> > work for any fields within structs. Fields within structs will
> > continue to have their previous, backwards-compatible behavior.
> > The contributor has submitted a bug-fix PR which reports the problem
> > and does not yet have a merge-able solution, but they are actively
> > responding and interested in having this fixed:
> > https://github.com/apache/kafka/pull/13748
> > The overall fix should be a one-liner + some unit tests. While this is
> > not a regression, it does make the feature largely useless, as the
> > majority of use-cases will be for struct fields.
> >
> > Thanks!
> > Greg Harris
> >
> > On Wed, May 24, 2023 at 7:05 PM Ismael Juma  wrote:
> > >
> > > I agree the migration should be functional - it wasn't obvious if the
> > > migration issues are edge cases or not. If they are edge cases, I think
> > > 3.5.1 would be fine given the preview status.
> > >
> > > I understand that a new RC is needed, but that doesn't mean we should let
> > > everything in. Each change carries some risk. And if we don't agree on
> > the
> > > bar for the migration work, we may be having the same discussion next
> > week.
> > > :)
> > >
> > > Ismael
> > >
> > > On Wed, May 24, 2023, 12:00 PM Josep Prat 
> > > wrote:
> > >
> > > > Hi there,
> > > > Is the plan described in KIP-833[1] still valid? In there it states
> > that
> > > > 3.5.0 should aim at deprecation of Zookeeper, so conceptually, the
> > path to
> > > > migrate to Kraft should be somewhat functional (in my opinion). If we
> > don't
> > > > want to deprecate Zookeeper in 3.5.0, then I share Ismael's opinion
> > that
> > > > these could be fixed in subsequent patches of 3.5.x. Just my 5cts.
> > > >
> > > > [1]:
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-833:+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.5
> > > > Best,
> > > >
> > > > On Wed, May 24, 2023 at 8:51 PM Ismael Juma  wrote:
> > > >
> > > > > Are all these blockers? For example, zk to kraft migration is
> > still
> > > > in
> > > > > preview - can we fix some of these in 3.5.1?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, May 24, 2023, 10:22 AM Colin McCabe 
> > wrote:
> > > > >
> > > > > > Hi Mickael,
> > > > > >
> > > > > > Thanks for putting together this RC. Unfortunately, we've
> > identified
> > > > > > several blocker issues in this release candidate.
> > > > > >
> > > > > > KAFKA-15009: New ACLs are not written to ZK during migration
> > > > > > KAFKA-15007: MV is not set correctly in the MetadataPropagator in
> > > > > > migration.
> > > > > > KAFKA-15004: Topic config changes are not synced during zk to kraft
> > > > > > migration (dual-write)
> > > > > > KAFKA-15003: TopicIdReplicaAssignment is not updated in migration
> > > > > > (dual-write) when partitions are changed for topic
> > > > > > KAFKA-14996: The KRaft controller should properly handle overly
> > large
> > > > > user
> > > > > > operations
> > > > > >
> > > > > > We are working on PRs for these issues and will get them in soon,
> > we
> > > > > think!
> > > > > >
> > > > > > So unfortunately I have to leave a -1 here for RC0. Let's aim for
> > > > another
> > > > > > RC next week.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > On Wed, May 24, 2023, at 07:05, Mickael Maison wrote:
> > > > > > > Hi David,
> > > > > > >