Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2903

2024-05-14 Thread Apache Jenkins Server
See 

[jira] [Resolved] (KAFKA-16671) Revisit SessionedProtocolIntegrationTest.ensureInternalEndpointIsSecured

2024-05-14 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16671.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Revisit SessionedProtocolIntegrationTest.ensureInternalEndpointIsSecured
> 
>
> Key: KAFKA-16671
> URL: https://issues.apache.org/jira/browse/KAFKA-16671
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
> Fix For: 3.8.0
>
>
> Looped 1000 times on my local machine, and all passed. Let's enable the test 
> to see what happens in our CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-05-14 Thread Sophie Blee-Goldman
Hey all,

I have a few updates to mention that have come up during the implementation
phase. The KIP should reflect the latest proposal and what has been merged
so far, but I'll list everything again here so that you don't have to sift
through the entire KIP:

U1: The KafkaStreamsAssignment interface was converted to a class, and two
public constructors were added in keeping with the fluent API used
elsewhere in Streams:


public class KafkaStreamsAssignment {

    public static KafkaStreamsAssignment of(final ProcessId processId,
                                            final Set<AssignedTask> assignment);

    public KafkaStreamsAssignment withFollowupRebalance(final Instant rebalanceDeadline);

}
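For illustration, here is a self-contained sketch of the fluent construction pattern described in U1. The class and method names mirror the update above, but `ProcessId`, `AssignedTask`, and all internals are stubbed assumptions, nested only to keep the example compilable:

```java
import java.time.Instant;
import java.util.Optional;
import java.util.Set;

public class KafkaStreamsAssignment {

    // Minimal stand-ins for the KIP-924 types; real definitions live in the KIP.
    public static final class ProcessId {
        final String id;
        public ProcessId(String id) { this.id = id; }
    }

    public static final class AssignedTask {
        final String taskId;
        public AssignedTask(String taskId) { this.taskId = taskId; }
    }

    private final ProcessId processId;
    private final Set<AssignedTask> assignment;
    private final Optional<Instant> followupRebalanceDeadline;

    private KafkaStreamsAssignment(ProcessId processId, Set<AssignedTask> assignment,
                                   Optional<Instant> deadline) {
        this.processId = processId;
        this.assignment = assignment;
        this.followupRebalanceDeadline = deadline;
    }

    // Static factory, in keeping with the fluent API used elsewhere in Streams.
    public static KafkaStreamsAssignment of(ProcessId processId, Set<AssignedTask> assignment) {
        return new KafkaStreamsAssignment(processId, assignment, Optional.empty());
    }

    // Returns a copy of this assignment with a follow-up rebalance scheduled.
    public KafkaStreamsAssignment withFollowupRebalance(Instant rebalanceDeadline) {
        return new KafkaStreamsAssignment(processId, assignment, Optional.of(rebalanceDeadline));
    }

    public Optional<Instant> followupRebalanceDeadline() { return followupRebalanceDeadline; }

    public static void main(String[] args) {
        KafkaStreamsAssignment a = KafkaStreamsAssignment
                .of(new ProcessId("p1"), Set.of(new AssignedTask("0_0")))
                .withFollowupRebalance(Instant.ofEpochMilli(1000L));
        System.out.println(a.followupRebalanceDeadline().isPresent()); // prints "true"
    }
}
```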


U2: Any lag-related APIs in the KafkaStreamsState interface will throw an
UnsupportedOperationException if the user opted out of computing the task
lags when getting the KafkaStreamsState

U3: While refactoring the RackAwareTaskAssignor, we realized the current
proposal was missing the requisite rack id information. We will need to add
both the per-client rackId to the KafkaStreamsState, as well as the
per-task rack ids of all replicas hosting the topic partitions for that
task. The former is straightforward and leads to this new method on the
KafkaStreamsState interface:

interface KafkaStreamsState {

    Optional<String> rackId();

}


For the latter issue, we need to add the per-partition rack ids to the
ApplicationState interface, but the exact API is a bit less straightforward
since we don't currently have any concept of a partition in the proposed
API, instead dealing only with tasks.

option 1: https://github.com/apache/kafka/pull/15960
The easiest way to address this would be to add a single method to the
ApplicationState interface returning a map from TaskId to the rack ids for
its partitions. To avoid a nasty multi-layered nested data structure, we'd
also introduce a simple container for the partition to rack ids map, with
separate maps for input topics vs changelogs (since the
RackAwareTaskAssignor needs the ability to differentiate these, and so
would the new rack-aware assignment utility methods). See the short example
PR linked to above for the complete API being proposed in this option.

option 2: https://github.com/apache/kafka/pull/15959
While it is clear that tasks are the right level of abstraction for the
TaskAssignor on the whole, it could be argued that the topic partition
information might be valuable to a more sophisticated assignor. So another
option would be to go all-in and create a new metadata class for each task
that exposes essential and useful information: eg the set of input
partitions and changelog partitions belonging to each task and the mapping
of partition to rackIds, and perhaps also whether it is stateful and the
names of any state stores for that TaskId. This would also allow us to
simplify the ApplicationState interface to return just a single set of
tasks with all metadata encapsulated in the task, rather than having to
offer a separate API for stateful vs stateless tasks to differentiate the
two. See the example PR for the full proposal and changes to the existing
API
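To make option 2 concrete, here is a hypothetical sketch of the task-level metadata shape. The names `TaskInfo` and `TaskTopicPartition` and all fields are illustrative assumptions; see the linked PR for the actual proposal:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the option-2 metadata shape: a single task-level
// class carrying partition and rack information.
public class TaskMetadataSketch {

    record TaskTopicPartition(String topic, int partition, boolean isChangelog,
                              Set<String> rackIds) {}

    record TaskInfo(String id, boolean isStateful, Set<String> stateStoreNames,
                    Set<TaskTopicPartition> topicPartitions) {}

    // With all metadata on the task, ApplicationState can expose a single set
    // of tasks; callers filter stateful vs stateless themselves.
    static Set<String> statefulTaskIds(List<TaskInfo> tasks) {
        return tasks.stream().filter(TaskInfo::isStateful)
                .map(TaskInfo::id).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        TaskInfo stateless = new TaskInfo("0_0", false, Set.of(), Set.of(
                new TaskTopicPartition("input", 0, false, Set.of("rack-a"))));
        TaskInfo stateful = new TaskInfo("1_0", true, Set.of("store"), Set.of(
                new TaskTopicPartition("input", 0, false, Set.of("rack-a")),
                new TaskTopicPartition("app-store-changelog", 0, true, Set.of("rack-b"))));
        System.out.println(statefulTaskIds(List.of(stateless, stateful))); // prints "[1_0]"
    }
}
```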

I personally am slightly in favor of option #2 (pull/15959) as I believe including
general task metadata may be useful and this API would be easy to evolve if
we wanted to add anything else in a future KIP. The current KIP was updated
using this option, although nothing related to the rack ids has been merged
yet. We're happy to defer to anyone with a strong preference for either of
these options, or a new suggestion of their own.

As always, let us know if you have any questions or concerns or feedback of
any kind.

Thanks!


On Mon, May 6, 2024 at 1:33 PM Sophie Blee-Goldman 
wrote:

> Thanks guys. Updated the error codes in both the code and the explanation
> under "Public Changes". To sum up, here are the error codes listed in the
> KIP:
>
> enum AssignmentError {
> NONE,
> ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES,
> ACTIVE_AND_STANDBY_TASK_ASSIGNED_TO_SAME_KAFKASTREAMS,
> INVALID_STANDBY_TASK,
> UNKNOWN_PROCESS_ID,
> UNKNOWN_TASK_ID
> }
>
> Anything missing?
>
> (also updated all the code block headings, thanks for noticing that Bruno)
>
> On Fri, May 3, 2024 at 9:33 AM Matthias J. Sax  wrote:
>
>> 117f: Good point by Bruno. We should check for this, and could have an
>> additional `INVALID_STANDBY_TASK` error code?
>>
>>
>> -Matthias
>>
>> On 5/3/24 5:52 AM, Guozhang Wang wrote:
>> > Hi Sophie,
>> >
>> > Re: As for the return type of the TaskAssignmentUtils, I think that
>> > makes sense. LGTM.
>> >
>> > On Fri, May 3, 2024 at 2:26 AM Bruno Cadonna 
>> wrote:
>> >>
>> >> Hi Sophie,
>> >>
>> >> 117f:
>> >> I think, removing the STATEFUL and STATELESS types is not enough to
>> >> avoid the error Guozhang mentioned. The StreamsPartitionAssignor passes
>> >> the information whether a task is stateless or stateful into the task
>> >> assignor. However, the task assignor can return 
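The error codes quoted above could be checked along these lines. This is a sketch of the idea only; the detection logic and every name apart from the enum constants are assumptions, not the actual Streams validator:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative check for one of the AssignmentError codes:
// ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES fires when the same active task appears
// in more than one KafkaStreams client's assignment.
public class AssignmentValidationSketch {

    enum AssignmentError { NONE, ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES }

    // processId -> active task ids assigned to that client
    static AssignmentError validate(Map<String, Set<String>> activeAssignment) {
        Set<String> seen = new HashSet<>();
        for (Set<String> tasks : activeAssignment.values()) {
            for (String task : tasks) {
                if (!seen.add(task)) {
                    return AssignmentError.ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES;
                }
            }
        }
        return AssignmentError.NONE;
    }

    public static void main(String[] args) {
        System.out.println(validate(Map.of(
                "p1", Set.of("0_0"),
                "p2", Set.of("0_0")))); // prints "ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES"
    }
}
```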

Re: [VOTE] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Sophie Blee-Goldman
+1 (binding)

Thanks!

On Tue, May 14, 2024 at 6:58 PM Matthias J. Sax  wrote:

> +1 (binding)
>
> On 5/14/24 9:19 AM, Lucas Brutschy wrote:
> > Hi Nick!
> >
> > Thanks for the KIP.
> >
> > +1 (binding)
> >
> > On Tue, May 14, 2024 at 5:16 PM Nick Telford 
> wrote:
> >>
> >> Hi everyone,
> >>
> >> I'd like to call a vote on the Kafka Streams KIP-989: RocksDB Iterator
> >> Metrics:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics
> >>
> >> All of the points in the discussion thread have now been addressed.
> >>
> >> Regards,
> >>
> >> Nick
>


Re: [VOTE] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Matthias J. Sax

+1 (binding)

On 5/14/24 9:19 AM, Lucas Brutschy wrote:

Hi Nick!

Thanks for the KIP.

+1 (binding)

On Tue, May 14, 2024 at 5:16 PM Nick Telford  wrote:


Hi everyone,

I'd like to call a vote on the Kafka Streams KIP-989: RocksDB Iterator
Metrics:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics

All of the points in the discussion thread have now been addressed.

Regards,

Nick


[jira] [Created] (KAFKA-16768) SocketServer leaks accepted SocketChannel instances due to race condition

2024-05-14 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16768:
---

 Summary: SocketServer leaks accepted SocketChannel instances due 
to race condition
 Key: KAFKA-16768
 URL: https://issues.apache.org/jira/browse/KAFKA-16768
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.8.0
Reporter: Greg Harris


The SocketServer has threads for Acceptors and Processors. These threads 
communicate via Processor#accept/Processor#configureNewConnections and the 
`newConnections` queue.

During shutdown, the Acceptor and Processors are each stopped by setting 
shouldRun to false, and then shutdown proceeds asynchronously in all instances 
together. This leads to a race condition where an Acceptor accepts a 
SocketChannel and queues it to a Processor, but that Processor instance has 
already started shutting down and has already drained the newConnections queue.
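One way to avoid the leak described above is to close whatever remains in the queue as the processor's final shutdown step. A self-contained sketch of that idea (a simplified model, not the actual SocketServer code; `Connection` stands in for a SocketChannel):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the race: the acceptor may enqueue a connection after
// the processor has drained `newConnections` for the last time. Closing
// everything left in the queue at the end of shutdown prevents the leak.
public class ProcessorShutdownSketch {

    static class Connection implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    final Queue<Connection> newConnections = new ConcurrentLinkedQueue<>();
    final AtomicBoolean shouldRun = new AtomicBoolean(true);

    // Acceptor side: hand a new connection to this processor.
    void accept(Connection c) { newConnections.add(c); }

    // Processor side: the final shutdown step closes any connections queued
    // after the last normal drain, so nothing is stranded.
    void shutdown() {
        shouldRun.set(false);
        Connection c;
        while ((c = newConnections.poll()) != null) {
            c.close();
        }
    }

    public static void main(String[] args) {
        ProcessorShutdownSketch p = new ProcessorShutdownSketch();
        Connection late = new Connection();
        p.shouldRun.set(false);   // processor already stopping...
        p.accept(late);           // ...when the acceptor queues one more channel
        p.shutdown();             // drain-on-shutdown closes it
        System.out.println(late.closed); // prints "true"
    }
}
```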



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15170) CooperativeStickyAssignor cannot adjust assignment correctly

2024-05-14 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-15170.

Resolution: Fixed

> CooperativeStickyAssignor cannot adjust assignment correctly
> 
>
> Key: KAFKA-15170
> URL: https://issues.apache.org/jira/browse/KAFKA-15170
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 3.5.0
>Reporter: li xiangyuan
>Assignee: li xiangyuan
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> AbstractStickyAssignor uses ConstrainedAssignmentBuilder to build the assignment 
> when all consumers in the group subscribe to the same topic list, but it fails to 
> add all partitions that moved to another owner to 
> ``partitionsWithMultiplePreviousOwners``.
>
> The reason is that assignOwnedPartitions does not add partitions that are 
> rack-mismatched with their previous owner to allRevokedPartitions, and only 
> partitions in that list are added to partitionsWithMultiplePreviousOwners.
>
> In a cooperative rebalance, partitions that have changed owner must be removed 
> from the final assignment, or it will lead to incorrect consume behavior. I have 
> already raised a PR, please take a look, thanks:
>
> [https://github.com/apache/kafka/pull/13965]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16767) KRaft should track HWM outside the log layer

2024-05-14 Thread Jira
José Armando García Sancio created KAFKA-16767:
--

 Summary: KRaft should track HWM outside the log layer
 Key: KAFKA-16767
 URL: https://issues.apache.org/jira/browse/KAFKA-16767
 Project: Kafka
  Issue Type: Improvement
  Components: kraft
Reporter: José Armando García Sancio


The current implementation of KRaft tracks the HWM using the log layer 
implementation. The log layer has an invariant where HWM <= LEO, which means 
it always sets the HWM to the minimum of the proposed HWM and the LEO.

This has the side effect of the local KRaft replica reporting an HWM that is much 
smaller than the leader's HWM when the replica starts with an empty log, e.g. a 
new broker or the kafka-metadata-shell.
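As a sketch of the clamping behavior described above (illustrative only, not the actual log-layer code):

```java
// A log layer that enforces HWM <= LEO will discard a larger HWM learned from
// the leader whenever the local log is still empty (LEO == 0).
public class HwmClampSketch {

    // The log layer clamps the proposed HWM to the local log end offset.
    static long updateHighWatermark(long currentLeo, long proposedHwm) {
        return Math.min(currentLeo, proposedHwm);
    }

    public static void main(String[] args) {
        long leaderHwm = 1_000_000L; // leader's high watermark
        long emptyLogLeo = 0L;       // fresh broker / kafka-metadata-shell
        System.out.println(updateHighWatermark(emptyLogLeo, leaderHwm)); // prints "0"
    }
}
```

Tracking the HWM outside the log layer would let the replica report the leader's value directly instead of this clamped one.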



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Nick Telford
Woops! Thanks for the catch Lucas. Given this was just a typo, I don't
think this affects the voting.

Cheers,
Nick

On Tue, 14 May 2024 at 18:06, Lucas Brutschy 
wrote:

> Hi Nick,
>
> you are still referring to oldest-open-iterator-age-ms in the
> `Proposed Changes` section.
>
> Cheers,
> Lucas
>
> On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy 
> wrote:
> >
> > Hi Nick!
> >
> > I agree, the age variant is a bit nicer since the semantics are very
> > clear from the name. If you'd rather go for the simple implementation,
> > how about calling it `oldest-iterator-open-since-ms`? I believe this
> > could be understood without docs. Either way, I think we should be
> > able to open the vote for this KIP because nobody raised any major /
> > blocking concerns.
> >
> > Looking forward to getting this voted on soon!
> >
> > Cheers
> > Lucas
> >
> > On Sun, Mar 31, 2024 at 5:23 PM Nick Telford 
> wrote:
> > >
> > > Hi Matthias,
> > >
> > > > For the oldest iterator metric, I would propose something simple like
> > > > `iterator-opened-ms` and it would just be the actual timestamp when
> the
> > > > iterator was opened. I don't think we need to compute the actual age,
> > > > but users can do this computation themselves?
> > >
> > > That works for me; it's easier to implement like that :-D I'm a little
> > > concerned that the name "iterator-opened-ms" may not be obvious enough
> > > without reading the docs.
> > >
> > > > If we think reporting the age instead of just the timestamp is
> better, I
> > > would propose `iterator-max-age-ms`. It should be sufficient to call
> out
> > > > (as it's kinda "obvious" anyway) that the metric applies to open
> > > > iterator only.
> > >
> > > While I think it's preferable to record the timestamp, rather than the
> age,
> > > this does have the benefit of a more obvious metric name.
> > >
> > > > Nit: the KIP says it's a store-level metric, but I think it would be
> > > > good to say explicitly that it's recorded with DEBUG level only?
> > >
> > > Yes, I've already updated the KIP with this information in the table.
> > >
> > > Regards,
> > >
> > > Nick
> > >
> > > On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax 
> wrote:
> > >
> > > > The time window thing was just an idea. Happy to drop it.
> > > >
> > > > For the oldest iterator metric, I would propose something simple like
> > > > `iterator-opened-ms` and it would just be the actual timestamp when
> the
> > > > iterator was opened. I don't think we need to compute the actual age,
> > > > but users can do this computation themselves?
> > > >
> > > > If we think reporting the age instead of just the timestamp is
> better, I
> > > > would propose `iterator-max-age-ms`. It should be sufficient to call
> out
> > > > (as it's kinda "obvious" anyway) that the metric applies to open
> > > > iterator only.
> > > >
> > > > And yes, I was hoping that the code inside MeteredXxxStore might
> already
> > > > be setup in a way that custom stores would inherit the iterator
> metrics
> > > > automatically -- I am just not sure, and left it as an exercise for
> > > > somebody to confirm :)
> > > >
> > > >
> > > > Nit: the KIP says it's a store-level metric, but I think it would be
> > > > good to say explicitly that it's recorded with DEBUG level only?
> > > >
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > > On 3/28/24 2:52 PM, Nick Telford wrote:
> > > > > Quick addendum:
> > > > >
> > > > > My suggested metric "oldest-open-iterator-age-seconds" should be
> > > > > "oldest-open-iterator-age-ms". Milliseconds is obviously a better
> > > > > granularity for such a metric.
> > > > >
> > > > > Still accepting suggestions for a better name.
> > > > >
> > > > > On Thu, 28 Mar 2024 at 13:41, Nick Telford  >
> > > > wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> Sorry for leaving this for so long. So much for "3 weeks until KIP
> > > > freeze"!
> > > > >>
> > > > >> On Sophie's comments:
> > > > >> 1. Would Matthias's suggestion of a separate metric tracking the
> age of
> > > > >> the oldest open iterator (within the tag set) satisfy this? That
> way we
> > > > can
> > > > >> keep iterator-duration-(avg|max) for closed iterators, which can
> be
> > > > useful
> > > > >> for performance debugging for iterators that don't leak. I'm not
> sure
> > > > what
> > > > >> we'd call this metric, maybe: "oldest-open-iterator-age-seconds"?
> Seems
> > > > >> like a mouthful.
> > > > >>
> > > > >> 2. You're right, it makes more sense to provide
> > > > >> iterator-duration-(avg|max). Honestly, I can't remember why I had
> > > > "total"
> > > > >> before, or why I was computing a rate-of-change over it.
> > > > >>
> > > > >> 3, 4, 5, 6. Agreed, I'll make all those changes as suggested.
> > > > >>
> > > > >> 7. Combined with Matthias's point about RocksDB, I'm convinced
> that this
> > > > >> is the wrong KIP for these. I'll introduce the additional Rocks
> metrics
> > > > in
> > > > >> another KIP.
> > > > >>
> > > > >> On Matthias's 

[jira] [Created] (KAFKA-16766) New consumer offsetsForTimes timeout exception does not have the proper message

2024-05-14 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16766:
--

 Summary: New consumer offsetsForTimes timeout exception does not 
have the proper message
 Key: KAFKA-16766
 URL: https://issues.apache.org/jira/browse/KAFKA-16766
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.7.0
Reporter: Lianet Magrans
 Fix For: 3.8.0


If a call to consumer.offsetsForTimes times out, the new AsyncKafkaConsumer 
will throw an org.apache.kafka.common.errors.TimeoutException as expected, but 
with the following as its message: "java.util.concurrent.TimeoutException". 

We should provide a clearer message, and I would even say we keep the same 
message that the LegacyConsumer shows in this case, e.g. "Failed to get offsets 
by times in 6ms".

To fix this we should consider catching the timeout exception in the consumer 
when offsetsForTimes result times out 
([here|https://github.com/apache/kafka/blob/0e023e1f736ea03568032cc282803df8e61eb451/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1115]),
 and propagate it with the message specific to offsetsForTimes.
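A sketch of the proposed wrapping (illustrative; the real fix belongs in AsyncKafkaConsumer, and `KafkaTimeoutException` here is a stand-in for org.apache.kafka.common.errors.TimeoutException):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Catch the j.u.c.TimeoutException from the pending result and rethrow it
// with an offsetsForTimes-specific message, instead of surfacing
// "java.util.concurrent.TimeoutException" to the caller.
public class OffsetsForTimesTimeoutSketch {

    // Stand-in for org.apache.kafka.common.errors.TimeoutException.
    static class KafkaTimeoutException extends RuntimeException {
        KafkaTimeoutException(String message) { super(message); }
    }

    static Object awaitOffsets(Future<Object> result, long timeoutMs) {
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Same wording the LegacyConsumer uses for this failure.
            throw new KafkaTimeoutException(
                    "Failed to get offsets by times in " + timeoutMs + "ms");
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Future<Object> neverCompletes = new CompletableFuture<>();
        try {
            awaitOffsets(neverCompletes, 10L);
        } catch (KafkaTimeoutException e) {
            System.out.println(e.getMessage()); // prints "Failed to get offsets by times in 10ms"
        }
    }
}
```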

After the fix, we should be able to write a test like the 
[testOffsetsForTimesTimeout|https://github.com/apache/kafka/blob/0e023e1f736ea03568032cc282803df8e61eb451/clients/src/test/java/org/apache/kafka/clients/consumer/KafkaConsumerTest.java#L3246]
 that exists for the legacy consumer. Note that we would need a different test 
given that the legacy consumer does not issue a FindCoordinator request in this 
case but the AsyncConsumer does, so the test would have to account for that 
when matching requests/responses.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-05-14 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16765:
---

 Summary: NioEchoServer leaks accepted SocketChannel instances due 
to race condition
 Key: KAFKA-16765
 URL: https://issues.apache.org/jira/browse/KAFKA-16765
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 3.8.0
Reporter: Greg Harris


The NioEchoServer has an AcceptorThread that calls accept() to open new 
SocketChannel instances and insert them into the `newChannels` List, and a main 
thread that drains the `newChannels` List and moves them to the 
`socketChannels` List.

During shutdown, the serverSocketChannel is closed, which causes both threads 
to exit their while loops. It is possible for the NioEchoServer main thread to 
sense the serverSocketChannel close and terminate before the Acceptor thread 
does, and for the Acceptor thread to put a SocketChannel in `newChannels` 
before terminating. This instance is never closed by either thread, because it 
is never moved to `socketChannels`.

A precise execution order that has this leak is:
1. NioEchoServer thread locks `newChannels`.
2. Acceptor thread accept() completes, and the SocketChannel is created
3. Acceptor thread blocks waiting for the `newChannels` lock
4. NioEchoServer thread releases the `newChannels` lock and does some processing
5. NioEchoServer#close() is called, which closes the serverSocketChannel
6. NioEchoServer thread checks serverSocketChannel.isOpen() and then terminates
7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
to `newChannels`.
8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
9. NioEchoServer#close() stops blocking now that both other threads have 
terminated.

The end result is that the leaked socket is left open in the `newChannels` list 
at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16764) New consumer should throw InvalidTopicException on poll when invalid topic in metadata

2024-05-14 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16764:
--

 Summary: New consumer should throw InvalidTopicException on poll 
when invalid topic in metadata
 Key: KAFKA-16764
 URL: https://issues.apache.org/jira/browse/KAFKA-16764
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.7.0
Reporter: Lianet Magrans


A call to consumer.poll should throw InvalidTopicException if an invalid topic 
is discovered in metadata. This can be easily reproduced by calling 
subscribe("invalid topic") and then poll, for example. The new consumer does not 
throw the expected InvalidTopicException like the LegacyKafkaConsumer does.

The legacy consumer achieves this by checking for metadata exceptions on every 
iteration of the ConsumerNetworkClient (see 
[here|https://github.com/apache/kafka/blob/0e023e1f736ea03568032cc282803df8e61eb451/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java#L315])
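The legacy check described above can be sketched as follows (illustrative stand-ins only, not the actual ConsumerNetworkClient code):

```java
// Each poll iteration checks cluster metadata for a pending fatal error
// (e.g. an invalid topic) and throws it to the caller; the new consumer
// would need an equivalent check.
public class MetadataErrorCheckSketch {

    static class InvalidTopicException extends RuntimeException {
        InvalidTopicException(String topic) { super("Invalid topics: [" + topic + "]"); }
    }

    static class Metadata {
        private RuntimeException error;
        void setInvalidTopic(String topic) { error = new InvalidTopicException(topic); }

        // Throws and clears the pending metadata exception, as the legacy
        // consumer's network client does on every iteration.
        void maybeThrow() {
            if (error != null) {
                RuntimeException e = error;
                error = null;
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.setInvalidTopic("invalid topic");
        try {
            metadata.maybeThrow(); // would run at the top of each poll loop
        } catch (InvalidTopicException e) {
            System.out.println(e.getMessage()); // prints "Invalid topics: [invalid topic]"
        }
    }
}
```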



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Hector Geraldino (BLOOMBERG/ 919 3RD A)
+1 (non-binding) Thanks Vedarth!

From: dev@kafka.apache.org At: 05/14/24 12:13:14 UTC-4:00 To: dev@kafka.apache.org
Subject: [VOTE] KIP-1028: Docker Official Image for Apache Kafka

Hi everyone,

I'd like to call a vote on KIP-1028 which aims to introduce a JVM based
Docker Official Image (DOI) for Apache Kafka.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka

Discussion thread -
https://lists.apache.org/thread/6vvwx173jcbgj6vqoq6bo8c0k0ntym0w

Thanks and regards,
Vedarth




Re: [VOTE] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Justine Olshan
+1 (binding) from me as well.

Thanks for working on the KIP.

Justine

On Tue, May 14, 2024 at 9:20 AM Manikumar  wrote:

> +1 (binding)
>
> Thanks for the KIP.
>
>
>
>
>
> On Tue, May 14, 2024, 9:46 PM Chris Egerton 
> wrote:
>
> > +1 (binding), thanks for the KIP!
> >
> > On Tue, May 14, 2024, 12:13 Vedarth Sharma 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to call a vote on KIP-1028 which aims to introduce a JVM based
> > > Docker Official Image (DOI) for Apache Kafka.
> > >
> > > KIP -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka
> > >
> > > Discussion thread -
> > > https://lists.apache.org/thread/6vvwx173jcbgj6vqoq6bo8c0k0ntym0w
> > >
> > > Thanks and regards,
> > > Vedarth
> > >
> >
>


Re: [DISCUSS] KIP-891: Running multiple versions of Connector plugins

2024-05-14 Thread Greg Harris
Hey Chris,

Thanks for your comments. I'm glad you like the motivations, Snehashis
wrote that part!

> the configuration syntax for the most basic use case of
> specifying a single desired version is pretty counterintuitive.

I agree, and the "soft" requirement scheme is something I wasn't
explicitly looking for, but would be inherited from the library. I'm
fine with eliminating the soft requirement semantics, and having
"1.0.0" behave the same as "[1.0.0]". I'm less inclined to include
"range" in the property name, or have two properties.

> Failing startup is drastic and has the potential to disrupt the
> availability of connectors that would otherwise be able to run healthily
> because they were explicitly configured to use valid converters instead of
> the worker defaults.

I think this argument cuts both ways. If someone reconfigures a worker
and adds an invalid ".version" string to the worker (or changes the
plugin.path to make it invalid), it would be permitted to enter the
group, and accept work assignments. If those work assignments used
these configurations, a set of tasks could transition to FAILED and
not be able to recover, because they would be restarted again on the
same worker.

> Why are metrics utilized to report information about plugin versions
> utilized by connectors at runtime instead of publishing this info in the
> REST API

I was following the existing pattern for exposing runtime versions for
connectors, and it did seem like a "monitoring" feature. If that
approach is flawed and should be phased out, I think it would be a
good idea to reconsider the REST API rejected alternative.
We would need some additional design work to spec out the REST API
interface, as I don't have anything in mind currently.

> I'm unclear on whether or not it'll be possible to see this information via 
> the
> GET /connector-plugins//config endpoint

There is room in the API to add recommenders for "key.converter",
"value.converter", and "header.converter", but not for transforms and
predicates, as they include aliases that depend on an actual
configuration. We could explicitly say we're going to do that, or do
whatever is convenient during the implementation phase, or leave it
open to be improved later.
There will not be any recommenders for ".version" properties in the
`/config` endpoint, because those recommenders are dynamic and depend
on an actual configuration.

5) There are two relevant lines in the KIP: "If a .version property
contains a hard requirement, select the latest installed version which
satisfies the requirement." and "This configuration is re-evaluated
each time the connector or task are assigned to a new worker". I would
call this "eager" upgrade behavior, rather than a "sticky" or "lazy"
upgrade behavior.

6) Updated!

Thanks,
Greg

On Tue, May 14, 2024 at 9:14 AM Chris Egerton  wrote:
>
> Hi all,
>
> Thanks Greg for updating the KIP, and thanks Snehashis for starting the
> work on this originally.
>
> The motivation section makes a pretty convincing case for this kind of
> feature. My thoughts are mostly about specific details:
>
> 1) I like the support for version ranges (the example demonstrating how to
> avoid KAFKA-10574 with the header converter was particularly entertaining),
> but the configuration syntax for the most basic use case of specifying a
> single desired version is pretty counterintuitive. People may get bitten or
> at least frustrated if they put connector.version=3.8.0 in a config but
> then version 3.7.5 ends up running. I'd like it if we could either
> intentionally deviate from Maven ranges when a bare version is present, or
> separate things out into two properties: foo.version would be the single
> accepted version for the foo plugin, and foo.version.range would use Maven
> range syntax. Open to other options too, just providing a couple to get the
> ball rolling.
>
> 2) Although the current behavior for a worker with an invalid
> key/value/header converter class specified in its config file is a little
> strange (I was surprised to learn that it wouldn't fail on startup), I
> don't see a good reason to deviate from this when an invalid version is
> specified. Failing startup is drastic and has the potential to disrupt the
> availability of connectors that would otherwise be able to run healthily
> because they were explicitly configured to use valid converters instead of
> the worker defaults.
>
> 3) Why are metrics utilized to report information about plugin versions
> utilized by connectors at runtime instead of publishing this info in the
> REST API? I saw that this was mentioned as a rejected alternative, but I
> didn't get a sense of why. It seems like the REST API would be easier to
> access and more intuitive for most users than new metrics.
>
> 4) In the "Validation" section it's stated that "Users can use these
> recommenders to discover the valid plugin classes and versions, without
> requiring an earlier call to GET 

Re: [DISCUSS] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Lucas Brutschy
Hi Nick,

you are still referring to oldest-open-iterator-age-ms in the
`Proposed Changes` section.

Cheers,
Lucas

On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy  wrote:
>
> Hi Nick!
>
> I agree, the age variant is a bit nicer since the semantics are very
> clear from the name. If you'd rather go for the simple implementation,
> how about calling it `oldest-iterator-open-since-ms`? I believe this
> could be understood without docs. Either way, I think we should be
> able to open the vote for this KIP because nobody raised any major /
> blocking concerns.
>
> Looking forward to getting this voted on soon!
>
> Cheers
> Lucas
>
> On Sun, Mar 31, 2024 at 5:23 PM Nick Telford  wrote:
> >
> > Hi Matthias,
> >
> > > For the oldest iterator metric, I would propose something simple like
> > > `iterator-opened-ms` and it would just be the actual timestamp when the
> > > iterator was opened. I don't think we need to compute the actual age,
> > > but users can do this computation themselves?
> >
> > That works for me; it's easier to implement like that :-D I'm a little
> > concerned that the name "iterator-opened-ms" may not be obvious enough
> > without reading the docs.
> >
> > > If we think reporting the age instead of just the timestamp is better, I
> > > would propose `iterator-max-age-ms`. It should be sufficient to call out
> > > (as it's kinda "obvious" anyway) that the metric applies to open
> > > iterator only.
> >
> > While I think it's preferable to record the timestamp, rather than the age,
> > this does have the benefit of a more obvious metric name.
> >
> > > Nit: the KIP says it's a store-level metric, but I think it would be
> > > good to say explicitly that it's recorded with DEBUG level only?
> >
> > Yes, I've already updated the KIP with this information in the table.
> >
> > Regards,
> >
> > Nick
> >
> > On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax  wrote:
> >
> > > The time window thing was just an idea. Happy to drop it.
> > >
> > > For the oldest iterator metric, I would propose something simple like
> > > `iterator-opened-ms` and it would just be the actual timestamp when the
> > > iterator was opened. I don't think we need to compute the actual age,
> > > but users can do this computation themselves?
> > >
> > > If we think reporting the age instead of just the timestamp is better, I
> > > would propose `iterator-max-age-ms`. It should be sufficient to call out
> > > (as it's kinda "obvious" anyway) that the metric applies to open
> > > iterator only.
> > >
> > > And yes, I was hoping that the code inside MeteredXxxStore might already
> > > be setup in a way that custom stores would inherit the iterator metrics
> > > automatically -- I am just not sure, and left it as an exercise for
> > > somebody to confirm :)
> > >
> > >
> > > Nit: the KIP says it's a store-level metric, but I think it would be
> > > good to say explicitly that it's recorded with DEBUG level only?
> > >
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 3/28/24 2:52 PM, Nick Telford wrote:
> > > > Quick addendum:
> > > >
> > > > My suggested metric "oldest-open-iterator-age-seconds" should be
> > > > "oldest-open-iterator-age-ms". Milliseconds is obviously a better
> > > > granularity for such a metric.
> > > >
> > > > Still accepting suggestions for a better name.
> > > >
> > > > On Thu, 28 Mar 2024 at 13:41, Nick Telford 
> > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> Sorry for leaving this for so long. So much for "3 weeks until KIP
> > > freeze"!
> > > >>
> > > >> On Sophie's comments:
> > > >> 1. Would Matthias's suggestion of a separate metric tracking the age of
> > > >> the oldest open iterator (within the tag set) satisfy this? That way we
> > > can
> > > >> keep iterator-duration-(avg|max) for closed iterators, which can be
> > > useful
> > > >> for performance debugging for iterators that don't leak. I'm not sure
> > > what
> > > >> we'd call this metric, maybe: "oldest-open-iterator-age-seconds"? Seems
> > > >> like a mouthful.
> > > >>
> > > >> 2. You're right, it makes more sense to provide
> > > >> iterator-duration-(avg|max). Honestly, I can't remember why I had
> > > "total"
> > > >> before, or why I was computing a rate-of-change over it.
> > > >>
> > > >> 3, 4, 5, 6. Agreed, I'll make all those changes as suggested.
> > > >>
> > > >> 7. Combined with Matthias's point about RocksDB, I'm convinced that 
> > > >> this
> > > >> is the wrong KIP for these. I'll introduce the additional Rocks metrics
> > > in
> > > >> another KIP.
> > > >>
> > > >> On Matthias's comments:
> > > >> A. Not sure about the time window. I'm pretty sure all existing avg/max
> > > >> metrics are since the application was started? Any other suggestions
> > > here
> > > >> would be appreciated.
> > > >>
> > > >> B. Agreed. See point 1 above.
> > > >>
> > > >> C. Good point. My focus was very much on Rocks memory leaks when I 
> > > >> wrote
> > > >> the first draft. I can generalise it. My only concern is that it might
> > > 

Re: [DISCUSS] KIP-1035: StateStore managed changelog offsets

2024-05-14 Thread Nick Telford
Hi everyone,

Sorry for the delay in replying. I've finally now got some time to work on
this.

Addressing Matthias's comments:

100.
Good point. As Bruno mentioned, there's already AbstractReadWriteDecorator
which we could leverage to provide that protection. I'll add details on
this to the KIP.

101,102.
It looks like these points have already been addressed by Bruno. Let me
know if anything here is still unclear or you feel needs to be detailed
more in the KIP.

103.
I'm in favour of anything that gets the old code removed sooner, but
wouldn't deprecating an API that we expect (some) users to implement cause
problems?
I'm thinking about implementers of custom StateStores, as they may be
confused by managesOffsets() being deprecated, especially since they would
have to mark their implementation as @Deprecated in order to avoid compile
warnings.
If deprecating an API *while it's still expected to be implemented* is
something that's generally done in the project, then I'm happy to do so
here.

104.
I think this is technically possible, but at the cost of considerable
additional code to maintain. Would we ever have a pathway to remove this
downgrade code in the future?


Regarding rebalance metadata:
Opening all stores on start-up to read and cache their offsets is an
interesting idea, especially if we can avoid re-opening the stores once the
Tasks have been assigned. Scalability shouldn't be too much of a problem,
because typically users have a fairly short state.cleanup.delay, so the
number of on-disk Task directories should rarely exceed the number of Tasks
previously assigned to that instance.
An advantage of this approach is that it would also simplify StateStore
implementations, as they would only need to guarantee that committed
offsets are available when the store is open.
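
To make the idea concrete, here is a minimal sketch (Python, purely
illustrative — the directory layout and the read_offsets callback are
hypothetical stand-ins, not Kafka Streams API) of scanning the on-disk task
directories once at startup and caching their committed offsets for
rebalance metadata:

```python
import os

def collect_committed_offsets(state_dir, read_offsets):
    """Scan the task directories under state_dir once at start-up.

    read_offsets(task_dir) stands in for briefly opening the store and
    reading its committed offsets; the store can be closed again right
    after. Returns {task_id: {topic_partition: offset}}, which is enough
    to compute task lags before joining the first rebalance.
    """
    offsets = {}
    for task_id in sorted(os.listdir(state_dir)):
        task_dir = os.path.join(state_dir, task_id)
        if os.path.isdir(task_dir):
            offsets[task_id] = read_offsets(task_dir)
    return offsets
```

With a short state.cleanup.delay, the number of directories scanned should
stay close to the number of previously assigned tasks, so the start-up cost
is bounded.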

I'll investigate this approach this week for feasibility and report back.

I think that covers all the outstanding feedback, unless I missed anything?

Regards,
Nick

On Mon, 6 May 2024 at 14:06, Bruno Cadonna  wrote:

> Hi Matthias,
>
> I see what you mean.
>
> To sum up:
>
> With this KIP the .checkpoint file is written when the store closes.
> That is when:
> 1. a task moves away from the Kafka Streams client
> 2. Kafka Streams client shuts down
>
> A Kafka Streams client needs the information in the .checkpoint file
> 1. on startup because it does not have any open stores yet.
> 2. during rebalances for non-empty state directories of tasks that are
> not assigned to the Kafka Streams client.
>
> With hard crashes, i.e., when the Streams client is not able to close
> its state stores and write the .checkpoint file, the .checkpoint file
> might be quite stale. That influences the next rebalance after failover
> negatively.
>
>
> My conclusion is that Kafka Streams either needs to open the state
> stores at start up or we write the checkpoint file more often.
>
> Writing the .checkpoint file during processing more often without
> controlling the flush to disk would work. However, Kafka Streams would
> checkpoint offsets that are not yet persisted on disk by the state
> store. That is, with a hard crash, the offsets in the .checkpoint file
> might be larger than the offsets checkpointed in the state store. That
> might not be a problem if Kafka Streams uses the .checkpoint file only
> to compute the task lag. The downside is that it makes the managing of
> checkpoints more complex because now we have to maintain two
> checkpoints: one for restoration and one for computing the task lag.
> I think we should explore the option where Kafka Streams opens the state
> stores at start up to get the offsets.
>
> I also checked when Kafka Streams needs the checkpointed offsets to
> compute the task lag during a rebalance. Turns out Kafka Streams needs
> them before sending the join request. Now, I am wondering if opening the
> state stores of unassigned tasks whose state directory exists locally is
> actually such a big issue due to the expected higher latency, since it
> actually happens before the Kafka Streams client joins the rebalance.
>
> Best,
> Bruno
>
>
>
>
>
>
>
> On 5/4/24 12:05 AM, Matthias J. Sax wrote:
> > That's good questions... I could think of a few approaches, but I admit
> > it might all be a little bit tricky to code up...
> >
> > However if we don't solve this problem, I think this KIP does not really
> > solve the core issue we are facing? In the end, if we rely on the
> > `.checkpoint` file to compute a task assignment, but the `.checkpoint`
> > file can be arbitrary stale after a crash because we only write it on a
> > clean close, there would be still a huge gap that this KIP does not
> close?
> >
> > For the case in which we keep the checkpoint file, this KIP would still
> > help for "soft errors" in which KS can recover, and roll back the store.
> > A significant win for sure. -- But hard crashes would still be an
> > problem? We might assign tasks to "wrong" instance, ie, which are not
> > most up to date, as the 

[jira] [Created] (KAFKA-16763) Upgrade to scala 2.12.19 and scala 2.13.14

2024-05-14 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16763:
--

 Summary: Upgrade to scala 2.12.19 and scala 2.13.14
 Key: KAFKA-16763
 URL: https://issues.apache.org/jira/browse/KAFKA-16763
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


scala 2.12.19 (https://github.com/scala/scala/releases/tag/v2.12.19)

 

scala 2.13.14 (https://github.com/scala/scala/releases/tag/v2.13.14)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Lucas Brutschy
Hi Nick!

Thanks for the KIP.

+1 (binding)

On Tue, May 14, 2024 at 5:16 PM Nick Telford  wrote:
>
> Hi everyone,
>
> I'd like to call a vote on the Kafka Streams KIP-989: RocksDB Iterator
> Metrics:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics
>
> All of the points in the discussion thread have now been addressed.
>
> Regards,
>
> Nick


Re: [VOTE] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Manikumar
+1 (binding)

Thanks for the KIP.





On Tue, May 14, 2024, 9:46 PM Chris Egerton  wrote:

> +1 (binding), thanks for the KIP!
>
> On Tue, May 14, 2024, 12:13 Vedarth Sharma 
> wrote:
>
> > Hi everyone,
> >
> > I'd like to call a vote on KIP-1028 which aims to introduce a JVM based
> > Docker Official Image (DOI) for Apache Kafka.
> >
> > KIP -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka
> >
> > Discussion thread -
> > https://lists.apache.org/thread/6vvwx173jcbgj6vqoq6bo8c0k0ntym0w
> >
> > Thanks and regards,
> > Vedarth
> >
>


Re: [VOTE] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Chris Egerton
+1 (binding), thanks for the KIP!

On Tue, May 14, 2024, 12:13 Vedarth Sharma  wrote:

> Hi everyone,
>
> I'd like to call a vote on KIP-1028 which aims to introduce a JVM based
> Docker Official Image (DOI) for Apache Kafka.
>
> KIP -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka
>
> Discussion thread -
> https://lists.apache.org/thread/6vvwx173jcbgj6vqoq6bo8c0k0ntym0w
>
> Thanks and regards,
> Vedarth
>


Re: [DISCUSS] KIP-891: Running multiple versions of Connector plugins

2024-05-14 Thread Chris Egerton
Hi all,

Thanks Greg for updating the KIP, and thanks Snehashis for starting the
work on this originally.

The motivation section makes a pretty convincing case for this kind of
feature. My thoughts are mostly about specific details:

1) I like the support for version ranges (the example demonstrating how to
avoid KAFKA-10574 with the header converter was particularly entertaining),
but the configuration syntax for the most basic use case of specifying a
single desired version is pretty counterintuitive. People may get bitten or
at least frustrated if they put connector.version=3.8.0 in a config but
then version 3.7.5 ends up running. I'd like it if we could either
intentionally deviate from Maven ranges when a bare version is present, or
separate things out into two properties: foo.version would be the single
accepted version for the foo plugin, and foo.version.range would use Maven
range syntax. Open to other options too, just providing a couple to get the
ball rolling.

2) Although the current behavior for a worker with an invalid
key/value/header converter class specified in its config file is a little
strange (I was surprised to learn that it wouldn't fail on startup), I
don't see a good reason to deviate from this when an invalid version is
specified. Failing startup is drastic and has the potential to disrupt the
availability of connectors that would otherwise be able to run healthily
because they were explicitly configured to use valid converters instead of
the worker defaults.

3) Why are metrics utilized to report information about plugin versions
utilized by connectors at runtime instead of publishing this info in the
REST API? I saw that this was mentioned as a rejected alternative, but I
didn't get a sense of why. It seems like the REST API would be easier to
access and more intuitive for most users than new metrics.

4) In the "Validation" section it's stated that "Users can use these
recommenders to discover the valid plugin classes and versions, without
requiring an earlier call to GET /connector-plugins?connectorsOnly=false."
I really like the creativity and simplicity of reusing the recommender
mechanism to expose available versions for plugins via the REST API. I'm
unclear on whether or not it'll be possible to see this information via the
GET /connector-plugins//config endpoint, though. It'd be great if
this were supported, since we learned in KIP-769 [1] that people really
want to be able to see configuration options for connectors and their
plugins via some kind of GET endpoint without having to provide a complete
connector config for validation.

5) In the Maven version range docs, it's stated that "Maven picks the
highest version of each project that satisfies all the hard requirements of
the dependencies on that project." I'm guessing this behavior will be
retained for Connect; i.e., the highest-possible version of each plugin
that satisfies the user-specified version constraints will be run? (An
alternative approach could be to have some kind of "sticky" logic that only
restarts connectors/tasks when their currently-used version becomes
incompatible with the configured constraints.)

6) (Nit) It'd be nice to add a link to the TestPlugins class or somewhere
in its neighborhood to the testing plan; unfamiliar readers probably won't
get much out of what's there right now.

[1] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions
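
Regarding points 1 and 5, a toy sketch of the resolution semantics
(illustrative only — the helper names are hypothetical and implement just a
tiny subset of Maven's range grammar, not anything proposed in the KIP):

```python
def parse_version(s):
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in s.split("."))

def matches(version, spec):
    """Check one version against a tiny subset of Maven range syntax.

    A bare version like "3.8.0" is a *soft* requirement in Maven: every
    version satisfies it, which is why connector.version=3.8.0 could still
    resolve to 3.7.5. A bracketed range like "[3.8.0,4.0.0)" is a hard
    requirement.
    """
    spec = spec.strip()
    if not (spec.startswith("[") or spec.startswith("(")):
        return True  # soft requirement: anything matches
    lo_inc, hi_inc = spec.startswith("["), spec.endswith("]")
    lo_s, hi_s = spec[1:-1].split(",")
    v = parse_version(version)
    if lo_s and (v < parse_version(lo_s) or (not lo_inc and v == parse_version(lo_s))):
        return False
    if hi_s and (v > parse_version(hi_s) or (not hi_inc and v == parse_version(hi_s))):
        return False
    return True

def resolve_plugin_version(installed, spec):
    """Pick the highest installed version satisfying the spec."""
    candidates = [v for v in installed if matches(v, spec)]
    return max(candidates, key=parse_version) if candidates else None
```

Under these semantics, resolve_plugin_version(["3.7.5"], "3.8.0") returns
"3.7.5" — exactly the surprise described in point 1 — while
"[3.8.0,4.0.0)" behaves as a hard constraint.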

Cheers,

Chris

On Mon, May 13, 2024 at 2:01 PM Snehashis  wrote:

> Hi Greg,
>
> That is much appreciated. No complaints on the additional scope, I will
> make some time to work on this once we have approval.
>
> Thanks
> Snehashis
>
> On Fri, May 10, 2024 at 9:28 PM Greg Harris 
> wrote:
>
> > Hey Snehashis,
> >
> > I'm glad to hear you're still interested in this KIP!
> > I'm happy to let you drive this, and I apologize for increasing the
> > scope of work so drastically. To make up for that, I'll volunteer to
> > be the primary PR reviewer to help get this done quickly once the KIP
> > is approved.
> >
> > Thanks,
> > Greg
> >
> >
> > On Fri, May 10, 2024 at 3:51 AM Snehashis 
> > wrote:
> > >
> > > Hi Greg,
> > >
> > > Thanks for the follow up to my original KIP, I am in favour of the
> > > additions made to expand its scope, the addition of range versions
> > > specifically make a lot of sense.
> > >
> > > Apologies if I have not publicly worked on this KIP for a long time.
> The
> > > original work was done when the move to service loading was in
> discussion
> > > and I wanted to loop back to this only after that work was completed.
> > Post
> > > its conclusion, I have not been able to take this up due to other
> > > priorities. If it's okay with you, I would still like to get this
> > > implemented myself, including the additional scope.
> > >
> > > Thanks and regards
> > > Snehashis
> > >
> > > On Fri, May 10, 2024 at 12:45 AM Greg Harris
> > 
> 

[VOTE] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Vedarth Sharma
Hi everyone,

I'd like to call a vote on KIP-1028 which aims to introduce a JVM based
Docker Official Image (DOI) for Apache Kafka.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka

Discussion thread -
https://lists.apache.org/thread/6vvwx173jcbgj6vqoq6bo8c0k0ntym0w

Thanks and regards,
Vedarth


Re: [DISCUSS] KIP-1032: Upgrade to Jakarta and JavaEE 9 in Kafka 4.0

2024-05-14 Thread Christopher Shannon
I just wanted to bump this to see if anyone had more feedback before
trying to call a vote on this for 4.0.

Chris

On Mon, Apr 1, 2024 at 3:41 PM Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Greg,
>
> 1. Ok sounds good we can target JDK 17 in this KIP if we decide to do that.
> 2. For the EE version, I don't think it really matters since we won't be
> using any new features that I am aware of. It's just something I noticed
> that we will need to pick because Jetty 12 supports multiple versions so it
> would affect which spec jars we use.  In the past, Jetty versions have been
> tied to a specific Servlet spec, but in Jetty 12 they have abstracted
> things away and support multiple versions simultaneously. There's
> different versions for all the specs but the primary one to note for us
> would be that JavaEE 9 uses the Servlet 5.0 spec and JavaEE 10 uses the
> Servlet 6.0 spec. JavaEE 11 is under development and will use the Servlet
> 6.1 spec. So we may not really need to call out the EE version at all if it
> doesn't matter and we are not using specific features but I wanted to bring
> it up since multiple versions are listed as being compatible with Jetty 12
> so we need to pick one. On the main page they list the different servlet
> specs they support: https://eclipse.dev/jetty/
> 3. Right, I didn't mean we should include it in the KIP, I was more asking
> I guess how to go about things. It looks like we could use a lot of it and
> adapt the work already done. While it's under the Apache 2.0 license and we
> could use it, someone else wrote it so it would still be good to properly
> credit that person as you mentioned. If I work on it I would probably start
> over with a new branch and just use the old PR as a guide and then maybe
> figure out a way to credit the original author. There's always that
> co-author tag that could be used I think.
>
> Chris
>
>
> On Mon, Apr 1, 2024 at 12:11 PM Greg Harris 
> wrote:
>
>> Hey Chris,
>>
>> Thanks for your questions!
>>
>> 1. KIPs are generally immutable once they've been voted on. In this
>> case, I think it's better that this KIP include the bump to Java 17
>> just for Connect and MirrorMaker 2, and should include that in the KIP
>> title.
>> 2. I'm not familiar with the difference, can you provide some more
>> context that would help us make a decision? AFAIU we haven't specified
>> an EE version in the past, and we don't do any sort of automated
>> testing for compatibility. I think it would be good to call out which
>> components have JavaEE-sensitive dependencies (just connect-runtime?).
>> We do not want to accidentally give users the idea that the clients
>> depend on the JavaEE version, as that could be very confusing.
>> 3. That's an implementation detail left up to anyone that effects this
>> KIP on the repo, and doesn't need to be mentioned in the KIP itself. I
>> have seen people adopt changes from un-merged PRs after the original
>> contributor has lost interest, while still crediting the original
>> contributor for their portion of the changes. If you're doing this,
>> then it's ultimately up to your judgement.
>>
>> Thanks,
>> Greg
>>
>> On Mon, Apr 1, 2024 at 6:30 AM Christopher Shannon
>>  wrote:
>> >
>> > Hi Greg,
>> >
>> > Thanks for the detailed analysis on the connect framework. It sounds
>> like
>> > hopefully we can go ahead and require JDK 17+ and bump that dependency
>> for
>> > the ConnectRestExtensionContext.
>> >
>> > I agree we can leave it open and hear what others think as well about
>> the
>> > requirement, if everyone ends up agreeing, I can update the KIP to
>> reflect
>> > requiring JDK 17 and going with Jetty 12.
>> >
>> > I had a couple of questions
>> > 1) If we go with JDK 17 as a requirement for the Connect framework,
>> should
>> > we modify and make that part of KIP-1013 or keep it in this one?
>> > 2) Should we go with JavaEE 9 or JavaEE 10? I'm not sure how much it
>> > matters in this case.
>> > 3) Can we just re-open https://github.com/apache/kafka/pull/10176 as a
>> > starting point or maybe we can create a new PR and use it as a basis?
>> It's
>> > a bit old so I suspect there would be a ton of conflicts so it might be
>> > best to start over and use it as a guide. I can work on a new PR once we
>> > are all on the same page. I don't think it would take too long to put
>> > together since most of the work is just dependency updates and package
>> > renaming.
>> >
>> > Chris
>> >
>> >
>> > On Fri, Mar 29, 2024 at 8:59 PM Greg Harris
>> 
>> > wrote:
>> >
>> > > Hey all,
>> > >
>> > > I looked into how Debezium handled the javax->jakarta changeover for
>> > > Quarkus, and found this release note:
>> > >
>> > >
>> https://debezium.io/blog/2023/04/20/debezium-2-2-final-released/#new-quarkus-3
>> > > It appears that Debezium 2.1 required Quarkus < 3.0, and Debezium 2.2
>> > > required Quarkus >= 3.0. The upgrade for Kafka could be very similar
>> > > and not incur a major version 

[jira] [Created] (KAFKA-16762) SyncGroup API for upgrading ConsumerGroup

2024-05-14 Thread Dongnuo Lyu (Jira)
Dongnuo Lyu created KAFKA-16762:
---

 Summary: SyncGroup API for upgrading ConsumerGroup
 Key: KAFKA-16762
 URL: https://issues.apache.org/jira/browse/KAFKA-16762
 Project: Kafka
  Issue Type: Sub-task
Reporter: Dongnuo Lyu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16406) Split long-running consumer integration test

2024-05-14 Thread Lianet Magrans (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans resolved KAFKA-16406.

Resolution: Fixed

> Split long-running consumer integration test
> 
>
> Key: KAFKA-16406
> URL: https://issues.apache.org/jira/browse/KAFKA-16406
> Project: Kafka
>  Issue Type: Task
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Major
> Fix For: 3.8.0
>
>
> PlaintextConsumerTest contains integration tests for the consumer. Since the 
> introduction of the new consumer group protocol (KIP-848) and the new 
> KafkaConsumer, this test has been parametrized to run with multiple 
> combinations, making sure we test the logic for the old and new coordinator, 
> as well as for the legacy and new KafkaConsumer. 
> This led to it being one of the longest-running integration tests, so with 
> the aim of reducing the impact on build times we could split it to allow 
> for parallelization.  The tests cover multiple areas of the consumer logic 
> in a single file, so splitting based on the high-level features being tested 
> would be sensible and achieve the desired result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1033: Add Kafka Streams exception handler for exceptions occurring during processing

2024-05-14 Thread Lianet M.
+1 (non-binding)

Thanks for the KIP and updates!

Lianet

On Tue, May 14, 2024, 12:03 a.m. Matthias J. Sax  wrote:

> +1 (binding)
>
> On 5/13/24 5:54 PM, Sophie Blee-Goldman wrote:
> > Thanks for the KIP guys!
> >
> > +1 (binding)
> >
> > On Mon, May 13, 2024 at 6:02 AM Bill Bejeck  wrote:
> >
> >> Thanks for the KIP, this will be a great addition!
> >>
> >> +1(binding)
> >>
> >> -Bill
> >>
> >> On Fri, May 3, 2024 at 4:48 AM Bruno Cadonna 
> wrote:
> >>
> >>> Hi Damien, Sébastien, and Loïc,
> >>>
> >>> Thanks for the KIP!
> >>>
> >>> +1 (binding)
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>>
> >>> On 4/26/24 4:00 PM, Damien Gasparina wrote:
>  Hi all,
> 
>  We would like to start a vote for KIP-1033: Add Kafka Streams
>  exception handler for exceptions occurring during processing
> 
>  The KIP is available on
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1033%3A+Add+Kafka+Streams+exception+handler+for+exceptions+occurring+during+processing
> 
>  If you have any suggestions or feedback, feel free to participate to
>  the discussion thread:
>  https://lists.apache.org/thread/1nhhsrogmmv15o7mk9nj4kvkb5k2bx9s
> 
>  Best regards,
>  Damien Sebastien and Loic
> >>>
> >>
> >
>


Re: [DISCUSS] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Vedarth Sharma
Thanks for providing feedback and helping improve this KIP.

We have had a very thorough discussion on this KIP, and I feel it's now
ready for the voting process.

Thanks and regards,
Vedarth


On Tue, 14 May 2024 at 20:43, Chris Egerton  wrote:

> Hi Vedarth,
>
> This looks great, thank you for helping me thoroughly understand the
> motivation and benefits for the KIP. Looks good to me.
>
> Regarding public interface for the images--everything in the "Public
> Interface" section in KIP-975 does qualify as public interface for the
> images IMO, but I don't think it's comprehensive. If we were asked to, for
> example, change the port in the EXPOSE directive in our Dockerfile, that
> would probably qualify as a change to public interface too. With the strict
> language in the latest draft of this KIP that ensures that any functional
> changes to our Docker images go through another follow-up KIP, we should be
> fine without having to identify a comprehensive list of everything that
> constitutes public interface for our images.
>
> Cheers, and thanks again for the KIP,
>
> Chris
>
> On Mon, May 13, 2024 at 3:07 PM Vedarth Sharma 
> wrote:
>
> > Hey Chris,
> >
> > Once we provide the definitions to docker, they should take care of
> > everything from there. They mentioned here
> > <
> >
> https://github.com/docker-library/official-images?tab=readme-ov-file#library-definition-files
> > >
> > that
> > the image will be rebuilt when the base image is updated. Hence active
> > rebuilds won't require any changes from our side.
> > If we are packaging something which may contain a CVE, like some jar,
> then
> > the onus will be on us to patch it, but it will be up to us whether we
> > consider the threat severe enough to fix and when we want to provide the
> > fixed version. Having Docker Official Image will not impact the frequency
> > of our releases. It will be the Apache Kafka community's call on when a
> > release goes and Docker Official Image will be released accordingly as
> per
> > the KIP. source
> >  >
> >
> > As mentioned in Docker's documentation as well "In essence we strive to
> > heed upstream's recommendations on how they intend for their software to
> be
> > consumed." source
> > <
> >
> https://github.com/docker-library/official-images?tab=readme-ov-file#what-are-official-images
> > >
> > Docker Official Image will rely on upstream's recommendation for
> > functionality. But I do agree that since Docker's stance on this might
> > change in the future, it makes sense to put a safeguard that will not allow
> > any functionality changes to be incorporated as part of the vetting process. I
> > have updated the KIP to reflect the same.
> >
> > KIP-975 has a well defined public interface based on how configs can be
> > supplied and how it can be used. I am not sure if we put that label on it
> > during discussions. I am happy to have a separate email thread on it to
> > iron things out.
> >
> > I hope this addresses all of your concerns!
> >
> > Thanks and regards,
> > Vedarth
> >
> > On Mon, May 13, 2024 at 10:55 PM Chris Egerton 
> > wrote:
> >
> > > Thanks both for your responses! Friendly reminder: again, better to
> > provide
> > > a quote instead of just a link :)
> > >
> > > I've seen a bit about image rebuilding to handle CVEs but I'm a little
> > > unclear on how this would work in practice, and I couldn't find any
> > > concrete details in any of the links. Does Docker do this automatically
> > for
> > > DOIs? Or will the onus be on us to put out patched images? Would this
> > lead
> > > to us putting out images more quickly than we put out standard
> releases?
> > As
> > > a plus, it does look like DOIs get the benefit of Docker Scout [1] for
> > > free, which is nice, but it's still unclear who'd be doing the rest of
> > the
> > > work on that front.
> > >
> > > As far as this point from Vedarth goes:
> > >
> > > > By incorporating the source code of the Docker Official Image into
> our
> > > > AK ecosystem, we gain control over its functionality, ensuring
> > alignment
> > > > with the OSS Docker image. This ensures a seamless experience for
> users
> > > who
> > > > may need to transition between these images.
> > >
> > > This captures my concern with the KIP pretty well. If there's any
> > > significant divergence in behavior (not just build methodology) between
> > the
> > > apache/kafka image and what Docker requires for a Kafka DOI, how are we
> > > going to vet these changes moving forward? Under the "Post Release
> > Process
> > > - if Dockerhub folks suggest changes to the Dockerfiles:" header, this
> > KIP
> > > proposes that we port all suggested changes for the DOI to
> > > the docker/jvm/Dockerfile image, but this seems a bit too permissive.
> As
> > an
> > > alternative, we could state that all build-related changes can be done
> > with
> > > a PR on the apache/kafka GitHub repo (which will require 

[VOTE] KIP-950: Tiered Storage Disablement

2024-05-14 Thread Christo Lolov
Heya!

I would like to start a vote on KIP-950: Tiered Storage Disablement in
order to catch the last Kafka release targeting ZooKeeper -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement

Best,
Christo


[VOTE] KIP-989: RocksDB Iterator Metrics

2024-05-14 Thread Nick Telford
Hi everyone,

I'd like to call a vote on the Kafka Streams KIP-989: RocksDB Iterator
Metrics:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics

All of the points in the discussion thread have now been addressed.

Regards,

Nick


Re: [DISCUSS] KIP-1028: Docker Official Image for Apache Kafka

2024-05-14 Thread Chris Egerton
Hi Vedarth,

This looks great, thank you for helping me thoroughly understand the
motivation and benefits for the KIP. Looks good to me.

Regarding public interface for the images--everything in the "Public
Interface" section in KIP-975 does qualify as public interface for the
images IMO, but I don't think it's comprehensive. If we were asked to, for
example, change the port in the EXPOSE directive in our Dockerfile, that
would probably qualify as a change to public interface too. With the strict
language in the latest draft of this KIP that ensures that any functional
changes to our Docker images go through another follow-up KIP, we should be
fine without having to identify a comprehensive list of everything that
constitutes public interface for our images.

Cheers, and thanks again for the KIP,

Chris

On Mon, May 13, 2024 at 3:07 PM Vedarth Sharma 
wrote:

> Hey Chris,
>
> Once we provide the definitions to docker, they should take care of
> everything from there. They mentioned here
> <
> https://github.com/docker-library/official-images?tab=readme-ov-file#library-definition-files
> >
> that
> the image will be rebuilt when the base image is updated. Hence active
> rebuilds won't require any changes from our side.
> If we are packaging something which may contain a CVE, like some jar, then
> the onus will be on us to patch it, but it will be up to us whether we
> consider the threat severe enough to fix and when we want to provide the
> fixed version. Having Docker Official Image will not impact the frequency
> of our releases. It will be the Apache Kafka community's call on when a
> release goes and Docker Official Image will be released accordingly as per
> the KIP. source
> 
>
> As mentioned in Docker's documentation as well "In essence we strive to
> heed upstream's recommendations on how they intend for their software to be
> consumed." source
> <
> https://github.com/docker-library/official-images?tab=readme-ov-file#what-are-official-images
> >
> Docker Official Image will rely on upstream's recommendation for
> functionality. But I do agree that since Docker's stance on this might
> change in the future, it makes sense to put a safeguard that will not allow any
> functionality changes to be incorporated as part of the vetting process. I
> have updated the KIP to reflect the same.
>
> KIP-975 has a well defined public interface based on how configs can be
> supplied and how it can be used. I am not sure if we put that label on it
> during discussions. I am happy to have a separate email thread on it to
> iron things out.
>
> I hope this addresses all of your concerns!
>
> Thanks and regards,
> Vedarth
>
> On Mon, May 13, 2024 at 10:55 PM Chris Egerton 
> wrote:
>
> > Thanks both for your responses! Friendly reminder: again, better to
> provide
> > a quote instead of just a link :)
> >
> > I've seen a bit about image rebuilding to handle CVEs but I'm a little
> > unclear on how this would work in practice, and I couldn't find any
> > concrete details in any of the links. Does Docker do this automatically
> for
> > DOIs? Or will the onus be on us to put out patched images? Would this
> lead
> > to us putting out images more quickly than we put out standard releases?
> As
> > a plus, it does look like DOIs get the benefit of Docker Scout [1] for
> > free, which is nice, but it's still unclear who'd be doing the rest of
> the
> > work on that front.
> >
> > As far as this point from Vedarth goes:
> >
> > > By incorporating the source code of the Docker Official Image into our
> > > AK ecosystem, we gain control over its functionality, ensuring
> alignment
> > > with the OSS Docker image. This ensures a seamless experience for users
> > who
> > > may need to transition between these images.
> >
> > This captures my concern with the KIP pretty well. If there's any
> > significant divergence in behavior (not just build methodology) between
> the
> > apache/kafka image and what Docker requires for a Kafka DOI, how are we
> > going to vet these changes moving forward? Under the "Post Release
> Process
> > - if Dockerhub folks suggest changes to the Dockerfiles:" header, this
> KIP
> > proposes that we port all suggested changes for the DOI to
> > the docker/jvm/Dockerfile image, but this seems a bit too permissive. As
> an
> > alternative, we could state that all build-related changes can be done
> with
> > a PR on the apache/kafka GitHub repo (which will require approval from a
> > single committer), but any functional changes will require a KIP.
> >
> > Finally, during KIP-975 was there discussion on what we would count as
> the
> > public interface for the apache/kafka image? If not, it'd be nice to get
> > that ironed out since it may make future discussions around our Docker
> > images quicker, but I don't think this is necessary for KIP-1028.
> >
> > [1] - https://www.docker.com/products/docker-scout/
> >
> > On Mon, May 13, 2024 at 

Community over Code EU 2024: The countdown has started!

2024-05-14 Thread Ryan Skraba
[Note: You're receiving this email because you are subscribed to one
or more project dev@ mailing lists at the Apache Software Foundation.]

We are very close to Community Over Code EU -- check out the amazing
program and the special discounts that we have for you.

Special discounts

You still have the opportunity to secure your ticket for Community
Over Code EU. Explore the various options available, including the
regular pass, the committer and groups pass, and now introducing the
one-day pass tailored for locals in Bratislava.

We also have a special discount for you to attend both Community Over
Code and Berlin Buzzwords from June 9th to 11th. Visit our website to
find out more about this opportunity and contact te...@sg.com.mx to
get the discount code.

Take advantage of the discounts and register now!
https://eu.communityovercode.org/tickets/

Check out the full program!

This year Community Over Code Europe will bring to you three days of
keynotes and sessions that cover topics of interest for ASF projects
and the greater open source ecosystem including data engineering,
performance engineering, search, Internet of Things (IoT) as well as
sessions with tips and lessons learned on building a healthy open
source community.

Check out the program: https://eu.communityovercode.org/program/

Keynote speaker highlights for Community Over Code Europe include:

* Dirk-Willem Van Gulik, VP of Public Policy at the Apache Software
Foundation, will discuss the Cyber Resiliency Act and its impact on
open source (All your code belongs to Policy Makers, Politicians, and
the Law).

* Dr. Sherae Daniel will share the results of her study on the impact
of self-promotion for open source software developers (To Toot or not
to Toot, that is the question).

* Asim Hussain, Executive Director of the Green Software Foundation
will present a framework they have developed for quantifying the
environmental impact of software (Doing for Sustainability what Open
Source did for Software).

* Ruth Ikegah will discuss the growth of the open source movement in
Africa (From Local Roots to Global Impact: Building an Inclusive Open
Source Community in Africa)

* A discussion panel on EU policies and regulations affecting
specialists working in Open Source Program Offices

Additional activities

* Poster sessions: We invite you to stop by our poster area and see if
the ideas presented ignite a conversation within your team.

* BOF time: Don't miss the opportunity to discuss in person with your
open source colleagues on your shared interests.

* Participants reception: At the end of the first day, we will have a
reception at the event venue. All participants are welcome to attend!

* Spontaneous talks: There is a dedicated room and social space for
having spontaneous talks and sessions. Get ready to share with your
peers.

* Lightning talks: At the end of the event we will have the awaited
Lightning talks, where every participant is welcome to share and
enlighten us.

Please remember:  If you haven't applied for the visa, we will provide
the necessary letter for the process. In the unfortunate case of a
visa rejection, your ticket will be reimbursed.

See you in Bratislava,

Community Over Code EU Team


[jira] [Resolved] (KAFKA-13115) doSend can be blocking

2024-05-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-13115.

Resolution: Fixed

> doSend can be blocking
> --
>
> Key: KAFKA-13115
> URL: https://issues.apache.org/jira/browse/KAFKA-13115
> Project: Kafka
>  Issue Type: Task
>  Components: docs
>Reporter: Ivan Vaskevych
>Priority: Minor
>  Labels: documentation, pull-request-available
>
> https://github.com/apache/kafka/pull/11023



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Chris Egerton
Hi Alieh,

Thank you for all the updates! One final question--how will the retry
timeout for unknown topic partition errors be implemented? I think it would
be best if this could be done with an implementation of the error handler,
but I don't see a way to track the necessary information with the
current ProducerExceptionHandler interface.
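
To make the question concrete: one shape a stateful implementation could
take (with simplified, hypothetical types standing in for the KIP's
proposed interface) records the first occurrence per topic and keeps
returning RETRY until a deadline elapses:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimeoutAwareHandlerSketch {
    // Simplified stand-in for the KIP's proposed response type.
    enum RetriableResponse { RETRY, FAIL, SWALLOW }

    // Retries UNKNOWN_TOPIC_OR_PARTITION per topic until a deadline elapses.
    static class TimeoutAwareHandler {
        private final Duration retryTimeout;
        private final Map<String, Instant> firstSeen = new ConcurrentHashMap<>();

        TimeoutAwareHandler(Duration retryTimeout) {
            this.retryTimeout = retryTimeout;
        }

        RetriableResponse handleUnknownTopicOrPartition(String topic) {
            Instant first = firstSeen.computeIfAbsent(topic, t -> Instant.now());
            if (Duration.between(first, Instant.now()).compareTo(retryTimeout) <= 0) {
                return RetriableResponse.RETRY; // still inside the per-topic window
            }
            firstSeen.remove(topic); // reset so a later recreation starts a fresh window
            return RetriableResponse.FAIL;
        }
    }

    public static void main(String[] args) {
        TimeoutAwareHandler handler = new TimeoutAwareHandler(Duration.ofSeconds(60));
        System.out.println(handler.handleUnknownTopicOrPartition("foo-topic")); // RETRY
    }
}
```

Even with something along these lines, the interaction with `max.block.ms`
would still need to be spelled out in the KIP.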

Cheers,

Chris

On Tue, May 14, 2024 at 9:10 AM Alieh Saeedi 
wrote:

> Thanks Andrew. Done :)
>
> @Chris: I changed the config parameter type from boolean to integer, which
> defines the timeout for retrying. I thought reusing `max.block.ms` was not
> reasonable as you mentioned.
>
> So if the KIP looks good, let's skip to the good part ;-) VOTING :)
>
> Bests,
> Alieh
>
>
>
>
>
> On Tue, May 14, 2024 at 12:26 PM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
> > Hi Alieh,
> > Just one final comment.
> >
> > [AJS5] Existing classes use Retriable, not Retryable. For example:
> >
> >
> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/errors/RetriableException.html
> >
> > I suggest RetriableResponse and NonRetriableResponse.
> >
> > Thanks,
> > Andrew
> >
> > > On 13 May 2024, at 23:17, Alieh Saeedi 
> > wrote:
> > >
> > > Hi all,
> > >
> > >
> > > Thanks for all the valid points you listed.
> > >
> > >
> > > KIP updates and addressing concerns:
> > >
> > >
> > > 1) The KIP now suggests two Response types: `RetryableResponse` and
> > > `NonRetryableResponse`
> > >
> > >
> > > 2) `custom.exception.handler` is changed to
> > `custom.exception.handler.class`
> > >
> > >
> > > 3) The KIP clarifies that `In the case of an implemented handler for
> the
> > > specified exception, the handler takes precedence.`
> > >
> > >
> > > 4)  There is now a `default` implementation for both handle() methods.
> > >
> > >
> > > 5)  @Chris: for `UnknownTopicOrPartition`, the default is already
> > retrying
> > > for 60s. (In fact, the default value of `max.block.ms`). If the
> handler
> > > instructs to FAIL or SWALLOW, there will be no retry, and if the
> handler
> > > instructs to RETRY, that will be the default behavior, which follows
> the
> > > values in already existing config parameters such as `max.block.ms`.
> > Does
> > > that make sense?
> > >
> > >
> > > Hope the changes and explanations are convincing :)
> > >
> > >
> > > Cheers,
> > >
> > > Alieh
> > >
> > > On Mon, May 13, 2024 at 6:40 PM Justine Olshan
> > 
> > > wrote:
> > >
> > >> Oh I see. The type isn't the error type but a newly defined type for
> the
> > >> response. Makes sense and works for me.
> > >>
> > >> Justine
> > >>
> > >> On Mon, May 13, 2024 at 9:13 AM Chris Egerton <
> fearthecel...@gmail.com>
> > >> wrote:
> > >>
> > >>> If we have dedicated methods for each kind of exception
> > >>> (handleRecordTooLarge, handleUnknownTopicOrPartition, etc.), doesn't
> > that
> > >>> provide sufficient constraint? I'm not suggesting we eliminate these
> > >>> methods, just that we change their return types to something more
> > >> flexible.
> > >>>
> > >>> On Mon, May 13, 2024, 12:07 Justine Olshan
> >  > >>>
> > >>> wrote:
> > >>>
> >  I'm not sure I agree with the Retriable and NonRetriableResponse
> > >> comment.
> >  This doesn't limit the blast radius or enforce certain errors are
> > used.
> >  I think we might disagree on how controlled these interfaces can
> be...
> > 
> >  Justine
> > 
> >  On Mon, May 13, 2024 at 8:40 AM Chris Egerton
>  > >>>
> >  wrote:
> > 
> > > Hi Alieh,
> > >
> > > Thanks for the updates! I just have a few more thoughts:
> > >
> > > - I don't think a boolean property is sufficient to dictate retries
> > >> for
> > > unknown topic partitions, though. These errors can occur if a topic
> > >> has
> > > just been created, which can occur if, for example, automatic topic
> > > creation is enabled for a multi-task connector. This is why I
> > >> proposed
> > >>> a
> > > timeout instead of a boolean (and see my previous email for why
> > >>> reducing
> > > max.block.ms for a producer is not a viable alternative). If it
> > >> helps,
> >  one
> > > way to reproduce this yourself is to add the line
> > > `fooProps.put(TASKS_MAX_CONFIG, "10");` to the integration test
> here:
> > >
> > >
> > 
> > >>>
> > >>
> >
> https://github.com/apache/kafka/blob/5439914c32fa00d634efa7219699f1bc21add839/connect/runtime/src/test/java/org/apache/kafka/connect/integration/SourceConnectorsIntegrationTest.java#L134
> > > and then check the logs afterward for messages like "Error while
> > >>> fetching
> > > metadata with correlation id  :
> >  {foo-topic=UNKNOWN_TOPIC_OR_PARTITION}".
> > >
> > > - I also don't think we need custom XxxResponse enums for every
> > >>> possible
> > > method; it seems like this will lead to a lot of duplication and
> >  cognitive
> > > overhead if we want to expand the error handler in the future.
> > >>> Something
> > > more 

Re: [VOTE] KIP-932: Queues for Kafka

2024-05-14 Thread Andrew Schofield
Hi,
I have made a small update to the KIP as a result of testing the new
share consumer with client telemetry (KIP-714).

I’ve added telemetry metric names to the table of client metrics and
also updated the metric group names so that the resulting client metrics
sent to the broker have consistent names.

Thanks,
Andrew

> On 8 May 2024, at 12:51, Manikumar  wrote:
>
> Hi Andrew,
>
> Thanks for the KIP.  Great write-up!
>
> +1 (binding)
>
> Thanks,
>
> On Wed, May 8, 2024 at 12:17 PM Satish Duggana  
> wrote:
>>
>> Hi Andrew,
>> Thanks for the nice KIP, it will allow other messaging use cases to be
>> onboarded to Kafka.
>>
>> +1 from me.
>>
>> Satish.
>>
>> On Tue, 7 May 2024 at 03:41, Jun Rao  wrote:
>>>
>>> Hi, Andrew,
>>>
>>> Thanks for the KIP. +1
>>>
>>> Jun
>>>
>>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar 
>>> wrote:
>>>
 Thanks Andrew,

 +1 (binding)

 Edo

 On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
  wrote:
>
> Hi Andrew
>
> + 1 (Non-Binding)
>
> This will be great addition to Kafka
>
> On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal 
> wrote:
>
>> Hi Andrew,
>> Thanks for writing the KIP. This is indeed going to be a valuable
 addition
>> to the Kafka, excited to see the KIP.
>>
>> + 1 (Non-Binding)
>>
>> Regards,
>> Apoorv Mittal
>> +44 7721681581
>>
>>
>> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>>
>>> Hi,
>>> I’ve been working to complete KIP-932 over the past few months and
>>> discussions have quietened down.
>>>
>>> I’d like to open the voting for KIP-932:
>>>
>>>
>>>
>>
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>>
>>> Thanks,
>>> Andrew
>>




[jira] [Created] (KAFKA-16761) ZkClusterInstance#start does not create ClusterConfigurableIntegrationHarness

2024-05-14 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16761:
--

 Summary: ZkClusterInstance#start does not create 
ClusterConfigurableIntegrationHarness
 Key: KAFKA-16761
 URL: https://issues.apache.org/jira/browse/KAFKA-16761
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


[https://github.com/apache/kafka/blob/440f5f6c09720bb9414524781342bbf35973c281/core/src/test/java/kafka/test/junit/ZkClusterInvocationContext.java#L103]

`ClusterConfigurableIntegrationHarness` is created only in the "beforeEach" phase, 
so `ZkClusterInstance#start` does not work on its own and can cause an NPE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Andrew Schofield
Hi Alieh,
Thanks for the KIP.

+1 (non-binding)

Andrew

> On 7 May 2024, at 16:56, Alieh Saeedi  wrote:
>
> Hi all,
>
> It seems that we have no more comments, discussions, or feedback on
> KIP-1038; therefore, I’d like to open voting for the KIP: Add Custom Error
> Handler to Producer
> 
>
>
> Cheers,
> Alieh



Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Alieh Saeedi
Thanks Andrew. Done :)

@Chris: I changed the config parameter type from boolean to integer, which
defines the timeout for retrying. I thought reusing `max.block.ms` was not
reasonable as you mentioned.
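
For what it's worth, a sketch of how that could surface in producer
configuration; both key names below are placeholders until the KIP fixes
the final ones:

```java
import java.util.Properties;

public class HandlerConfigSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Handler class config, name per the thread; final name is set by the KIP.
        props.put("custom.exception.handler.class", "com.example.MyProducerExceptionHandler");
        // Integer timeout (ms) bounding UNKNOWN_TOPIC_OR_PARTITION retries,
        // replacing the earlier boolean flag; this key name is a placeholder.
        props.put("unknown.topic.retry.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("unknown.topic.retry.timeout.ms")); // 30000
    }
}
```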

So if the KIP looks good, let's skip to the good part ;-) VOTING :)

Bests,
Alieh





On Tue, May 14, 2024 at 12:26 PM Andrew Schofield 
wrote:

> Hi Alieh,
> Just one final comment.
>
> [AJS5] Existing classes use Retriable, not Retryable. For example:
>
> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/errors/RetriableException.html
>
> I suggest RetriableResponse and NonRetriableResponse.
>
> Thanks,
> Andrew
>
> > On 13 May 2024, at 23:17, Alieh Saeedi 
> wrote:
> >
> > Hi all,
> >
> >
> > Thanks for all the valid points you listed.
> >
> >
> > KIP updates and addressing concerns:
> >
> >
> > 1) The KIP now suggests two Response types: `RetryableResponse` and
> > `NonRetryableResponse`
> >
> >
> > 2) `custom.exception.handler` is changed to
> `custom.exception.handler.class`
> >
> >
> > 3) The KIP clarifies that `In the case of an implemented handler for the
> > specified exception, the handler takes precedence.`
> >
> >
> > 4)  There is now a `default` implementation for both handle() methods.
> >
> >
> > 5)  @Chris: for `UnknownTopicOrPartition`, the default is already
> retrying
> > for 60s. (In fact, the default value of `max.block.ms`). If the handler
> > instructs to FAIL or SWALLOW, there will be no retry, and if the handler
> > instructs to RETRY, that will be the default behavior, which follows the
> > values in already existing config parameters such as `max.block.ms`.
> Does
> > that make sense?
> >
> >
> > Hope the changes and explanations are convincing :)
> >
> >
> > Cheers,
> >
> > Alieh
> >
> > On Mon, May 13, 2024 at 6:40 PM Justine Olshan
> 
> > wrote:
> >
> >> Oh I see. The type isn't the error type but a newly defined type for the
> >> response. Makes sense and works for me.
> >>
> >> Justine
> >>
> >> On Mon, May 13, 2024 at 9:13 AM Chris Egerton 
> >> wrote:
> >>
> >>> If we have dedicated methods for each kind of exception
> >>> (handleRecordTooLarge, handleUnknownTopicOrPartition, etc.), doesn't
> that
> >>> provide sufficient constraint? I'm not suggesting we eliminate these
> >>> methods, just that we change their return types to something more
> >> flexible.
> >>>
> >>> On Mon, May 13, 2024, 12:07 Justine Olshan
>  >>>
> >>> wrote:
> >>>
>  I'm not sure I agree with the Retriable and NonRetriableResponse
> >> comment.
>  This doesn't limit the blast radius or enforce certain errors are
> used.
>  I think we might disagree on how controlled these interfaces can be...
> 
>  Justine
> 
>  On Mon, May 13, 2024 at 8:40 AM Chris Egerton  >>>
>  wrote:
> 
> > Hi Alieh,
> >
> > Thanks for the updates! I just have a few more thoughts:
> >
> > - I don't think a boolean property is sufficient to dictate retries
> >> for
> > unknown topic partitions, though. These errors can occur if a topic
> >> has
> > just been created, which can occur if, for example, automatic topic
> > creation is enabled for a multi-task connector. This is why I
> >> proposed
> >>> a
> > timeout instead of a boolean (and see my previous email for why
> >>> reducing
> > max.block.ms for a producer is not a viable alternative). If it
> >> helps,
>  one
> > way to reproduce this yourself is to add the line
> > `fooProps.put(TASKS_MAX_CONFIG, "10");` to the integration test here:
> >
> >
> 
> >>>
> >>
> https://github.com/apache/kafka/blob/5439914c32fa00d634efa7219699f1bc21add839/connect/runtime/src/test/java/org/apache/kafka/connect/integration/SourceConnectorsIntegrationTest.java#L134
> > and then check the logs afterward for messages like "Error while
> >>> fetching
> > metadata with correlation id  :
>  {foo-topic=UNKNOWN_TOPIC_OR_PARTITION}".
> >
> > - I also don't think we need custom XxxResponse enums for every
> >>> possible
> > method; it seems like this will lead to a lot of duplication and
>  cognitive
> > overhead if we want to expand the error handler in the future.
> >>> Something
> > more flexible like RetriableResponse and NonRetriableResponse could
> > suffice.
> >
> > - Finally, the KIP still doesn't state how the handler will or won't
> >>> take
> > precedence over existing retry properties. If I set `retries` or `
> > delivery.timeout.ms` or `max.block.ms` to low values, will that
> >> cause
> > retries to cease even if my custom handler would otherwise keep
> >>> returning
> > RETRY for an error?
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, May 13, 2024 at 11:02 AM Andrew Schofield <
> > andrew_schofi...@live.com>
> > wrote:
> >
> >> Hi Alieh,
> >> Just a few more comments on the KIP. It is looking much less risky
> >>> now
> > the
> >> scope
> >> is tighter.
> 

Re: [VOTE] KIP-1036: Extend RecordDeserializationException exception

2024-05-14 Thread Lucas Brutschy
Thanks for the KIP!

+1 binding

On Tue, May 14, 2024 at 2:44 PM Frédérik Rouleau
 wrote:
>
> Hi all,
>
> I will keep the vote open for a few more hours as I would like to propose
> the KIP for the next 3.8.0 release (deadline is the 15th May).
> Currently we have +4 binding and +3 non-binding.
> Thanks,


Re: [VOTE] KIP-1036: Extend RecordDeserializationException exception

2024-05-14 Thread Frédérik Rouleau
Hi all,

I will keep the vote open for a few more hours as I would like to propose
the KIP for the next 3.8.0 release (deadline is the 15th May).
Currently we have +4 binding and +3 non-binding.
Thanks,


Re: [VOTE] KIP-1036: Extend RecordDeserializationException exception

2024-05-14 Thread Bruno Cadonna

Thanks!

+1 (binding)

Best,
Bruno

On 5/13/24 8:38 PM, Kirk True wrote:

+1 (non-binding)

Thanks Fred!


On May 13, 2024, at 5:46 AM, Bill Bejeck  wrote:

Thanks for the KIP!

+1 (binding)

-Bill


On Tue, May 7, 2024 at 6:16 PM Sophie Blee-Goldman 
wrote:


+1 (binding)

thanks for the KIP!

On Fri, May 3, 2024 at 9:13 AM Matthias J. Sax  wrote:


+1 (binding)

On 5/3/24 8:52 AM, Federico Valeri wrote:

Hi Fred, this is a useful addition.

+1 non binding

Thanks

On Fri, May 3, 2024 at 4:11 PM Andrew Schofield
 wrote:


Hi Fred,
Thanks for the KIP. It’s turned out nice and elegant I think.

Definitely a worthwhile improvement.


+1 (non-binding)

Thanks,
Andrew


On 30 Apr 2024, at 14:02, Frédérik Rouleau

 wrote:


Hi all,

As there is no more activity for a while on the discuss thread, I

think we

can start a vote.
The KIP is available on




https://cwiki.apache.org/confluence/display/KAFKA/KIP-1036%3A+Extend+RecordDeserializationException+exception



If you have some feedback or suggestions, please participate in the
discussion thread:
https://lists.apache.org/thread/or85okygtfywvnsfd37kwykkq5jq7fy5

Best regards,
Fred










Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Andrew Schofield
Hi Alieh,
Just one final comment.

[AJS5] Existing classes use Retriable, not Retryable. For example:
https://kafka.apache.org/21/javadoc/org/apache/kafka/common/errors/RetriableException.html

I suggest RetriableResponse and NonRetriableResponse.
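
A minimal sketch of what those two types and the default methods could
look like (simplified signatures here, not the KIP's exact interface):

```java
public class ResponseTypesSketch {
    // Spellings follow the convention set by RetriableException.
    enum RetriableResponse { RETRY, FAIL, SWALLOW }
    enum NonRetriableResponse { FAIL, SWALLOW }

    // Simplified handler; the KIP's methods may take richer arguments.
    interface ProducerExceptionHandler {
        default NonRetriableResponse handleRecordTooLarge(String topic, int recordSize) {
            return NonRetriableResponse.FAIL;
        }
        default RetriableResponse handleUnknownTopicOrPartition(String topic) {
            return RetriableResponse.RETRY;
        }
    }

    public static void main(String[] args) {
        // Defaults apply when an implementor overrides neither method.
        ProducerExceptionHandler handler = new ProducerExceptionHandler() { };
        System.out.println(handler.handleRecordTooLarge("t", 1 << 21)); // FAIL
    }
}
```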

Thanks,
Andrew

> On 13 May 2024, at 23:17, Alieh Saeedi  wrote:
>
> Hi all,
>
>
> Thanks for all the valid points you listed.
>
>
> KIP updates and addressing concerns:
>
>
> 1) The KIP now suggests two Response types: `RetryableResponse` and
> `NonRetryableResponse`
>
>
> 2) `custom.exception.handler` is changed to `custom.exception.handler.class`
>
>
> 3) The KIP clarifies that `In the case of an implemented handler for the
> specified exception, the handler takes precedence.`
>
>
> 4)  There is now a `default` implementation for both handle() methods.
>
>
> 5)  @Chris: for `UnknownTopicOrPartition`, the default is already retrying
> for 60s. (In fact, the default value of `max.block.ms`). If the handler
> instructs to FAIL or SWALLOW, there will be no retry, and if the handler
> instructs to RETRY, that will be the default behavior, which follows the
> values in already existing config parameters such as `max.block.ms`. Does
> that make sense?
>
>
> Hope the changes and explanations are convincing :)
>
>
> Cheers,
>
> Alieh
>
> On Mon, May 13, 2024 at 6:40 PM Justine Olshan 
> wrote:
>
>> Oh I see. The type isn't the error type but a newly defined type for the
>> response. Makes sense and works for me.
>>
>> Justine
>>
>> On Mon, May 13, 2024 at 9:13 AM Chris Egerton 
>> wrote:
>>
>>> If we have dedicated methods for each kind of exception
>>> (handleRecordTooLarge, handleUnknownTopicOrPartition, etc.), doesn't that
>>> provide sufficient constraint? I'm not suggesting we eliminate these
>>> methods, just that we change their return types to something more
>> flexible.
>>>
>>> On Mon, May 13, 2024, 12:07 Justine Olshan >>
>>> wrote:
>>>
 I'm not sure I agree with the Retriable and NonRetriableResponse
>> comment.
 This doesn't limit the blast radius or enforce certain errors are used.
 I think we might disagree on how controlled these interfaces can be...

 Justine

 On Mon, May 13, 2024 at 8:40 AM Chris Egerton >>
 wrote:

> Hi Alieh,
>
> Thanks for the updates! I just have a few more thoughts:
>
> - I don't think a boolean property is sufficient to dictate retries
>> for
> unknown topic partitions, though. These errors can occur if a topic
>> has
> just been created, which can occur if, for example, automatic topic
> creation is enabled for a multi-task connector. This is why I
>> proposed
>>> a
> timeout instead of a boolean (and see my previous email for why
>>> reducing
> max.block.ms for a producer is not a viable alternative). If it
>> helps,
 one
> way to reproduce this yourself is to add the line
> `fooProps.put(TASKS_MAX_CONFIG, "10");` to the integration test here:
>
>

>>>
>> https://github.com/apache/kafka/blob/5439914c32fa00d634efa7219699f1bc21add839/connect/runtime/src/test/java/org/apache/kafka/connect/integration/SourceConnectorsIntegrationTest.java#L134
> and then check the logs afterward for messages like "Error while
>>> fetching
> metadata with correlation id  :
 {foo-topic=UNKNOWN_TOPIC_OR_PARTITION}".
>
> - I also don't think we need custom XxxResponse enums for every
>>> possible
> method; it seems like this will lead to a lot of duplication and
 cognitive
> overhead if we want to expand the error handler in the future.
>>> Something
> more flexible like RetriableResponse and NonRetriableResponse could
> suffice.
>
> - Finally, the KIP still doesn't state how the handler will or won't
>>> take
> precedence over existing retry properties. If I set `retries` or `
> delivery.timeout.ms` or `max.block.ms` to low values, will that
>> cause
> retries to cease even if my custom handler would otherwise keep
>>> returning
> RETRY for an error?
>
> Cheers,
>
> Chris
>
> On Mon, May 13, 2024 at 11:02 AM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
>> Hi Alieh,
>> Just a few more comments on the KIP. It is looking much less risky
>>> now
> the
>> scope
>> is tighter.
>>
>> [AJS1] It would be nice to have default implementations of the
>> handle
>> methods
>> so an implementor would not need to implement both themselves.
>>
>> [AJS2] Producer configurations which are class names usually end in
>> “.class”.
>> I suggest “custom.exception.handler.class”.
>>
>> [AJS3] If I implemented a handler, and I set a non-default value
>> for
 one
>> of the
>> new configuations, what happens? I would expect that the handler
>>> takes
>> precedence. I wasn’t quite clear what “the control will follow the
> handler
>> instructions” meant.
>>
>> [AJS4] Because you 

[jira] [Created] (KAFKA-16760) alterReplicaLogDirs failed even if responded with none error

2024-05-14 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16760:
-

 Summary: alterReplicaLogDirs failed even if responded with none 
error
 Key: KAFKA-16760
 URL: https://issues.apache.org/jira/browse/KAFKA-16760
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


When firing an alterReplicaLogDirs request, the response reports a NONE error. But 
the log dir alteration never actually happens, and these errors appear:
{code:java}
[2024-05-14 16:48:50,796] INFO [ReplicaAlterLogDirsThread-1]: Partition 
topicB-0 has an older epoch (0) than the current leader. Will await the new 
LeaderAndIsr state before resuming fetching. 
(kafka.server.ReplicaAlterLogDirsThread:66)
[2024-05-14 16:48:50,796] WARN [ReplicaAlterLogDirsThread-1]: Partition 
topicB-0 marked as failed (kafka.server.ReplicaAlterLogDirsThread:70)
{code}
Note: this is under KRaft mode. It can be reproduced on this 
[branch|https://github.com/showuon/kafka/tree/alterLogDirTest] by running this 
test:


{code:java}
./gradlew cleanTest storage:test --tests 
org.apache.kafka.tiered.storage.integration.AlterLogDirTest
{code}

The complete logs can be found here: 
https://gist.github.com/showuon/b16cdb05a125a7c445cc6e377a2b7923



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16759) Invalid client telemetry transition on consumer close

2024-05-14 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16759:


 Summary: Invalid client telemetry transition on consumer close
 Key: KAFKA-16759
 URL: https://issues.apache.org/jira/browse/KAFKA-16759
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Andrew Schofield
Assignee: Andrew Schofield
 Fix For: 3.8.0


Using the console consumer with client telemetry enabled, I hit an invalid 
state transition when closing the consumer with CTRL-C. The consumer sends a 
final "terminating" telemetry push, which puts the client telemetry reporter 
into the TERMINATING_PUSH_IN_PROGRESS state. When it receives a response in this 
state, it attempts an invalid state transition.

 
{noformat}
[2024-05-13 19:19:35,804] WARN Error updating client telemetry state, disabled 
telemetry (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter)
java.lang.IllegalStateException: Invalid telemetry state transition from 
TERMINATING_PUSH_IN_PROGRESS to PUSH_NEEDED; the valid telemetry state 
transitions from TERMINATING_PUSH_IN_PROGRESS are: TERMINATED
at 
org.apache.kafka.common.telemetry.ClientTelemetryState.validateTransition(ClientTelemetryState.java:163)
at 
org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.maybeSetState(ClientTelemetryReporter.java:827)
at 
org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.handleResponse(ClientTelemetryReporter.java:520)
at 
org.apache.kafka.clients.NetworkClient$TelemetrySender.handleResponse(NetworkClient.java:1321)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:948)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:594)
at 
org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:130)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.sendUnsentRequests(ConsumerNetworkThread.java:262)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:275)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:95)
[2024-05-13 19:19:35,805] WARN Unable to transition state after successful push 
telemetry from state TERMINATING_PUSH_IN_PROGRESS 
(org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter){noformat}
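
For illustration, the transition check reduces to something like the
sketch below (the states and successor sets are simplified; only the
TERMINATING_PUSH_IN_PROGRESS to TERMINATED edge is taken verbatim from
the log above):

```java
import java.util.EnumSet;
import java.util.Map;

public class TelemetryTransitionSketch {
    enum State { SUBSCRIPTION_NEEDED, PUSH_NEEDED, TERMINATING_PUSH_IN_PROGRESS, TERMINATED }

    // Each state maps to the set of states it may legally move to.
    static final Map<State, EnumSet<State>> VALID = Map.of(
        State.SUBSCRIPTION_NEEDED, EnumSet.of(State.PUSH_NEEDED, State.TERMINATING_PUSH_IN_PROGRESS),
        State.PUSH_NEEDED, EnumSet.of(State.TERMINATING_PUSH_IN_PROGRESS),
        State.TERMINATING_PUSH_IN_PROGRESS, EnumSet.of(State.TERMINATED),
        State.TERMINATED, EnumSet.noneOf(State.class)
    );

    static boolean isValidTransition(State from, State to) {
        return VALID.getOrDefault(from, EnumSet.noneOf(State.class)).contains(to);
    }

    public static void main(String[] args) {
        // The transition the bug attempts is rejected...
        System.out.println(isValidTransition(State.TERMINATING_PUSH_IN_PROGRESS, State.PUSH_NEEDED)); // false
        // ...while the terminating path is accepted.
        System.out.println(isValidTransition(State.TERMINATING_PUSH_IN_PROGRESS, State.TERMINATED)); // true
    }
}
```

The fix would presumably have the terminating reporter treat a push
response as final rather than re-entering PUSH_NEEDED.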



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16694) Remove rack aware code in assignors temporarily due to performance

2024-05-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16694.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Remove rack aware code in assignors temporarily due to performance
> --
>
> Key: KAFKA-16694
> URL: https://issues.apache.org/jira/browse/KAFKA-16694
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Minor
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)