Re: [VOTE] KIP-1038: Add Custom Error Handler to Producer

2024-05-20 Thread Kirk True
+1 (non-binding)

Thanks Alieh!

> On May 20, 2024, at 6:00 AM, Walker Carlson  
> wrote:
> 
> Hey Alieh,
> 
> Thanks for the KIP.
> 
> +1 binding
> 
> Walker
> 
> On Tue, May 7, 2024 at 10:57 AM Alieh Saeedi 
> wrote:
> 
>> Hi all,
>> 
>> It seems that we have no more comments, discussions, or feedback on
>> KIP-1038; therefore, I’d like to open voting for the KIP: Add Custom Error
>> Handler to Producer
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1038%3A+Add+Custom+Error+Handler+to+Producer
>>> 
>> 
>> 
>> Cheers,
>> Alieh
>> 



[jira] [Created] (KAFKA-16787) Remove TRACE level logging from AsyncKafkaConsumer hot path

2024-05-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16787:
-

 Summary: Remove TRACE level logging from AsyncKafkaConsumer hot 
path
 Key: KAFKA-16787
 URL: https://issues.apache.org/jira/browse/KAFKA-16787
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, logging
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1036: Extend RecordDeserializationException exception

2024-05-13 Thread Kirk True
+1 (non-binding)

Thanks Fred!

> On May 13, 2024, at 5:46 AM, Bill Bejeck  wrote:
> 
> Thanks for the KIP!
> 
> +1 (binding)
> 
> -Bill
> 
> 
> On Tue, May 7, 2024 at 6:16 PM Sophie Blee-Goldman 
> wrote:
> 
>> +1 (binding)
>> 
>> thanks for the KIP!
>> 
>> On Fri, May 3, 2024 at 9:13 AM Matthias J. Sax  wrote:
>> 
>>> +1 (binding)
>>> 
>>> On 5/3/24 8:52 AM, Federico Valeri wrote:
 Hi Fred, this is a useful addition.
 
 +1 non binding
 
 Thanks
 
 On Fri, May 3, 2024 at 4:11 PM Andrew Schofield
  wrote:
> 
> Hi Fred,
> Thanks for the KIP. It’s turned out nice and elegant I think.
>>> Definitely a worthwhile improvement.
> 
> +1 (non-binding)
> 
> Thanks,
> Andrew
> 
>> On 30 Apr 2024, at 14:02, Frédérik Rouleau
>>>  wrote:
>> 
>> Hi all,
>> 
>> As there is no more activity for a while on the discuss thread, I
>>> think we
>> can start a vote.
>> The KIP is available on
>> 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1036%3A+Extend+RecordDeserializationException+exception
>> 
>> 
>> If you have some feedback or suggestions, please participate to the
>> discussion thread:
>> https://lists.apache.org/thread/or85okygtfywvnsfd37kwykkq5jq7fy5
>> 
>> Best regards,
>> Fred
> 
>>> 
>> 



[jira] [Created] (KAFKA-16642) Update KafkaConsumerTest to show parameters in test lists

2024-04-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16642:
-

 Summary: Update KafkaConsumerTest to show parameters in test lists
 Key: KAFKA-16642
 URL: https://issues.apache.org/jira/browse/KAFKA-16642
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


{{KafkaConsumerTest}} was recently updated to make many of its tests 
parameterized to exercise both the {{CLASSIC}} and {{CONSUMER}} group 
protocols. However, in some of the tools in which [lists of tests are 
provided|https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=org.apache.kafka.clients.consumer.KafkaConsumerTest=FLAKY],
 say, for analysis, the group protocol information is not exposed. For example, 
one test ({{{}testReturnRecordsDuringRebalance{}}}) is flaky, but it's 
difficult to know at a glance which group protocol is causing the problem 
because the list simply shows:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol)[1]}}
{quote}
Ideally, it would expose more information, such as:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol=CONSUMER)}}
{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16460) New consumer times out consuming records in consumer_test.py system test

2024-04-29 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16460.
---
Resolution: Duplicate

> New consumer times out consuming records in consumer_test.py system test
> 
>
> Key: KAFKA-16460
> URL: https://issues.apache.org/jira/browse/KAFKA-16460
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{consumer_test.py}} system test fails with the following errors:
> {quote}
> * Timed out waiting for consumption
> {quote}
> Affected tests:
> * {{test_broker_failure}}
> * {{test_consumer_bounce}}
> * {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16609) Update parse_describe_topic to support new topic describe output

2024-04-26 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16609.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Update parse_describe_topic to support new topic describe output
> 
>
> Key: KAFKA-16609
> URL: https://issues.apache.org/jira/browse/KAFKA-16609
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, system tests
>Affects Versions: 3.8.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: system-test-failure
> Fix For: 3.8.0
>
>
> It appears that recent changes to the describe topic output has broken the 
> system test's ability to parse the output.
> {noformat}
> test_id:
> kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   50.333 seconds
> IndexError('list index out of range')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 175, in test_reassign_partitions
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.reassign_partitions(bounce_brokers))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 105, in run_produce_consume_validate
> core_test_action(*args)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 175, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.reassign_partitions(bounce_brokers))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 82, in reassign_partitions
> partition_info = 
> self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 1400, in parse_describe_topic
> fields = list(map(lambda x: x.split(" ")[1], fields))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 1400, in 
> fields = list(map(lambda x: x.split(" ")[1], fields))
> IndexError: list index out of range
> {noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-04-25 Thread Kirk True
Hi Alieh,

Thanks for the updates!

Comments inline...


> On Apr 25, 2024, at 1:10 PM, Alieh Saeedi  
> wrote:
> 
> Hi all,
> 
> Thanks a lot for the constructive feedbacks!
> 
> 
> 
> Addressing some of the main concerns:
> 
> 
> - The `RecordTooLargeException` can be thrown by broker, producer and
> consumer. Of course, the `ProducerExceptionHandler` interface is introduced
> to affect only the exceptions thrown from the producer. This KIP very
> specifically means to provide a possibility to manage the
> `RecordTooLargeException` thrown from the Producer.send() method. Please
> see “Proposed Changes” section for more clarity. I investigated the issue
> there thoroughly. I hope it can explain the concern about how we handle the
> errors as well.
> 
> 
> 
> - The problem with Callback: Methods of Callback are called when the record
> sent to the server is acknowledged, while this is not the desired time for
> all exceptions. We intend to handle exceptions beforehand.

I guess it makes sense to keep the expectation for when Callback is invoked 
as-is vs. shoehorning more into it.

> - What if the custom handler returns RETRY for `RecordTooLargeException`? I
> assume changing the producer configuration at runtime is possible. If so,
> RETRY for a too large record is valid because maybe in the next try, the
> too large record is not poisoning any more. I am not 100% sure about the
> technical details, though. Otherwise, we can consider the RETRY as FAIL for
> this exception. Another solution would be to consider a constant number of
> times for RETRY which can be useful for other exceptions as well.

It’s not presently possible to change the configuration of an existing Producer 
at runtime. So if a record hits a RecordTooLargeException once, no amount of 
retrying (with the current Producer) will change that fact. So I’m still a 
little stuck on how to handle a response of RETRY for an “oversized” record. 

> - What if the handle() method itself throws an exception? I think
> rationally and pragmatically, the behaviour must be exactly like when no
> custom handler is defined since the user actually did not have a working
> handler.

I’m not convinced that ignoring an errant handler is the right choice. It then 
becomes a silent failure that might have repercussions, depending on the 
business logic. A user would have to proactively trawls through the logs for 
WARN/ERROR messages to catch it.

Throwing a hard error is pretty draconian, though…

> - Why not use config parameters instead of an interface? As explained in
> the “Rejected Alternatives” section, we assume that the handler will be
> used for a greater number of exceptions in the future. Defining a
> configuration parameter for each exception may make the configuration a bit
> messy. Moreover, the handler offers more flexibility.

Agreed that the logic-via-configuration approach is weird and limiting. Forget 
I ever suggested it ;)

I’d think additional background in the Motivation section would help me 
understand how users might use this feature beyond a) skipping “oversized” 
records, and b) not retrying missing topics. 

> Small change:
> 
> -ProductionExceptionHandlerResponse -> Response for brevity and simplicity.
> Could’ve been HandlerResponse too I think!

The name change sounds good to me.

Thanks Alieh!

> 
> 
> I thank you all again for your useful questions/suggestions.
> 
> I would be happy to hear more of your concerns, as stated in some feedback.
> 
> Cheers,
> Alieh
> 
> On Wed, Apr 24, 2024 at 12:31 AM Justine Olshan
>  wrote:
> 
>> Thanks Alieh for the updates.
>> 
>> I'm a little concerned about the design pattern here. It seems like we want
>> specific usages, but we are packaging it as a generic handler.
>> I think we tried to narrow down on the specific errors we want to handle,
>> but it feels a little clunky as we have a generic thing for two specific
>> errors.
>> 
>> I'm wondering if we are using the right patterns to solve these problems. I
>> agree though that we will need something more than the error classes I'm
>> proposing if we want to have different handling be configurable.
>> My concern is that the open-endedness of a handler means that we are
>> creating more problems than we are solving. It is still unclear to me how
>> we expect to handle the errors. Perhaps we could include an example? It
>> seems like there is a specific use case in mind and maybe we can make a
>> design that is tighter and supports that case.
>> 
>> Justine
>> 
>> On Tue, Apr 23, 2024 at 3:06 PM Kirk True  wrote:
>> 
>>> Hi Alieh,
>>> 
>>> Thanks for the KIP!
>>> 
>>> A few questions:
>>>

[jira] [Created] (KAFKA-16623) KafkaAsyncConsumer system tests warn about revoking partitions that weren't previously assigned

2024-04-25 Thread Kirk True (Jira)
Kirk True created KAFKA-16623:
-

 Summary: KafkaAsyncConsumer system tests warn about revoking 
partitions that weren't previously assigned
 Key: KAFKA-16623
 URL: https://issues.apache.org/jira/browse/KAFKA-16623
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


When running system tests for the KafkaAsyncConsumer, we occasionally see this 
warning:

{noformat}
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 38, in _protected_worker
self._worker(idx, node)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py",
 line 304, in _worker
handler.handle_partitions_revoked(event, node, self.logger)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py",
 line 163, in handle_partitions_revoked
(tp, node.account.hostname)
AssertionError: Topic partition TopicPartition(topic='test_topic', partition=0) 
cannot be revoked from worker20 as it was not previously assigned to that 
consumer
{noformat}

It is unclear what is causing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16609) Update parse_describe_topic to support new topic describe output

2024-04-23 Thread Kirk True (Jira)
Kirk True created KAFKA-16609:
-

 Summary: Update parse_describe_topic to support new topic describe 
output
 Key: KAFKA-16609
 URL: https://issues.apache.org/jira/browse/KAFKA-16609
 Project: Kafka
  Issue Type: Bug
  Components: admin, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


It appears that recent changes to the describe topic output has broken the 
system test's ability to parse the output.

{noformat}
test_id:
kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   50.333 seconds


IndexError('list index out of range')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 175, in test_reassign_partitions
self.run_produce_consume_validate(core_test_action=lambda: 
self.reassign_partitions(bounce_brokers))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 105, in run_produce_consume_validate
core_test_action(*args)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 175, in 
self.run_produce_consume_validate(core_test_action=lambda: 
self.reassign_partitions(bounce_brokers))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 82, in reassign_partitions
partition_info = 
self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 1400, in parse_describe_topic
fields = list(map(lambda x: x.split(" ")[1], fields))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 1400, in 
fields = list(map(lambda x: x.split(" ")[1], fields))
IndexError: list index out of range
{noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-04-23 Thread Kirk True
Hi Alieh,

Thanks for the KIP!

A few questions:

K1. What is the expected behavior for the producer if it generates a 
RecordTooLargeException, but the handler returns RETRY?
K2. How do we determine which Record was responsible for the 
UnknownTopicOrPartitionException since we get that response when sending  a 
batch of records?
K3. What is the expected behavior if the handle() method itself throws an error?
K4. What is the downside of adding an onError() method to the Producer’s 
Callback interface vs. a new mechanism?
K5. Can we change “ProducerExceptionHandlerResponse" to just “Response” given 
that it’s an inner enum?
K6. Any recommendation for callback authors to handle different behavior for 
different topics?

I’ll echo what others have said, it would help me understand why we want 
another handler class if there were more examples in the Motivation section. As 
it stands now, I agree with Chris that the stated issues could be solved by 
adding two new configuration options:

oversized.record.behavior=fail
retry.on.unknown.topic.or.partition=true

What I’m not yet able to wrap my head around is: what exactly would the logic 
in the handler be? I’m not very imaginative, so I’m assuming they’d mostly be 
if-this-then-that. However, if they’re more complicated, I’d have other 
concerns.

Thanks,
Kirk

> On Apr 22, 2024, at 7:38 AM, Alieh Saeedi  
> wrote:
> 
> Thank you all for the feedback!
> 
> Addressing the main concern: The KIP is about giving the user the ability
> to handle producer exceptions, but to be more conservative and avoid future
> issues, we decided to be limited to a short list of exceptions. I included
> *RecordTooLargeExceptin* and *UnknownTopicOrPartitionException. *Open to
> suggestion for adding some more ;-)
> 
> KIP Updates:
> - clarified the way that the user should configure the Producer to use the
> custom handler. I think adding a producer config property is the cleanest
> one.
> - changed the *ClientExceptionHandler* to *ProducerExceptionHandler* to be
> closer to what we are changing.
> - added the ProducerRecord as the input parameter of the handle() method as
> well.
> - increased the response types to 3 to have fail and two types of continue.
> - The default behaviour is having no custom handler, having the
> corresponding config parameter set to null. Therefore, the KIP provides no
> default implementation of the interface.
> - We follow the interface solution as described in the
> Rejected Alternetives section.
> 
> 
> Cheers,
> Alieh
> 
> 
> On Thu, Apr 18, 2024 at 8:11 PM Matthias J. Sax  wrote:
> 
>> Thanks for the KIP Alieh! It addresses an important case for error
>> handling.
>> 
>> I agree that using this handler would be an expert API, as mentioned by
>> a few people. But I don't think it would be a reason to not add it. It's
>> always a tricky tradeoff what to expose to users and to avoid foot guns,
>> but we added similar handlers to Kafka Streams, and have good experience
>> with it. Hence, I understand, but don't share the concern raised.
>> 
>> I also agree that there is some responsibility by the user to understand
>> how such a handler should be implemented to not drop data by accident.
>> But it seem unavoidable and acceptable.
>> 
>> While I understand that a "simpler / reduced" API (eg via configs) might
>> also work, I personally prefer a full handler. Configs have the same
>> issue that they could be miss-used potentially leading to incorrectly
>> dropped data, but at the same time are less flexible (and thus maybe
>> ever harder to use correctly...?). Base on my experience, there is also
>> often weird corner case for which it make sense to also drop records for
>> other exceptions, and a full handler has the advantage of full
>> flexibility and "absolute power!".
>> 
>> To be fair: I don't know the exact code paths of the producer in
>> details, so please keep me honest. But my understanding is, that the KIP
>> aims to allow users to react to internal exception, and decide to keep
>> retrying internally, swallow the error and drop the record, or raise the
>> error?
>> 
>> Maybe the KIP would need to be a little bit more precises what error we
>> want to cover -- I don't think this list must be exhaustive, as we can
>> always do follow up KIP to also apply the handler to other errors to
>> expand the scope of the handler. The KIP does mention examples, but it
>> might be good to explicitly state for what cases the handler gets applied?
>> 
>> I am also not sure if CONTINUE and FAIL are enough options? Don't we
>> need three options? Or would `CONTINUE` have different meaning depending
>> on the type of error? Ie, for a retryable error `CONTINUE` would mean
>> keep retrying internally, but for a non-retryable error `CONTINUE` means
>> swallow the error and drop the record? This semantic overload seems
>> tricky to reason about by users, so it might better to split `CONTINUE`
>> into two cases -> `RETRY` and `SWALLOW` (or some better 

Re: [DISCUSS] KIP-1036: Extend RecordDeserializationException exception

2024-04-23 Thread Kirk True
Hi Fred,

Thanks for the updates!

Questions:

K11. Can we reconsider the introduction of two new exception subclasses? 
Perhaps I don’t understand the benefit? Technically both the key and the value 
could have deserialization errors, right?

K12. Is there a benefit to exposing both the ByteBuffer and byte[]?

K13. Isn’t it possible for the key() and/or value() calls to throw NPE? (Or 
maybe I don’t understand why we need the old constructor :(

Thanks,
Kirk

> On Apr 23, 2024, at 12:45 AM, Frédérik Rouleau 
>  wrote:
> 
> Hi Andrew,
> 
> A1. I will change the order of arguments to match.
> A2 and A3, Yes the KIP is not updated yet as I do not have a wiki account.
> So I must rely on some help to do those changes, which add some delay. I
> will try to find someone available ASAP.
> A4. I had the same thought. Using keyBuffer and valueBuffer for the
> constructor seems fine for me. If no one disagrees, I will do that change.
> 
> Thanks,
> Fred



[jira] [Resolved] (KAFKA-16462) New consumer fails with timeout in security_test.py system test

2024-04-23 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16462.
---
Resolution: Duplicate

> New consumer fails with timeout in security_test.py system test
> ---
>
> Key: KAFKA-16462
> URL: https://issues.apache.org/jira/browse/KAFKA-16462
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{security_test.py}} system test fails with the following error:
> {noformat}
> test_id:
> kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   1 minute 30.885 seconds
> TimeoutError('')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py",
>  line 142, in test_client_ssl_endpoint_validation_failure
> wait_until(lambda: self.producer_consumer_have_expected_error(error), 
> timeout_sec=30)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
>  line 58, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
> last_exception
> ducktape.errors.TimeoutError
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test

2024-04-23 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16464.
---
Resolution: Duplicate

> New consumer fails with timeout in replication_replica_failure_test.py system 
> test
> --
>
> Key: KAFKA-16464
> URL: https://issues.apache.org/jira/browse/KAFKA-16464
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{replication_replica_failure_test.py}} system test fails with the 
> following error:
> {noformat}
> test_id:
> kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   1 minute 20.972 seconds
> TimeoutError('Timed out after 30s while awaiting initial record delivery 
> of 5 records')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py",
>  line 97, in test_replication_with_replica_failure
> self.await_startup()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py",
>  line 125, in await_startup
> (timeout_sec, min_records))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
>  line 58, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
> last_exception
> ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial 
> record delivery of 5 records
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16579) Revert changes to consumer_rolling_upgrade_test.py for the new async Consumer

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16579:
-

 Summary: Revert changes to consumer_rolling_upgrade_test.py for 
the new async Consumer
 Key: KAFKA-16579
 URL: https://issues.apache.org/jira/browse/KAFKA-16579
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a 
slew of system tests to run both the "old" and "new" implementations. 
KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
could test the new consumer with Connect. However, we are not supporting 
Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect 
system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the 
following:
{code:java}
test_id:
kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   6 minutes 3.899 seconds


InsufficientResourcesError('Not enough nodes available to allocate. linux 
nodes requested: 1. linux nodes available: 0')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
 line 919, in test_exactly_once_source
consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
self.source.topic, consumer_timeout_ms=1000, print_key=True)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
 line 97, in __init__
BackgroundThreadService.__init__(self, context, num_nodes)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 26, in __init__
super(BackgroundThreadService, self).__init__(context, num_nodes, 
cluster_spec, *args, **kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 107, in __init__
self.allocate_nodes()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 217, in allocate_nodes
self.nodes = self.cluster.alloc(self.cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
 line 54, in alloc
allocated = self.do_alloc(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
 line 31, in do_alloc
allocated = self._available_nodes.remove_spec(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
 line 117, in remove_spec
raise InsufficientResourcesError("Not enough nodes available to allocate. " 
+ msg)
ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
available to allocate. linux nodes requested: 1. linux nodes available: 0
{code}
The task here is to revert the changes made in KAFKA-16272 [PR 
15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16578:
-

 Summary: Revert changes to connect_distributed_test.py for the new 
async Consumer
 Key: KAFKA-16578
 URL: https://issues.apache.org/jira/browse/KAFKA-16578
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a 
slew of system tests to run both the "old" and "new" implementations. 
KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
could test the new consumer with Connect. However, we are not supporting 
Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect 
system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the 
following:
{code:java}
test_id:
kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   6 minutes 3.899 seconds


InsufficientResourcesError('Not enough nodes available to allocate. linux 
nodes requested: 1. linux nodes available: 0')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
 line 919, in test_exactly_once_source
consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
self.source.topic, consumer_timeout_ms=1000, print_key=True)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
 line 97, in __init__
BackgroundThreadService.__init__(self, context, num_nodes)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 26, in __init__
super(BackgroundThreadService, self).__init__(context, num_nodes, 
cluster_spec, *args, **kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 107, in __init__
self.allocate_nodes()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 217, in allocate_nodes
self.nodes = self.cluster.alloc(self.cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
 line 54, in alloc
allocated = self.do_alloc(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
 line 31, in do_alloc
allocated = self._available_nodes.remove_spec(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
 line 117, in remove_spec
raise InsufficientResourcesError("Not enough nodes available to allocate. " 
+ msg)
ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
available to allocate. linux nodes requested: 1. linux nodes available: 0
{code}
The task here is to revert the changes made in KAFKA-16272 [PR 
15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16577) New consumer fails with stop timeout in consumer_test.py’s test_consumer_bounce system test

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16577:
-

 Summary: New consumer fails with stop timeout in 
consumer_test.py’s test_consumer_bounce system test
 Key: KAFKA-16577
 URL: https://issues.apache.org/jira/browse/KAFKA-16577
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test intermittently fails with the following 
error:

{code}
test_id:
kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_failure.clean_shutdown=True.enable_autocommit=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   42.582 seconds


AssertionError()
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/consumer_test.py",
 line 399, in test_consumer_failure
assert partition_owner is not None
AssertionError
Notify
{code}

Affected tests:
 * {{test_consumer_failure}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16576) New consumer fails with assert in consumer_test.py’s test_consumer_failure system test

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16576:
-

 Summary: New consumer fails with assert in consumer_test.py’s 
test_consumer_failure system test
 Key: KAFKA-16576
 URL: https://issues.apache.org/jira/browse/KAFKA-16576
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with the following errors:

{quote}
* Timed out waiting for consumption
{quote}

Affected tests:

* {{test_broker_failure}}
* {{test_consumer_bounce}}
* {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16405) Mismatch assignment error when running consumer rolling upgrade system tests

2024-04-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16405.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Mismatch assignment error when running consumer rolling upgrade system tests
> 
>
> Key: KAFKA-16405
> URL: https://issues.apache.org/jira/browse/KAFKA-16405
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> relevant to [https://github.com/apache/kafka/pull/15578]
>  
> We are seeing:
> {code:java}
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.11.4
> session_id:   2024-03-21--001
> run time: 3 minutes 24.632 seconds
> tests run:7
> passed:   5
> flaky:0
> failed:   2
> ignored:  0
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=classic
> status: PASS
> run time:   24.599 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   26.638 seconds
> AssertionError("Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(topic='test_topic', partition=3), 
> TopicPartition(topic='test_topic', partition=0), 
> TopicPartition(topic='test_topic', partition=1), 
> TopicPartition(topic='test_topic', partition=2)})}")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
> data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
> return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py",
>  line 77, in rolling_update_test
> self._verify_range_assignment(consumer)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py",
>  line 38, in _verify_range_assignment
> assert assignment == set([
> AssertionError: Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(topic='test_topic', partition=3), 
> TopicPartition(topic='test_topic', partition=0), 
> TopicPartition(topic='test_topic', partition=1), 
> TopicPartition(topic='test_topic', partition=2)})}
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=False
> status: PASS
> run time:   29.815 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True
> status: PASS
> run time:   29.766 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=classic
> status: PASS
> run time:   30.086 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   35.965 seconds
> AssertionError("Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(to

[jira] [Created] (KAFKA-16565) IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned

2024-04-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16565:
-

 Summary: IncrementalAssignmentConsumerEventHandler throws error 
when attempting to remove a partition that isn't assigned
 Key: KAFKA-16565
 URL: https://issues.apache.org/jira/browse/KAFKA-16565
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
 Fix For: 3.8.0


In {{{}verifiable_consumer.py{}}}, the Incremental

 
{code:java}
def handle_partitions_revoked(self, event):
self.revoked_count += 1
self.state = ConsumerState.Rebalancing
self.position = {}
for topic_partition in event["partitions"]:
topic = topic_partition["topic"]
partition = topic_partition["partition"]
self.assignment.remove(TopicPartition(topic, partition))
 {code}
If the {{self.assignment.remove()}} call is passed a {{TopicPartition}} that 
isn't in the list, an error is thrown. For now, we should first check that the 
{{TopicPartition}} is in the list, and if not, log a warning or something.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16558) Implement HeartbeatRequestState.toStringBase()

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16558:
-

 Summary: Implement HeartbeatRequestState.toStringBase()
 Key: KAFKA-16558
 URL: https://issues.apache.org/jira/browse/KAFKA-16558
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The code incorrectly overrides the {{toString()}} method instead of overriding 
{{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer 
issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16557) Fix CommitRequestManager’s OffsetFetchRequestState.toString()

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16557:
-

 Summary: Fix CommitRequestManager’s 
OffsetFetchRequestState.toString()
 Key: KAFKA-16557
 URL: https://issues.apache.org/jira/browse/KAFKA-16557
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The code incorrectly overrides the {{toString()}} method instead of overriding 
{{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer 
issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16556) Race condition between ConsumerRebalanceListener and SubscriptionState

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16556:
-

 Summary: Race condition between ConsumerRebalanceListener and 
SubscriptionState
 Key: KAFKA-16556
 URL: https://issues.apache.org/jira/browse/KAFKA-16556
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


There appears to be a race condition between invoking the 
{{ConsumerRebalanceListener}} callbacks on reconciliation and 
{{initWithCommittedOffsetsIfNeeded}} in the consumer.
 
The membership manager adds the newly assigned partitions to the 
{{{}SubscriptionState{}}}, but marks them as {{{}pendingOnAssignedCallback{}}}. 
Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, 
the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set 
all of those partitions' 'pending' flag to false.
 
During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to 
call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already 
cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls the 
subscription's {{initializingPartitions}} method to get a set of the partitions 
for which to fetch their committed offsets. However, 
{{SubscriptionState.initializingPartitions()}} only returns partitions that 
have the {{pendingOnAssignedCallback}} flag set to to false.
 
The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future  is 
completed on the background thread first, the 'pending' flag is set to false. 
On the application thread, when {{SubscriptionState.initializingPartitions()}} 
is called, it returns the partition, and we fetch its committed offsets
 * If instead the application thread calls 
{{SubscriptionState.initializingPartitions()}} first, the partitions's 
'pending' flag is still set to false, and so the partition is omitted from the 
returned set. The {{updateFetchPositions()}} method then continues on and 
re-initializes the partition's fetch offset to 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16555) Consumer's RequestState has incorrect logic to determine if inflight

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16555:
-

 Summary: Consumer's RequestState has incorrect logic to determine 
if inflight
 Key: KAFKA-16555
 URL: https://issues.apache.org/jira/browse/KAFKA-16555
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


When running system tests for the new consumer, I've hit an issue where the 
{{HeartbeatRequestManager}} is sending out multiple concurrent 
{{CONSUMER_GROUP_REQUEST}} RPCs. The effect is the coordinator creates multiple 
members which causes downstream assignment problems.

Here's the order of events:

* Time 202: {{HearbeatRequestManager.poll()}} determines it's OK to send a 
request. In so doing, it updates the {{RequestState}}'s {{lastSentMs}} to the 
current timestamp, 202
* Time 236: the response is received and response handler is invoked, setting 
the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236
* Time 236: {{HearbeatRequestManager.poll()}} is invoked again, and it sees 
that it's OK to send a request. It creates another request, once again updating 
the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236
* Time 237:  {{HearbeatRequestManager.poll()}} is invoked again, and 
ERRONEOUSLY decides it's OK to send another request, despite one already in 
flight.

Here's the problem with {{requestInFlight()}}:

{code:java}
public boolean requestInFlight() {
return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs;
}
{code}

On our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236. So the 
received timestamp is _equal_ to the sent timestamp, not _less_.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1036: Extend RecordDeserializationException exception

2024-04-10 Thread Kirk True
Hi Fred,

Thanks for the KIP!

Questions/comments:

How do we handle the case where CompletedFetch.parseRecord isn’t able to 
construct a well-formed ConsumerRecord (i.e. the values it needs are 
missing/corrupted/etc.)?
Please change RecordDeserializationException’s getConsumerRecord() method to be 
named consumerRecord() to be consistent.   
Should we change the return type of consumerRecord() to be 
Optional> in the cases where even a “raw” 
ConsumerRecord can’t be created?
To avoid the above, does it make sense to include a Record object instead of a 
ConsumerRecord? The former doesn’t include the leaderEpoch or TimestampType, 
but maybe that’s OK?

Thanks,
Kirk

> On Apr 10, 2024, at 8:47 AM, Frédérik Rouleau  
> wrote:
> 
> Hi everyone,
> 
> To make implementation of DLQ in consumer easier, I would like to add the
> raw ConsumerRecord into the RecordDeserializationException.
> 
> Details are in KIP-1036
> 
> .
> 
> Thanks for your feedback.
> 
> Regards,
> Fred



Re: [VOTE] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-04-07 Thread Kirk True
+1 (non-binding)

Apologies. I thought I’d already voted :(

> On Apr 7, 2024, at 10:48 AM, Nelson B.  wrote:
> 
> Hi all,
> 
> Just wanted to bump up this thread for visibility.
> 
> Thanks!
> 
> On Thu, Mar 28, 2024 at 3:40 AM Doğuşcan Namal 
> wrote:
> 
>> Thanks for checking it out Nelson. Yeah I think it makes sense to leave it
>> for the users who want to use it for testing.
>> 
>> On Mon, 25 Mar 2024 at 20:44, Nelson B.  wrote:
>> 
>>> Hi Doğuşcan,
>>> 
>>> Thanks for your vote!
>>> 
>>> Currently, the usage of TLS depends on the protocol used by the
>>> authorization server which is configured
>>> through the "sasl.oauthbearer.token.endpoint.url" option. So, if the
>>> URL address uses simple http (not https)
>>> then secrets will be transmitted in plaintext. I think it's possible to
>>> enforce using only https but I think any
>>> production-grade authorization server uses https anyway and maybe users
>> may
>>> want to test using http in the dev environment.
>>> 
>>> Thanks,
>>> 
>>> On Thu, Mar 21, 2024 at 3:56 PM Doğuşcan Namal >> 
>>> wrote:
>>> 
 Hi Nelson, thanks for the KIP.
 
 From the RFC:
 ```
 The authorization server MUST require the use of TLS as described in
   Section 1.6 when sending requests using password authentication.
 ```
 
 I believe we already have an enforcement for OAuth to be enabled only
>> in
 SSLChannel but would be good to double check. Sending secrets over
 plaintext is a security bad practice :)
 
 +1 (non-binding) from me.
 
 On Tue, 19 Mar 2024 at 16:00, Nelson B. 
>> wrote:
 
> Hi all,
> 
> I would like to start a vote on KIP-1025
> <
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1025%3A+Optionally+URL-encode+clientID+and+clientSecret+in+authorization+header
>> ,
> which would optionally URL-encode clientID and clientSecret in the
> authorization header.
> 
> I feel like all possible issues have been addressed in the discussion
> thread.
> 
> Thanks,
> 
 
>>> 
>> 



[jira] [Created] (KAFKA-16465) New consumer does not invoke rebalance callbacks as expected in consumer_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16465:
-

 Summary: New consumer does not invoke rebalance callbacks as 
expected in consumer_test.py system test
 Key: KAFKA-16465
 URL: https://issues.apache.org/jira/browse/KAFKA-16465
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{replication_replica_failure_test.py}} system test fails with the 
following error:

{noformat}
test_id:
kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   1 minute 20.972 seconds


TimeoutError('Timed out after 30s while awaiting initial record delivery of 
5 records')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py",
 line 97, in test_replication_with_replica_failure
self.await_startup()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py",
 line 125, in await_startup
(timeout_sec, min_records))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
 line 58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial record 
delivery of 5 records
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16464:
-

 Summary: New consumer fails with timeout in 
replication_replica_failure_test.py system test
 Key: KAFKA-16464
 URL: https://issues.apache.org/jira/browse/KAFKA-16464
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{security_test.py}} system test fails with the following error:

{noformat}
test_id:
kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   1 minute 30.885 seconds


TimeoutError('')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py",
 line 142, in test_client_ssl_endpoint_validation_failure
wait_until(lambda: self.producer_consumer_have_expected_error(error), 
timeout_sec=30)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
 line 58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16462) New consumer fails with timeout in security_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16462:
-

 Summary: New consumer fails with timeout in security_test.py 
system test
 Key: KAFKA-16462
 URL: https://issues.apache.org/jira/browse/KAFKA-16462
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{security_test.py}} system test fails with the following error:

{quote}
* Consumer failed to consume up to offsets
{quote}

Affected test:

* {{test_client_ssl_endpoint_validation_failure}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16461) New consumer fails to consume records in security_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16461:
-

 Summary: New consumer fails to consume records in security_test.py 
system test
 Key: KAFKA-16461
 URL: https://issues.apache.org/jira/browse/KAFKA-16461
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with the following errors:

{quote}
* Timed out waiting for consumption
{quote}

Affected tests:

* {{test_broker_failure}}
* {{test_consumer_bounce}}
* {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16460) New consumer times out system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16460:
-

 Summary: New consumer times out system test
 Key: KAFKA-16460
 URL: https://issues.apache.org/jira/browse/KAFKA-16460
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with two different errors related to 
consumers joining the consumer group in a timely fashion.

{quote}
* Consumers failed to join in a reasonable amount of time
* Timed out waiting for consumers to join, expected total X joined, but only 
see Y joined fromnormal consumer group and Z from conflict consumer group{quote}

Affected tests:

 * {{test_fencing_static_consumer}}
 * {{test_static_consumer_bounce}}
 * {{test_static_consumer_persisted_after_rejoin}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16459) consumer_test.py’s static membership tests fail with new consumer

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16459:
-

 Summary: consumer_test.py’s static membership tests fail with new 
consumer
 Key: KAFKA-16459
 URL: https://issues.apache.org/jira/browse/KAFKA-16459
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0


The following error is reported when running the {{test_valid_assignment}} test 
from {{consumer_test.py}}:

 {code}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
186, in _do_run
data = self.run_test()
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
246, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
584, in test_valid_assignment
wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
{code}

To reproduce, create a system test suite file named 
{{test_valid_assignment.yml}} with these contents:

{code:yaml}
failures:
  - 
'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
{code}

Then set the the {{TC_PATHS}} environment variable to include that test suite 
file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16444) Run KIP-848 unit tests under code coverage

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16444:
-

 Summary: Run KIP-848 unit tests under code coverage
 Key: KAFKA-16444
 URL: https://issues.apache.org/jira/browse/KAFKA-16444
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16443) Update streams_static_membership_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16443:
-

 Summary: Update streams_static_membership_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16443
 URL: https://issues.apache.org/jira/browse/KAFKA-16443
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 4.0.0


This task is to update the test method(s) in 
{{streams_standby_replica_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16442) Update streams_standby_replica_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16442:
-

 Summary: Update streams_standby_replica_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16442
 URL: https://issues.apache.org/jira/browse/KAFKA-16442
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 4.0.0


This task is to update the test method(s) in {{security_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16440) Update security_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16440:
-

 Summary: Update security_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16440
 URL: https://issues.apache.org/jira/browse/KAFKA-16440
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in 
{{replication_replica_failure_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16441) Update streams_broker_down_resilience_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16441:
-

 Summary: Update streams_broker_down_resilience_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16441
 URL: https://issues.apache.org/jira/browse/KAFKA-16441
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{security_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16439) Update replication_replica_failure_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16439:
-

 Summary: Update replication_replica_failure_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16439
 URL: https://issues.apache.org/jira/browse/KAFKA-16439
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{replica_scale_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16438) Update consumer_test.py’s static tests to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16438:
-

 Summary: Update consumer_test.py’s static tests to support 
KIP-848’s group protocol config
 Key: KAFKA-16438
 URL: https://issues.apache.org/jira/browse/KAFKA-16438
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True


This task is to update the following test method(s) in {{consumer_test.py}} to 
support the {{group.protocol}} configuration:

* {{test_fencing_static_consumer}}
* {{test_static_consumer_bounce}}
* {{test_static_consumer_persisted_after_rejoin}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16271.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol 
> config
> --
>
> Key: KAFKA-16271
> URL: https://issues.apache.org/jira/browse/KAFKA-16271
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> This task is to update the test method(s) in 
> {{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} 
> configuration introduced in 
> [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
>  by adding an optional {{group_protocol}} argument to the tests and matrixes.
> See KAFKA-16231 as an example of how the test parameters can be changed.
> The tricky wrinkle here is that the existing test relies on client-side 
> assignment strategies that aren't applicable with the new KIP-848-enabled 
> consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Christo Lolov

2024-03-26 Thread Kirk True
Congratulations Christo!

> On Mar 26, 2024, at 7:27 AM, Satish Duggana  wrote:
> 
> Congratulations Christo!
> 
> On Tue, 26 Mar 2024 at 19:20, Ivan Yurchenko  wrote:
>> 
>> Congrats!
>> 
>> On Tue, Mar 26, 2024, at 14:48, Lucas Brutschy wrote:
>>> Congrats!
>>> 
>>> On Tue, Mar 26, 2024 at 2:44 PM Federico Valeri  
>>> wrote:
 
 Congrats!
 
 On Tue, Mar 26, 2024 at 2:27 PM Mickael Maison  
 wrote:
> 
> Congratulations Christo!
> 
> On Tue, Mar 26, 2024 at 2:26 PM Chia-Ping Tsai  wrote:
>> 
>> Congrats Christo!
>> 
>> Chia-Ping
>>> 



[jira] [Resolved] (KAFKA-14246) Update threading model for Consumer

2024-03-25 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-14246.
---
Resolution: Fixed

> Update threading model for Consumer
> ---
>
> Key: KAFKA-14246
> URL: https://issues.apache.org/jira/browse/KAFKA-14246
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Philip Nee
>    Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Hi community,
>  
> We are refactoring the current KafkaConsumer and making it more asynchronous. 
>  This is the master Jira to track the project's progress; subtasks will be 
> linked to this ticket.  Please review the design document and feel free to 
> use this thread for discussion. 
>  
> The design document is here: 
> [https://cwiki.apache.org/confluence/display/KAFKA/Proposal%3A+Consumer+Threading+Model+Refactor]
>  
> The original email thread is here: 
> [https://lists.apache.org/thread/13jvwzkzmb8c6t7drs4oj2kgkjzcn52l]
>  
> I will continue to update the 1pager as reviews and comments come.
>  
> Thanks, 
> P



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16389) consumer_test.py’s test_valid_assignment fails with new consumer

2024-03-19 Thread Kirk True (Jira)
Kirk True created KAFKA-16389:
-

 Summary: consumer_test.py’s test_valid_assignment fails with new 
consumer
 Key: KAFKA-16389
 URL: https://issues.apache.org/jira/browse/KAFKA-16389
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


The following error is reported when running the {{test_valid_assignment}} test 
from {{consumer_test.py}}:

 {code}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
186, in _do_run
data = self.run_test()
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
246, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
584, in test_valid_assignment
wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
{code}

To reproduce, create a system test suite file named 
{{test_valid_assignment.yml}} with these contents:

{code:yaml}
failures:
  - 
'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
{code}

Then run set the the {{TC_PATHS}} environment variable to include that test 
suite file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-03-19 Thread Kirk True
Hi Nelson,

Piggybacking on KIP-1030 seems like the perfect solution.

The configuration name change sounds good, too.

Thanks,
Kirk

> On Mar 18, 2024, at 2:21 PM, Nelson B.  wrote:
> 
> Hi Kirk,
> 
> Thanks for your comments!
> 
> 1. I think we can use KIP-1030
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations>
> as
> an opportunity to update the default value to "true" starting from version
> 4.0.
> 2. I've updated the config name to "sasl.oauthbearer.header.urlencode" in
> the KIP, I'm gonna update PR once KIP is accepted.
> 
> Thanks,
> 
> On Tue, Mar 19, 2024 at 3:45 AM Kirk True  wrote:
> 
>> Hi Nelson,
>> 
>> Thank you for writing up the KIP! My apologies for the delay in response :(
>> 
>> Questions:
>> 
>> 1. Is the long-term plan to keep the configuration default set to “false"?
>> I understand the short-term benefits, but in general, configuration
>> defaults should prefer compliance with standards (e.g. RFCs).
>> 2. Can we change “sasl.oauthbearer.header.urlencode.enable” to be a little
>> shorter? Maybe “sasl.oauthbearer.header.urlencode” or even
>> “sasl.oauthbearer.urlencode”? I’m looking at the configuration names that I
>> introduced in KIP-768 with a bit of cringe at their length :) This is a
>> total nit, so I won’t make a stink about it if everyone else is cool with
>> it :)
>> 
>> Thanks,
>> Kirk
>> 
>>> On Mar 13, 2024, at 5:31 AM, Nelson B.  wrote:
>>> 
>>> Hi all,
>>> 
>>> I just wanted to bump up this thread.
>>> 
>>> The KIP introduces a really small change and PR is already ready and only
>>> waiting for this KIP to get approved to be merged.
>>> 
>>> Thanks,
>>> 
>>> On Wed, Mar 6, 2024 at 12:26 PM Nelson B. 
>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I would like to start a discussion on KIP-1025, which would optionally
>>>> URL-encode clientID and clientSecret in the authorization header
>>>> 
>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1025%3A+Optionally+URL-encode+clientID+and+clientSecret+in+authorization+header
>>>> 
>>>> Best,
>>>> Nelson B.
>>>> 
>> 
>> 



Re: [DISCUSS] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-03-18 Thread Kirk True
Hi Nelson,

Thank you for writing up the KIP! My apologies for the delay in response :(

Questions:

1. Is the long-term plan to keep the configuration default set to “false"? I 
understand the short-term benefits, but in general, configuration defaults 
should prefer compliance with standards (e.g. RFCs).
2. Can we change “sasl.oauthbearer.header.urlencode.enable” to be a little 
shorter? Maybe “sasl.oauthbearer.header.urlencode” or even 
“sasl.oauthbearer.urlencode”? I’m looking at the configuration names that I 
introduced in KIP-768 with a bit of cringe at their length :) This is a total 
nit, so I won’t make a stink about it if everyone else is cool with it :)

Thanks,
Kirk

> On Mar 13, 2024, at 5:31 AM, Nelson B.  wrote:
> 
> Hi all,
> 
> I just wanted to bump up this thread.
> 
> The KIP introduces a really small change and PR is already ready and only
> waiting for this KIP to get approved to be merged.
> 
> Thanks,
> 
> On Wed, Mar 6, 2024 at 12:26 PM Nelson B.  wrote:
> 
>> Hi all,
>> 
>> I would like to start a discussion on KIP-1025, which would optionally
>> URL-encode clientID and clientSecret in the authorization header
>> 
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1025%3A+Optionally+URL-encode+clientID+and+clientSecret+in+authorization+header
>> 
>> Best,
>> Nelson B.
>> 



[jira] [Reopened] (KAFKA-15691) Add new system tests to use new consumer

2024-03-14 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-15691:
---

> Add new system tests to use new consumer
> 
>
> Key: KAFKA-15691
> URL: https://issues.apache.org/jira/browse/KAFKA-15691
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Reporter: Kirk True
>    Assignee: Kirk True
>Priority: Major
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16315) Investigate propagating metadata updates via queues

2024-02-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16315:
-

 Summary: Investigate propagating metadata updates via queues
 Key: KAFKA-16315
 URL: https://issues.apache.org/jira/browse/KAFKA-16315
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 4.0.0


Some of the new {{AsyncKafkaConsumer}} logic enqueues events for the network 
I/O thread then issues a call to update the {{ConsumerMetadata}} via 
{{requestUpdate()}}. If the event ends up stuck in the queue for a long time, 
it is possible that the metadata is not updated at the correct time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Newbie need some help

2024-02-28 Thread Kirk True
Hi Chia,

Welcome!

One suggestion is to look at the list of open Jiras that are marked with the 
“newbie” label:

https://issues.apache.org/jira/issues/?jql=project %3D KAFKA AND labels IN 
(newbie%2C "newbie%2B%2B") AND status in (Open%2C Reopened) ORDER BY created 
DESC 


Thanks,
Kirk

> On Feb 27, 2024, at 5:00 PM, Chia-Chuan Yu  wrote:
> 
> Hi, team
> I’m a new member just joined the community. I work in the KLA as a software
> engineer. We use Kafka as part of our data pipeline. I’m currently looking
> for some issues to work on as my first step of contributing. Any
> recommending task I can look into it? Please let me know.
> 
> JIRA username:chiacyu
> 
> Best,
> Chia Chuan Yu



Re: Shortened URLs for KIPs?

2024-02-28 Thread Kirk True
I just found https://s.apache.org/, which is an Apache shortened URL service. 
That might provide the needed infrastructure, but it requires a login, so 
someone (a committer(?)) to create that for each KIP :(

> On Feb 28, 2024, at 2:40 PM, Kirk True  wrote:
> 
> Hi all,
> 
> Is it possible to set up shortened URLs for KIPs? So instead of, say:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
> 
> We could refer to it as:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966
> 
> Or maybe even go so far as to have something like:
> 
> https://kafka.apache.org/kips/KIP-966
> 
> I know the wiki has a way to generate a short URL (e.g. 
> https://cwiki.apache.org/confluence/x/mpOzDw), but, IMO, it’s so opaque as to 
> be nearly worthless. 
> 
> Pros:
> 
> 1. Succinct: great for written documentation
> 2. Discoverability: it’s predictable and easy to find
> 3. Robust: the URL doesn’t break when the KIP title changes
> 
> Cons:
> 
> 1. Time
> 2. Money
> 3. Perpetual maintenance: requires 100% commitment indefinitely
> 
> I know the list of cons is probably much more than I realize. At this point 
> I’m just wondering if it’s even a desired mechanism.
> 
> Thoughts?
> 
> Thanks,
> Kirk



Shortened URLs for KIPs?

2024-02-28 Thread Kirk True
Hi all,

Is it possible to set up shortened URLs for KIPs? So instead of, say:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas

We could refer to it as:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-966

Or maybe even go so far as to have something like:

https://kafka.apache.org/kips/KIP-966

I know the wiki has a way to generate a short URL (e.g. 
https://cwiki.apache.org/confluence/x/mpOzDw), but, IMO, it’s so opaque as to 
be nearly worthless. 

Pros:

1. Succinct: great for written documentation
2. Discoverability: it’s predictable and easy to find
3. Robust: the URL doesn’t break when the KIP title changes

Cons:

1. Time
2. Money
3. Perpetual maintenance: requires 100% commitment indefinitely

I know the list of cons is probably much more than I realize. At this point I’m 
just wondering if it’s even a desired mechanism.

Thoughts?

Thanks,
Kirk

Re: [ANNOUNCE] Apache Kafka 3.7.0

2024-02-28 Thread Kirk True
Thanks Stanislav

> On Feb 27, 2024, at 10:01 AM, Stanislav Kozlovski 
>  wrote:
> 
> The Apache Kafka community is pleased to announce the release of
> Apache Kafka 3.7.0
> 
> This is a minor release that includes new features, fixes, and
> improvements from 296 JIRAs
> 
> An overview of the release and its notable changes can be found in the
> release blog post:
> https://kafka.apache.org/blog#apache_kafka_370_release_announcement
> 
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/3.7.0/RELEASE_NOTES.html
> 
> You can download the source and binary release (Scala 2.12, 2.13) from:
> https://kafka.apache.org/downloads#3.7.0
> 
> ---
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
> 
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
> 
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
> 
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
> 
> ** Building real-time streaming applications that transform or react
> to the streams of data.
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> A big thank you to the following 146 contributors to this release!
> (Please report an unintended omission)
> 
> Abhijeet Kumar, Akhilesh Chaganti, Alieh, Alieh Saeedi, Almog Gavra,
> Alok Thatikunta, Alyssa Huang, Aman Singh, Andras Katona, Andrew
> Schofield, Anna Sophie Blee-Goldman, Anton Agestam, Apoorv Mittal,
> Arnout Engelen, Arpit Goyal, Artem Livshits, Ashwin Pankaj,
> ashwinpankaj, atu-sharm, bachmanity1, Bob Barrett, Bruno Cadonna,
> Calvin Liu, Cerchie, chern, Chris Egerton, Christo Lolov, Colin
> Patrick McCabe, Colt McNealy, Crispin Bernier, David Arthur, David
> Jacot, David Mao, Deqi Hu, Dimitar Dimitrov, Divij Vaidya, Dongnuo
> Lyu, Eaugene Thomas, Eduwer Camacaro, Eike Thaden, Federico Valeri,
> Florin Akermann, Gantigmaa Selenge, Gaurav Narula, gongzhongqiang,
> Greg Harris, Guozhang Wang, Gyeongwon, Do, Hailey Ni, Hanyu Zheng, Hao
> Li, Hector Geraldino, hudeqi, Ian McDonald, Iblis Lin, Igor Soarez,
> iit2009060, Ismael Juma, Jakub Scholz, James Cheng, Jason Gustafson,
> Jay Wang, Jeff Kim, Jim Galasyn, John Roesler, Jorge Esteban Quilcate
> Otoya, Josep Prat, José Armando García Sancio, Jotaniya Jeel, Jouni
> Tenhunen, Jun Rao, Justine Olshan, Kamal Chandraprakash, Kirk True,
> kpatelatwork, kumarpritam863, Laglangyue, Levani Kokhreidze, Lianet
> Magrans, Liu Zeyu, Lucas Brutschy, Lucia Cerchie, Luke Chen, maniekes,
> Manikumar Reddy, mannoopj, Maros Orsak, Matthew de Detrich, Matthias
> J. Sax, Max Riedel, Mayank Shekhar Narula, Mehari Beyene, Michael
> Westerby, Mickael Maison, Nick Telford, Nikhil Ramakrishnan, Nikolay,
> Okada Haruki, olalamichelle, Omnia G.H Ibrahim, Owen Leung, Paolo
> Patierno, Philip Nee, Phuc-Hong-Tran, Proven Provenzano, Purshotam
> Chauhan, Qichao Chu, Matthias J. Sax, Rajini Sivaram, Renaldo Baur
> Filho, Ritika Reddy, Robert Wagner, Rohan, Ron Dagostino, Roon, runom,
> Ruslan Krivoshein, rykovsi, Sagar Rao, Said Boudjelda, Satish Duggana,
> shuoer86, Stanislav Kozlovski, Taher Ghaleb, Tang Yunzi, TapDang,
> Taras Ledkov, tkuramoto33, Tyler Bertrand, vamossagar12, Vedarth
> Sharma, Viktor Somogyi-Vass, Vincent Jiang, Walker Carlson,
> Wuzhengyu97, Xavier Léauté, Xiaobing Fang, yangy, Ritika Reddy,
> Yanming Zhou, Yash Mayya, yuyli, zhaohaidao, Zihao Lin, Ziming Deng
> 
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
> 
> Thank you!
> 
> 
> Regards,
> 
> Stanislav Kozlovski
> Release Manager for Apache Kafka 3.7.0



[jira] [Created] (KAFKA-16312) ConsumerRebalanceListener.onPartitionsAssigned should be called after joining, even if empty

2024-02-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16312:
-

 Summary: ConsumerRebalanceListener.onPartitionsAssigned should be 
called after joining, even if empty
 Key: KAFKA-16312
 URL: https://issues.apache.org/jira/browse/KAFKA-16312
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Lianet Magrans
 Fix For: 3.8.0


There is a difference between the {{LegacyKafkaConsumer}} and 
{{AsyncKafkaConsumer}} respecting when the 
{{ConsumerRebalanceListener.onPartitionsAssigned()}} method is invoked.

 
For example, with {{onPartitionsAssigned()}}:

* {{LegacyKafkaConsumer}}: the listener method is invoked when the consumer 
joins the group, even if that consumer was not assigned any partitions. In this 
case it's passed an empty list.
* {{AsyncKafkaConsumer}}: the listener method is only invoked after the 
consumer joins the group iff it has assigned partitions

 
This difference is affecting the system tests. The system tests use a Java 
class named {{VerifiableConsumer}} which uses a {{ConsumerRebalanceListener}} 
that logs when the callbacks are invoked. The system tests then read from that 
log to determine when the callbacks are invoked. This coordination is used by 
the system tests to determine the lifecycle and status of the consumers.
 
The system tests rely heavily on the listener behavior of the 
{{LegacyKafkaConsumer}}. It invokes the {{onPartitionsAssigned()}} method when 
the consumer joins the group, and the system tests use that to determine when 
the consumer is actively a member of the group. This validation of membership 
is used as an assertion throughout the consumer-related tests.
 
In the system test I'm executing from {{consumer_test.py}}, there's a test that 
creates three consumers to read from a single topic with a single partition. 
It's a bit of an oddball test, but it demonstrates the issue.
 
Here are the logs pulled from the test run when executed using the 
{{LegacyKafkaConsumer}}:
 
Node 1:
 
{code:java}
[2024-02-15 00:43:52,400] INFO Adding newly assigned partitions:  
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}
 
Node 2:
 
{code:java}
[2024-02-15 00:43:52,401] INFO Adding newly assigned partitions: test_topic-0 
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}
 
Node 3:
 
{code:java}
[2024-02-15 00:43:52,399] INFO Adding newly assigned partitions:  
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}

Here are the logs when executing the same test using the {{AsyncKafkaConsumer}}:

Node 1:

{code:java}
[2024-02-15 01:15:46,576] INFO Adding newly assigned partitions: test_topic-0 
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}

Node 2:

{code:java}n/a{code}

Node 3:

{code:java}n/a{code}

As a result of this change, the existing system tests do not work with the new 
consumer. However, even more importantly, this change in behavior may adversely 
affect existing users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15475) Request might retry forever even if the user API timeout expires

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15475.
---
Resolution: Fixed

> Request might retry forever even if the user API timeout expires
> 
>
> Key: KAFKA-15475
> URL: https://issues.apache.org/jira/browse/KAFKA-15475
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>    Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> If the request timeout in the background thread, it will be completed with 
> TimeoutException, which is Retriable.  In the TopicMetadataRequestManager and 
> possibly other managers, the request might continue to be retried forever.
>  
> There are two ways to fix this
>  # Pass a timer to the manager to remove the inflight requests when it is 
> expired.
>  # Pass the future to the application layer and continue to retry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-16200:
---

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>    Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16199) Prune the event queue if event timeout expired before starting

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16199.
---
Resolution: Duplicate

> Prune the event queue if event timeout expired before starting
> --
>
> Key: KAFKA-16199
> URL: https://issues.apache.org/jira/browse/KAFKA-16199
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>    Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16200.
---
Resolution: Duplicate

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>    Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16019) Some of the tests in PlaintextConsumer can't seem to deterministically invoke and verify the consumer callback

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16019.
---
Resolution: Fixed

> Some of the tests in PlaintextConsumer can't seem to deterministically invoke 
> and verify the consumer callback
> --
>
> Key: KAFKA-16019
> URL: https://issues.apache.org/jira/browse/KAFKA-16019
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> I was running the PlaintextConsumer to test the async consumer; however, a 
> few tests were failing with not being able to verify the listener is invoked 
> correctly
> For example `testPerPartitionLeadMetricsCleanUpWithSubscribe`
> Around 50% of the time, the listener's callsToAssigned was never incremented 
> correctly.  Event changing it to awaitUntilTrue it was still the same case
> {code:java}
> consumer.subscribe(List(topic, topic2).asJava, listener)
> val records = awaitNonEmptyRecords(consumer, tp)
> assertEquals(1, listener.callsToAssigned, "should be assigned once") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16023) PlaintextConsumerTest needs to wait for reconciliation to complete before proceeding

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16023.
---
Resolution: Fixed

{{testPerPartitionLagMetricsCleanUpWithSubscribe}} is now passing consistently, 
so marking this as fixed.

> PlaintextConsumerTest needs to wait for reconciliation to complete before 
> proceeding
> 
>
> Key: KAFKA-16023
> URL: https://issues.apache.org/jira/browse/KAFKA-16023
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> Several tests in PlaintextConsumerTest.scala (such as 
> testPerPartitionLagMetricsCleanUpWithSubscribe) uses:
> assertEquals(1, listener.callsToAssigned, "should be assigned once")
> However, as the timing for reconciliation completion is not deterministic due 
> to asynchronous processing. We actually need to wait until the condition to 
> happen.
> However, another issue is the timeout - some of these tasks might not 
> complete within the 600ms timeout, so the tests are deemed to be flaky.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15993) Enable max poll integration tests that depend on callback invocation

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15993.
---
Resolution: Duplicate

> Enable max poll integration tests that depend on callback invocation
> 
>
> Key: KAFKA-15993
> URL: https://issues.apache.org/jira/browse/KAFKA-15993
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>    Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> We will enable integration tests using the async consumer in KAFKA-15971.  
> However, we should also enable tests that rely on rebalance listeners after 
> KAFKA-15628 is closed.  One example would be testMaxPollIntervalMs, that I 
> relies on the listener to verify the correctness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16208) Design new Consumer timeout policy

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16208.
---
Resolution: Duplicate

> Design new Consumer timeout policy
> --
>
> Key: KAFKA-16208
> URL: https://issues.apache.org/jira/browse/KAFKA-16208
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, documentation
>Affects Versions: 3.7.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> This task is to design and document the timeout policy for the new Consumer 
> implementation.
> The documentation lives here: 
> https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16287) Implement example test for common rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)
Kirk True created KAFKA-16287:
-

 Summary: Implement example test for common rebalance callback 
scenarios
 Key: KAFKA-16287
 URL: https://issues.apache.org/jira/browse/KAFKA-16287
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer
Reporter: Kirk True
Assignee: Lucas Brutschy
 Fix For: 3.8.0


There is justified concern that the new threading model may not play well with 
"tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide some 
assurance that it will support complicated patterns.
 # Design and implement test scenarios
 # Update and document any design changes with the callback sub-system where 
needed
 # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16276) Update transactions_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16276:
-

 Summary: Update transactions_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16276
 URL: https://issues.apache.org/jira/browse/KAFKA-16276
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{transactions_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

The wrinkle here is that {{transactions_test.py}}  was not able to run as-is. 
That might deprioritize this until whatever is causing that is resolved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16275) Update kraft_upgrade_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16275:
-

 Summary: Update kraft_upgrade_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16275
 URL: https://issues.apache.org/jira/browse/KAFKA-16275
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{kraft_upgrade_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16274) Update replica_scale_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16274:
-

 Summary: Update replica_scale_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16274
 URL: https://issues.apache.org/jira/browse/KAFKA-16274
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{replica_scale_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16272) Update connect_distributed_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16272:
-

 Summary: Update connect_distributed_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16272
 URL: https://issues.apache.org/jira/browse/KAFKA-16272
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{connect_distributed_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16273) Update consume_bench_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16273:
-

 Summary: Update consume_bench_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16273
 URL: https://issues.apache.org/jira/browse/KAFKA-16273
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{consume_bench_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16271:
-

 Summary: Update consumer_rolling_upgrade_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16271
 URL: https://issues.apache.org/jira/browse/KAFKA-16271
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in 
{{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

The tricky wrinkle here is that the existing test relies on client-side 
assignment strategies that aren't applicable with the new KIP-848-enabled 
consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16270) Update snapshot_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16270:
-

 Summary: Update snapshot_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16270
 URL: https://issues.apache.org/jira/browse/KAFKA-16270
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{snapshot_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16269) Update reassign_partitions_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16269:
-

 Summary: Update reassign_partitions_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16269
 URL: https://issues.apache.org/jira/browse/KAFKA-16269
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{reassign_partitions_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16268) Update fetch_from_follower_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16268:
-

 Summary: Update fetch_from_follower_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16268
 URL: https://issues.apache.org/jira/browse/KAFKA-16268
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{fetch_from_follower_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16267) Update consumer_group_command_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16267:
-

 Summary: Update consumer_group_command_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16267
 URL: https://issues.apache.org/jira/browse/KAFKA-16267
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{consumer_group_command_test.py}} 
to support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16256) Update ConsumerConfig to validate use of group.remote.assignor and partition.assignment.strategy based on group.protocol

2024-02-14 Thread Kirk True (Jira)
Kirk True created KAFKA-16256:
-

 Summary: Update ConsumerConfig to validate use of 
group.remote.assignor and partition.assignment.strategy based on group.protocol
 Key: KAFKA-16256
 URL: https://issues.apache.org/jira/browse/KAFKA-16256
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


{{ConsumerConfig}} supports both the {{group.remote.assignor}} and 
{{partition.assignment.strategy}} configuration options. These, however, should 
not be used together; the former is applicable only when the {{group.protocol}} 
is set to {{consumer}} and the latter when the {{group.protocol}} is set to 
{{{}classic{}}}. We should emit a warning if the user specifies the incorrect 
configuration based on the value of {{{}group.protocol{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16255) AsyncKafkaConsumer should not use partition.assignment.strategy

2024-02-14 Thread Kirk True (Jira)
Kirk True created KAFKA-16255:
-

 Summary: AsyncKafkaConsumer should not use 
partition.assignment.strategy
 Key: KAFKA-16255
 URL: https://issues.apache.org/jira/browse/KAFKA-16255
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The partition.assignment.strategy configuration is used to specify a list of 
zero or more 
ConsumerPartitionAssignor instances. However, that interface is not applicable 
for the KIP-848-based protocol on top of which AsyncKafkaConsumer is built. 
Therefore, the use of ConsumerPartitionAssignor is in appropriate and should be 
removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16231) Update consumer_test.py to support KIP-848’s group protocol config

2024-02-06 Thread Kirk True (Jira)
Kirk True created KAFKA-16231:
-

 Summary: Update consumer_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16231
 URL: https://issues.apache.org/jira/browse/KAFKA-16231
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update {{verifiable_consumer.py}} to support the 
{{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument. It will default to classic 
and we will take a separate task (Jira) to update the callers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16230) Update verifiable_consumer to support KIP-848’s group protocol config

2024-02-06 Thread Kirk True (Jira)
Kirk True created KAFKA-16230:
-

 Summary: Update verifiable_consumer to support KIP-848’s group 
protocol config
 Key: KAFKA-16230
 URL: https://issues.apache.org/jira/browse/KAFKA-16230
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{--group-protocol}} command line option.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15691) Add new system tests to use new consumer

2024-02-01 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15691.
---
Resolution: Duplicate

> Add new system tests to use new consumer
> 
>
> Key: KAFKA-15691
> URL: https://issues.apache.org/jira/browse/KAFKA-15691
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Reporter: Kirk True
>    Assignee: Kirk True
>Priority: Major
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16208) Design new Consumer timeout policy

2024-01-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16208:
-

 Summary: Design new Consumer timeout policy
 Key: KAFKA-16208
 URL: https://issues.apache.org/jira/browse/KAFKA-16208
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, documentation
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to design and document the timeout policy for the new Consumer 
implementation.

The documentation lives here: 
https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16200) Ensure RequestManager handling of expired timeouts are consistent

2024-01-26 Thread Kirk True (Jira)
Kirk True created KAFKA-16200:
-

 Summary: Ensure RequestManager handling of expired timeouts are 
consistent
 Key: KAFKA-16200
 URL: https://issues.apache.org/jira/browse/KAFKA-16200
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16199) Prune the event queue if events have expired before starting

2024-01-26 Thread Kirk True (Jira)
Kirk True created KAFKA-16199:
-

 Summary: Prune the event queue if events have expired before 
starting
 Key: KAFKA-16199
 URL: https://issues.apache.org/jira/browse/KAFKA-16199
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16191) Clean up of consumer client internal events

2024-01-24 Thread Kirk True (Jira)
Kirk True created KAFKA-16191:
-

 Summary: Clean up of consumer client internal events
 Key: KAFKA-16191
 URL: https://issues.apache.org/jira/browse/KAFKA-16191
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


There are a few minor issues with the event sub-classes in the 
org.apache.kafka.clients.consumer.internals package that should be cleaned up:
  # Update the names of subclasses to remove "Application" or "Background"
 # Make toString() final in the base classes and clean up the implementations 
of toStringBase()
 # Fix minor whitespace inconsistencies



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup

2024-01-18 Thread Kirk True (Jira)
Kirk True created KAFKA-16167:
-

 Summary: Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
 Key: KAFKA-16167
 URL: https://issues.apache.org/jira/browse/KAFKA-16167
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe

2024-01-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16151.
---
Resolution: Duplicate

> Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe
> -
>
> Key: KAFKA-16151
> URL: https://issues.apache.org/jira/browse/KAFKA-16151
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 3.7.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe

2024-01-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16150.
---
Resolution: Duplicate

> Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
> 
>
> Key: KAFKA-16150
> URL: https://issues.apache.org/jira/browse/KAFKA-16150
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 3.7.0
>    Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16152) Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16152:
-

 Summary: Fix 
PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart
 Key: KAFKA-16152
 URL: https://issues.apache.org/jira/browse/KAFKA-16152
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16151:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe
 Key: KAFKA-16151
 URL: https://issues.apache.org/jira/browse/KAFKA-16151
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16150:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
 Key: KAFKA-16150
 URL: https://issues.apache.org/jira/browse/KAFKA-16150
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16149) Aggressively expire unused client connections

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16149:
-

 Summary: Aggressively expire unused client connections
 Key: KAFKA-16149
 URL: https://issues.apache.org/jira/browse/KAFKA-16149
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, producer 
Reporter: Kirk True
Assignee: Kirk True


The goal is to minimize the number of connections from the client to the 
brokers.

On the Java client, there are potentially two types of network connections to 
brokers:
 # Connections for metadata requests
 # Connections for fetch, produce, etc. requests

The idea is to apply a much shorter idle time to client connections that have 
_only_ served metadata (type 1 above) so that they become candidates for 
expiration more quickly.

Alternatively (or additionally), a change to the way metadata requests are 
routed could be made to reduce the number of connections.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16142) Update metrics documentation for errors and new metrics

2024-01-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16142:
-

 Summary: Update metrics documentation for errors and new metrics
 Key: KAFKA-16142
 URL: https://issues.apache.org/jira/browse/KAFKA-16142
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16143) New metrics for KIP-848 protocol

2024-01-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16143:
-

 Summary: New metrics for KIP-848 protocol
 Key: KAFKA-16143
 URL: https://issues.apache.org/jira/browse/KAFKA-16143
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC2

2024-01-15 Thread Kirk True
Hi Stanislav,

On Sun, Jan 14, 2024, at 1:17 PM, Stanislav Kozlovski wrote:
> Hey Kirk and Chris,
> 
> Unless I'm missing something - KAFKALESS-16029 is simply a bad log due to
> improper closing. And the PR description implies this has been present
> since 3.5. While annoying, I don't see a strong reason for this to block
> the release.

I would imagine that it would result in concerned users reporting the issue.

I took another look, and the code that causes the issue was indeed changed in 
3.7. It is easily reproducible.

The PR is ready for review: https://github.com/apache/kafka/pull/15186

Thanks,
Kirk

Re: [VOTE] 3.7.0 RC2

2024-01-12 Thread Kirk True
Hi Chris/Stanislav,

I'm working on the 'Unable to find FetchSessionHandler' log problem 
(KAFKA-16029) and have put out a draft PR 
(https://github.com/apache/kafka/pull/15186). I will use the quickstart 
approach as a second means to reproduce/verify while I wait for the PR's 
Jenkins job to finish.   

Thanks,
Kirk

On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote:
> Hi Stanislav,
> 
> 
> Thanks for running this release!
> 
> To verify, I:
> - Built from source using Java 11 with both:
> - - the 3.7.0-rc2 tag on GitHub
> - - the kafka-3.7.0-src.tgz artifact from
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> - Checked signatures and checksums
> - Ran the quickstart using both:
> - - The kafka_2.13-3.7.0.tgz artifact from
> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11
> and Scala 13 in KRaft mode
> - - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
> - Ran all unit tests
> - Ran all integration tests for Connect and MM2
> 
> 
> I found two minor areas for concern:
> 
> 1. (Possibly a blocker)
> When running the quickstart, I noticed this ERROR-level log message being
> emitted frequently (not not every time) when I killed my console consumer
> via ctrl-C:
> 
> > [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer,
> groupId=console-consumer-74388] Unable to find FetchSessionHandler for node
> 1. Ignoring fetch response
> (org.apache.kafka.clients.consumer.internals.AbstractFetch)
> 
> I see that this error message is already reported in
> https://issues.apache.org/jira/browse/KAFKA-16029. I think we should
> prioritize fixing it for this release. I know it's probably benign but it's
> really not a good look for us when basic operations log error messages, and
> it may give new users some headaches.
> 
> 
> 2. (Probably not a blocker)
> The following unit tests failed the first time around, and all of them
> passed the second time I ran them:
> 
> - (clients) ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
> - (clients) SelectorTest.testConnectionsByClientMetric()
> - (clients) Tls13SelectorTest.testConnectionsByClientMetric()
> - (connect) TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound (I
> thought I fixed this one! 郎郎)
> - (core) ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2]
> 
> 
> Thanks again for your work on this release, and congratulations to Kafka
> Streams for having zero flaky unit tests during my highly-experimental
> single laptop run!
> 
> 
> Cheers,
> 
> Chris
> 
> On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski
>  wrote:
> 
> > Hello Kafka users, developers, and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 3.7.0.
> >
> > Note it's named "RC2" because I had a few "failed" RCs that I had
> > cut/uploaded but ultimately had to scrap prior to announcing due to new
> > blockers arriving before I could even announce them.
> >
> > Further - I haven't yet been able to set up the system tests successfully.
> > And the integration/unit tests do have a few failures that I have to spend
> > time triaging. I would appreciate any help in case anyone notices any tests
> > failing that they're subject matters experts in. Expect me to follow up in
> > a day or two with more detailed analysis.
> >
> > Major changes include:
> > - Early Access to KIP-848 - the next generation of the consumer rebalance
> > protocol
> > - KIP-858: Adding JBOD support to KRaft
> > - KIP-714: Observability into Client metrics via a standardized interface
> >
> > Check more information in the WIP blog post:
> > https://github.com/apache/kafka-site/pull/578
> >
> > Release notes for the 3.7.0 release:
> >
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, January 18, 9am PT ***
> >
> > Usually these deadlines tend to be 2-3 days, but due to this being the
> > first RC and the tests not having ran yet, I am giving it a bit more time.
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
> >
> > * Docker release artifact to be voted upon:
> > apache/kafka:3.7.0-rc2
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag:
> > https://github.com/apache/kafka/releases/tag/3.7.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/37/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/37/protocol.html
> >
> > * Successful Jenkins builds for the 3.7 branch:
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/
> > There are failing 

[jira] [Created] (KAFKA-16112) Review JMX metrics in Async Consumer and determine the missing ones

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16112:
-

 Summary: Review JMX metrics in Async Consumer and determine the 
missing ones
 Key: KAFKA-16112
 URL: https://issues.apache.org/jira/browse/KAFKA-16112
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16111) Implement tests for tricky rebalance callbacks scenarios

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16111:
-

 Summary: Implement tests for tricky rebalance callbacks scenarios
 Key: KAFKA-16111
 URL: https://issues.apache.org/jira/browse/KAFKA-16111
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16110) Implement consumer performance tests

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16110:
-

 Summary: Implement consumer performance tests
 Key: KAFKA-16110
 URL: https://issues.apache.org/jira/browse/KAFKA-16110
 Project: Kafka
  Issue Type: New Feature
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16109) Ensure system tests cover the "simple consumer + commit" use case

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16109:
-

 Summary: Ensure system tests cover the "simple consumer + commit" 
use case
 Key: KAFKA-16109
 URL: https://issues.apache.org/jira/browse/KAFKA-16109
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15250) DefaultBackgroundThread is running tight loop

2024-01-09 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-15250:
---
  Assignee: Kirk True  (was: Philip Nee)

> DefaultBackgroundThread is running tight loop
> -
>
> Key: KAFKA-15250
> URL: https://issues.apache.org/jira/browse/KAFKA-15250
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>    Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor
>
> The DefaultBackgroundThread is running tight loops and wasting CPU cycles.  I 
> think we need to reexamine the timeout pass to networkclientDelegate.poll.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Kirk True
Congrats Divij!!!

On Wed, Dec 27, 2023, at 1:44 PM, Jorge Esteban Quilcate Otoya wrote:
> Congratulations Divij!!
> 
> On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:
> 
> > Congratulations!
> >
> > On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
> >
> > > congrats divij!
> > >
> > > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Congratulations Divij!
> > > >
> > > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> > wrote:
> > > >
> > > > > Congratulations Divij!
> > > > >
> > > > > Regards,
> > > > > Gaurav
> > > > >
> > > > > > On 27-Dec-2023, at 17:44, Mickael Maison  > >
> > > > > wrote:
> > > > > >
> > > > > > Congratulations Divij!
> > > > > >
> > > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > > > > wrote:
> > > > > >>
> > > > > >> Congrats Divij! Absolutely well deserved !
> > > > > >>
> > > > > >> Thanks!
> > > > > >> Sagar.
> > > > > >>
> > > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen 
> > > wrote:
> > > > > >>>
> > > > > >>> Hi, Everyone,
> > > > > >>>
> > > > > >>> Divij has been a Kafka committer since June, 2023. He has
> > remained
> > > > very
> > > > > >>> active and instructive in the community since becoming a
> > committer.
> > > > > It's my
> > > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > > >>>
> > > > > >>> Congratulations Divij!
> > > > > >>>
> > > > > >>> Luke
> > > > > >>> on behalf of Apache Kafka PMC
> > > > > >>>
> > > > >
> > > >
> > >
> >
> 


[jira] [Created] (KAFKA-16037) Upgrade existing system tests to use new consumer

2023-12-20 Thread Kirk True (Jira)
Kirk True created KAFKA-16037:
-

 Summary: Upgrade existing system tests to use new consumer
 Key: KAFKA-16037
 URL: https://issues.apache.org/jira/browse/KAFKA-16037
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16029) Investigate cause of "Unable to find FetchSessionHandler for node X" in logs

2023-12-18 Thread Kirk True (Jira)
Kirk True created KAFKA-16029:
-

 Summary: Investigate cause of "Unable to find FetchSessionHandler 
for node X" in logs
 Key: KAFKA-16029
 URL: https://issues.apache.org/jira/browse/KAFKA-16029
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16011) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16011:
-

 Summary: Fix 
PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose
 Key: KAFKA-16011
 URL: https://issues.apache.org/jira/browse/KAFKA-16011
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test 
{{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is 
failing when using the {{AsyncKafkaConsumer}}.

The error is:

{code}
org.opentest4j.AssertionFailedError: Did not get valid assignment for 
partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, topic1-0, 
topic1-3] after one consumer left
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
at 
kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286)
at 
kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883)
at 
kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281)
{code}

The logs include these lines:
 
{code}
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
{code} 

I don't know if that's related or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16010:
-

 Summary: Fix 
PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
 Key: KAFKA-16010
 URL: https://issues.apache.org/jira/browse/KAFKA-16010
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test 
{{PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation}} is failing 
when using the {{AsyncKafkaConsumer}}.

The error is:

{code}
org.opentest4j.AssertionFailedError: Timed out before expected rebalance 
completed
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
at 
kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317)
at 
kafka.api.PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation(PlaintextConsumerTest.scala:235)
{code}

The logs include this line:
 
{code}
[2023-12-13 15:20:27,824] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
{code} 

I don't know if that's related or not.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   3   >