Re: Permission to contribute to Apache Kafka

2024-05-21 Thread Matthias J. Sax

You should be all set.

On 5/21/24 8:30 AM, Harry Fallows wrote:

Hello,

I am following the [Getting Started guide for writing 
KIPs](https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
 Could someone give me the permissions to write a KIP please? My Wiki ID is 
harryfallows, my Jira ID is hfallows, and my email address is 
harryfall...@protonmail.com.

Thanks,
Harry


Re: Request for Authorization to Create KIP

2024-05-21 Thread Matthias J. Sax

You should be all set.

On 5/21/24 6:58 PM, 黃竣陽 wrote:

I want to create a KIP. My wiki ID is m1a2st and my Jira ID is m1a2st. Thanks
for your help.


jiang dou wrote on May 22, 2024 at 9:01 AM:

You should send your Jira ID and wiki ID.
Please refer to this address:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

黃竣陽 wrote on Tue, May 21, 2024 at 22:42:


I am writing to request authorization to create a KIP.

Currently, I do not have the necessary permissions to access the 'Create
KIP' function. My account email is s7133...@gmail.com.

Could you please grant me the required permissions to create a KIP? Thanks
for your help.


Re: [External] Re: DescribeLogDirs in Kafka 3.3.1 returns all topics instead of one provided in the request. Bug or "bad user error"?

2024-05-21 Thread Maxim Senin
Thanks, all.

/Maxim

From: Chia-Ping Tsai 
Date: Tuesday, May 21, 2024 at 11:27 PM
To: dev@kafka.apache.org 
Subject: [External] Re: DescribeLogDirs in Kafka 3.3.1 returns all topics 
instead of one provided in the request. Bug or "bad user error"?
Dear all,

I filed https://issues.apache.org/jira/browse/KAFKA-16807 to fix it.

Thanks to Maxim for this nice finding. Also, thanks to Gaurav for the quick
response/dig-in.

Cheers,
Chia-Ping

Gaurav Narula wrote on Tue, May 21, 2024 at 2:56 PM:

> Hello Maxim,
>
> Thanks for sharing this.
>
> I had a look and it seems like the behaviour on the wire changed with
> KAFKA-9435. I believe this change [0] in ReplicaManager causes all topics in
> online log dirs to be a part of the response inadvertently. We do however
> set partition information only for the queried topic. I'd suggest creating a
> JIRA for this.
>
> As for an option to specify ALL_PARTITIONS, I reckon that would require a
> KIP since it would be a change to the public interface.
>
> [0]:
>
> https://github.com/apache/kafka/pull/7972/files#diff-78812e247ffeae6f8c49b1b22506434701b1e1bafe7f92ef8f8708059e292bf0R674
>
> Regards,
> Gaurav
>
> On Mon, May 20, 2024 at 10:35 PM Maxim Senin 
> wrote:
>
> > Hello.
> >
> > I’m having a problem with Kafka protocol API.
> >
> > Requests:
> > DescribeLogDirs Request (Version: 0) => [topics]
> >   topics => topic [partitions]
> > topic => STRING
> > partitions => INT32
> >
> > My request contains `[{topic: “blah”, partitions:
> > [0,1,2,3,4,5,6,7,8,9]}]`, but the result
> >
> > Responses:
> > DescribeLogDirs Response (Version: 0) => throttle_time_ms [results]
> >   throttle_time_ms => INT32
> >   results => error_code log_dir [topics]
> > error_code => INT16
> > log_dir => STRING
> > topics => name [partitions]
> >   name => STRING
> >   partitions => partition_index partition_size offset_lag
> is_future_key
> > partition_index => INT32
> > partition_size => INT64
> > offset_lag => INT64
> > is_future_key => BOOLEAN
> >
> >
> >
> > contains entries for *all* topics. My workaround has been to filter the
> > returned list by topic name to find the one I was requesting the data
> for,
> > but I don’t understand why it’s not limiting the results to just the
> topic
> > I requested in the first place.
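
As an aside, the filtering workaround looks roughly like this with the Java
AdminClient (a sketch only; the broker id, topic name, and bootstrap address
below are placeholders, not from the original report):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.LogDirDescription;

public class DescribeLogDirsFilter {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // The response covers every topic in the broker's online log dirs,
            // so filter client-side for the single topic of interest.
            Map<Integer, Map<String, LogDirDescription>> byBroker =
                    admin.describeLogDirs(List.of(0)).allDescriptions().get();
            byBroker.forEach((broker, dirs) -> dirs.forEach((dir, desc) ->
                    desc.replicaInfos().forEach((tp, info) -> {
                        if (tp.topic().equals("blah")) {
                            System.out.printf("broker=%d dir=%s %s size=%d lag=%d%n",
                                    broker, dir, tp, info.size(), info.offsetLag());
                        }
                    })));
        }
    }
}
{code}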
> >
> > Also, I think there should be an option to just specify ALL_PARTITIONS,
> > because that would save me from having to retrieve topic metadata from the
> > broker to count the number of partitions. The Kafka server probably has
> > the means to do that more efficiently.
> >
> > Is this a bug or am I doing something wrong?
> >
> > Thanks,
> > Maxim
> >
> > 
> >
> > COGILITY SOFTWARE CORPORATION LEGAL DISCLAIMER: The information in this
> > email is confidential and is intended solely for the addressee. Access to
> > this email by anyone else is unauthorized. If you are not the intended
> > recipient, any disclosure, copying, distribution or any action taken or
> > omitted to be taken in reliance on it, is prohibited and may be unlawful.
> >
>


[jira] [Created] (KAFKA-16810) Improve kafka-consumer-perf-test to benchmark single partition

2024-05-21 Thread Harsh Panchal (Jira)
Harsh Panchal created KAFKA-16810:
-

 Summary: Improve kafka-consumer-perf-test to benchmark single 
partition
 Key: KAFKA-16810
 URL: https://issues.apache.org/jira/browse/KAFKA-16810
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Reporter: Harsh Panchal


kafka-consumer-perf-test is a great tool to quickly check raw consumer 
performance. Currently, it subscribes to all partitions and reports overall 
cluster performance; however, if we want to test the performance of a single 
broker or partition, the existing tool does not support that.

We can introduce two optional flags, --partitions and --offsets, which give 
the flexibility to benchmark only specific partitions, optionally starting 
from specified offsets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16783) Migrate RemoteLogMetadataManagerTest to new test infra

2024-05-21 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16783.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Migrate RemoteLogMetadataManagerTest to new test infra
> --
>
> Key: KAFKA-16783
> URL: https://issues.apache.org/jira/browse/KAFKA-16783
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: storage_test
> Fix For: 3.8.0
>
>
> as title
> `TopicBasedRemoteLogMetadataManagerWrapperWithHarness` could be replaced by 
> `RemoteLogMetadataManagerTestUtils#builder`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request for Authorization to Create KIP

2024-05-21 Thread 黃竣陽
I want to create a KIP. My wiki ID is m1a2st and my Jira ID is m1a2st. Thanks
for your help.

> jiang dou wrote on May 22, 2024 at 9:01 AM:
> 
> You should send your Jira ID and wiki ID.
> Please refer to this address:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> 
> 黃竣陽 wrote on Tue, May 21, 2024 at 22:42:
> 
>> I am writing to request authorization to create a KIP.
>> 
>> Currently, I do not have the necessary permissions to access the 'Create
>> KIP' function. My account email is s7133...@gmail.com.
>> 
>> Could you please grant me the required permissions to create a KIP? Thanks
>> for your help.
>> 
>> 



Re: Request for Authorization to Create KIP

2024-05-21 Thread jiang dou
You should send your Jira ID and wiki ID.
Please refer to this address:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

黃竣陽 wrote on Tue, May 21, 2024 at 22:42:

> I am writing to request authorization to create a KIP.
>
> Currently, I do not have the necessary permissions to access the 'Create
> KIP' function. My account email is s7133...@gmail.com.
>
> Could you please grant me the required permissions to create a KIP? Thanks
> for your help.
>
>


Re: [DISCUSS] KIP-1027 Add MockFixedKeyProcessorContext

2024-05-21 Thread Shashwat Pandey
Looking at the ticket and the sample code, I think it would be possible to
continue using `InternalFixedKeyRecordFactory` as the avenue to create
`FixedKeyRecord`s in tests. As long as there was a
MockFixedKeyProcessorContext, I think we would be able to test
FixedKeyProcessors without a Topology.

I created a sample repo with the `MockFixedKeyProcessorContext`; here is
what I think the tests would look like:
https://github.com/s7pandey/kafka-processor-tests/blob/main/src/test/java/com/example/demo/MyFixedKeyProcessorTest.java
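
For reference, creating a `FixedKeyRecord` through that factory looks roughly
like this (a sketch; `InternalFixedKeyRecordFactory` is internal API, and its
package and static `create(Record)` signature are assumed here):

{code:java}
import org.apache.kafka.streams.processor.api.FixedKeyRecord;
import org.apache.kafka.streams.processor.api.InternalFixedKeyRecordFactory;
import org.apache.kafka.streams.processor.api.Record;

public class FixedKeyRecordFactoryExample {
    public static void main(String[] args) {
        // Build a FixedKeyRecord from a plain Record via the internal factory;
        // there is no public constructor, which is the gap this KIP addresses.
        FixedKeyRecord<String, String> record =
                InternalFixedKeyRecordFactory.create(new Record<>("key", "value", 0L));
        System.out.println(record.key() + " -> " + record.value());
    }
}
{code}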



On Mon, May 20, 2024 at 9:05 PM Matthias J. Sax  wrote:

> Had a discussion on https://issues.apache.org/jira/browse/KAFKA-15242
> and it was pointed out that we also need to do something about
> `FixedKeyRecord`. It does not have a public constructor (which is
> correct; it should not have one). However, this makes testing
> `FixedKeyProcessor` impossible w/o extending `FixedKeyRecord` manually,
> which does not seem right (too clumsy).
>
> It seems we either need some helper builder method (but it's not clear to me
> where to add it in an elegant way) which would provide us with a
> `FixedKeyRecord`, or add some sub-class to the test-utils module which
> would extend `FixedKeyRecord`? -- Or maybe an even better solution? I
> could not think of something else so far.
>
>
> Thoughts?
>
>
> On 5/3/24 9:46 AM, Matthias J. Sax wrote:
> > Please also update the KIP.
> >
> > To get a wiki account created, please request it via a commet on this
> > ticket: https://issues.apache.org/jira/browse/INFRA-25451
> >
> > After you have the account, please share your wiki id, and we can give
> > you write permission on the wiki.
> >
> >
> >
> > -Matthias
> >
> > On 5/3/24 6:30 AM, Shashwat Pandey wrote:
> >> Hi Matthias,
> >>
> >> Sorry this fell out of my radar for a bit.
> >>
> >> Revisiting the topic, I think you’re right and we accept the duplicated
> >> nesting as an appropriate solution to not affect the larger public API.
> >>
> >> I can update my PR with the change.
> >>
> >> Regards,
> >> Shashwat Pandey
> >>
> >>
> >> On Wed, May 1, 2024 at 11:00 PM Matthias J. Sax 
> wrote:
> >>
> >>> Any updates on this KIP?
> >>>
> >>> On 3/28/24 4:11 AM, Matthias J. Sax wrote:
>  It seems that `MockRecordMetadata` is a private class, and thus not part
>  of the public API. If there are any changes required, we don't need to
>  discuss them in the KIP.
> 
> 
>  For `CapturedPunctuator` and `CapturedForward` it's a little bit more
>  tricky. My gut feeling is that the classes might not need to be
>  changed, but if we use them within `MockProcessorContext` and
>  `MockFixedKeyProcessorContext` it might be weird to keep the current
>  nesting... The problem I see is that it's not straightforward how to
>  move the classes w/o breaking compatibility, nor if we duplicate them as
>  standalone classes w/o a larger "splash radius". (We would need to add
>  new overloads for MockProcessorContext#scheduledPunctuators() and
>  MockProcessorContext#forwarded()).
> 
>  Might be good to hear from others if we think it's worth these larger
>  changes to get rid of the nesting, or just accept the somewhat not ideal
>  nesting as it technically is not a real issue?
> 
> 
>  -Matthias
> 
> 
>  On 3/15/24 1:47 AM, Shashwat Pandey wrote:
> > Thanks for the feedback Matthias!
> >
> > The reason I proposed the extension of MockProcessorContext was more
> > to do
> > with the internals of the class (MockRecordMetadata,
> > CapturedPunctuator and
> > CapturedForward).
> >
> > However, I do see your point, I would then think to split
> > MockProcessorContext and MockFixedKeyProcessorContext, some of the
> > internal
> > classes should also be extracted i.e. MockRecordMetadata,
> > CapturedPunctuator and probably a new CapturedFixedKeyForward.
> >
> > Let me know what you think!
> >
> >
> > Regards,
> > Shashwat Pandey
> >
> >
> > On Mon, Mar 11, 2024 at 10:09 PM Matthias J. Sax 
> > wrote:
> >
> >> Thanks for the KIP Shashwat. Closing this testing gap is great! It did
> >> come up a few times already...
> >>
> >> One question: why do you propose to `extend MockProcessorContext`?
> >>
> >> Given how the actual runtime context classes are setup, it seems
> that
> >> the regular context and fixed-key-context are distinct, and thus I
> >> believe both mock-context classes should be distinct, too?
> >>
> >> What I mean is that FixedKeyProcessorContext does not extend
> >> ProcessorContext. Both classes have a common parent
> ProcessINGContext
> >> (note the very similar but different names), but they are "siblings"
> >> only, so why make the mock processor a parent-child relationship?
> >>
> >> It seems better to do
> >>
> >> public class MockFixed

[jira] [Created] (KAFKA-16809) Run javadoc build in CI

2024-05-21 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16809:
---

 Summary: Run javadoc build in CI
 Key: KAFKA-16809
 URL: https://issues.apache.org/jira/browse/KAFKA-16809
 Project: Kafka
  Issue Type: Task
  Components: build, docs
Reporter: Greg Harris
Assignee: Greg Harris


The `javadoc` target isn't run during CI builds, allowing errors in javadocs 
to leak in.

Instead, we can run javadoc, like checkstyle, spotbugs, and import control, as 
a pre-test step, to ensure that PRs aren't causing javadoc build regressions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15242) FixedKeyProcessor testing is unusable

2024-05-21 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-15242.
-
  Assignee: (was: Alexander Aghili)
Resolution: Duplicate

> FixedKeyProcessor testing is unusable
> -
>
> Key: KAFKA-15242
> URL: https://issues.apache.org/jira/browse/KAFKA-15242
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Zlstibor Veljkovic
>Priority: Major
>
> Using the mock processor context to get the forwarded message doesn't work.
> Also, there is no well-documented way of testing FixedKeyProcessors.
> Please see the repo at [https://github.com/zveljkovic/kafka-repro]
> but most important piece is test file with runtime and compile time errors:
> [https://github.com/zveljkovic/kafka-repro/blob/master/src/test/java/com/example/demo/MyFixedKeyProcessorTest.java]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer

2024-05-21 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16578.
---
Resolution: Won't Fix

Most of the {{connect_distributed_test.py}} system tests were fixed, and 
{{test_exactly_once_source}} was reverted in a separate Jira/PR.

> Revert changes to connect_distributed_test.py for the new async Consumer
> 
>
> Key: KAFKA-16578
> URL: https://issues.apache.org/jira/browse/KAFKA-16578
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, system tests
>Affects Versions: 3.8.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Critical
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated 
> a slew of system tests to run both the "old" and "new" implementations. 
> KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
> could test the new consumer with Connect. However, we are not supporting 
> Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the 
> Connect system tests with the new {{AsyncKafkaConsumer}}, we get errors like 
> the following:
> {code}
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   6 minutes 3.899 seconds
> InsufficientResourcesError('Not enough nodes available to allocate. linux 
> nodes requested: 1. linux nodes available: 0')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 919, in test_exactly_once_source
> consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
> self.source.topic, consumer_timeout_ms=1000, print_key=True)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
>  line 97, in __init__
> BackgroundThreadService.__init__(self, context, num_nodes)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
>  line 26, in __init__
> super(BackgroundThreadService, self).__init__(context, num_nodes, 
> cluster_spec, *args, **kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
>  line 107, in __init__
> self.allocate_nodes()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
>  line 217, in allocate_nodes
> self.nodes = self.cluster.alloc(self.cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
>  line 54, in alloc
> allocated = self.do_alloc(cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
>  line 31, in do_alloc
> allocated = self._available_nodes.remove_spec(cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
>  line 117, in remove_spec
> raise InsufficientResourcesError("Not enough nodes available to allocate. 
> " + msg)
> ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
> available to allocate. linux nodes requested: 1. linux nodes available: 0
> {code}
> The task here is to revert the changes made in KAFKA-16272 [PR 
> 15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Permission to contribute to Apache Kafka

2024-05-21 Thread Harry Fallows
Hello,

I am following the [Getting Started guide for writing 
KIPs](https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
 Could someone give me the permissions to write a KIP please? My Wiki ID is 
harryfallows, my Jira ID is hfallows, and my email address is 
harryfall...@protonmail.com.

Thanks,
Harry

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-21 Thread Greg Harris
Hi Murali,

I don't have a trie library in mind. I looked at the current
implementation of the StandardAuthorizer and found that we are already
benefiting from the prefix structure in the implementation [1]. The
current implementation appears to be a TreePSet [2].

Now, we've already made this tradeoff once with PREFIX: Prefixes are
less structured than literals, because with literals you can use a
hashing algorithm to jump directly to your relevant ACLs in O(1), but
with a prefix you either need to do multiple lookups, or some sort of
O(log(n)) lookup. And we determined that the ultimate limitation in
performance was worth it for the expressiveness.
We're making this tradeoff again with MATCH acls: wildcards are less
structured than prefixes or literals, because of the reasons I
mentioned earlier. We need to judge now if the ultimate limitation in
performance is worth it.

I think your strategy for using the optimized layout for prefix and
literal matches is smart, because there does seem to be a gap in
performance possible. It makes me wonder why the optimized layout for
literals was not used when prefixes were added. Literal lookups still
go through the tree lookup, when they could be moved to a hash-lookup
instead.
That would allow users to "choose" for themselves on a convenience vs
performance scale: Smaller use-cases can add a single convenient
MATCH, and larger use-cases can add the multiple optimized PREFIXes.

[1] 
https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
[2] 
https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
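
To make the convenience-vs-performance tradeoff concrete, here is a minimal
sketch (mine, not the StandardAuthorizer code) of literal lookup as a single
hash probe versus prefix lookup as multiple probes:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AclLookupSketch {
    private final Map<String, String> literalAcls = new HashMap<>();
    private final Map<String, String> prefixAcls = new HashMap<>();

    void addLiteral(String topic, String rule) { literalAcls.put(topic, rule); }
    void addPrefix(String prefix, String rule) { prefixAcls.put(prefix, rule); }

    String lookup(String topic) {
        String rule = literalAcls.get(topic);   // literal: one O(1) hash probe
        if (rule != null) {
            return rule;
        }
        // Prefix: probe every prefix of the topic name, longest first --
        // O(len(topic)) hash lookups instead of one.
        for (int end = topic.length(); end > 0; end--) {
            rule = prefixAcls.get(topic.substring(0, end));
            if (rule != null) {
                return rule;
            }
        }
        return null;
    }
}
{code}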

Thanks,
Greg

On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
 wrote:
>
> @greg which trie library were you thinking of? I see there is one in
> commons-collections.
>
> On Fri, May 17, 2024 at 3:55 PM Claude Warren  wrote:
>
> > >
> > > This has implications for execution complexity: If we can't compute
> > > whether two patterns overlap, then we need to run both of them on each
> > > piece of input to test if they both match. Under the current
> > > LITERAL/PREFIX system, we can optimize execution with a trie, but that
> > > option wouldn't be available to us with MATCH.
> > >
> >
> > If we consider the case of an asterisk representing 1 or more characters
> > then determining if 2 patterns overlap can be fairly simple and very fast.
> > Let's call the text from the ACL the target, and the text from the wildcard
> > matcher the candidate.
> >
> > When a wildcard pattern, excluding '*',  is created:
> >
> >- the candidate text is broken into fragments separated by characters
> >that are not Character.isLetterOrDigit() (See
> >https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
> >- fragments that contain 1 character are ignored.
> >- fragments that contain 2 or more characters are scanned, and every
> >pair of characters is used to create a Hasher (see
> > https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html);
> >that hasher is added to a Bloom filter associated with the wildcard
> > pattern.
> >
> > When a target is being evaluated and matching wildcard entries are to be
> > located, split the target and create a bloom filter using the same strategy
> > as for the wildcard patterns above.  These bloom filters will have had more
> > pairs of characters added than for a matching wildcard pattern.
> >
> > Each filter contains the pattern for the unique pairs in the fragments of
> > the original text:
> >
> > Now we can check the Bloom filters.
> >
> > To find potential matching patterns we can check to see if the Bloom filter
> > for the pattern is contained within the bloom filter for the entry text.
> > If so then we know that it is likely that the pairs of characters specified
> > in the wild card (non-wildcard text) appear in the entry text.
> >
> > At this point we can evaluate the reduced number of patterns to see which
> > ones actually match.
> >
> > Having reduced the candidates to the matching patterns we can then use a
> > standard measure of similarity like the Levenshtein distance to determine
> > which candidates require the fewest edits to evolve into the target.  The
> > one with the fewest edits is the most specific and should be applied.
> >
> > Now if you want to know which patterns overlap you have a similar process.
> >
> > Each Bloom filter can calculate the estimated number of unique fragments
> > that were added to it.  If the filters are sorted by this estimation, the
> > ones with the highest counts may contain the ones with the lowest counts,
> > but a lower count can never contain a higher count.  The calculation to
> > perform the estimation can be skipped and 

[jira] [Resolved] (KAFKA-7632) Support Compression Level

2024-05-21 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-7632.
---
Fix Version/s: 3.8.0
 Assignee: Mickael Maison  (was: Dongjin Lee)
   Resolution: Fixed

> Support Compression Level
> -
>
> Key: KAFKA-7632
> URL: https://issues.apache.org/jira/browse/KAFKA-7632
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 2.1.0
> Environment: all
>Reporter: Dave Waters
>Assignee: Mickael Maison
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.8.0
>
>
> The compression level for ZSTD is currently set to use the default level (3), 
> which is a conservative setting that in some use cases eliminates the value 
> that ZSTD provides with improved compression. Each use case will vary, so 
> exposing the level as a producer, broker, and topic configuration setting 
> will allow the user to adjust the level.
> Since it applies to the other compression codecs, we should add the same 
> functionalities to them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation

2024-05-21 Thread Justine Olshan
Can you clarify the intended behavior? If we encounter a producer ID we've
not seen before, we are supposed to read from disk and try to find it? I
see the proposal mentions bloom filters, but it seems like it would not be
cheap to search for the producer ID. I would expect the typical case to be
that there is a new producer and we don't need to search state.

And we intend to keep all producers we've ever seen on the cluster? I
didn't see a mechanism to delete any of the information in the snapshots.
Currently the snapshot logic is decoupled from the log retention as of
KIP-360.

Justine

On Mon, May 20, 2024 at 11:20 PM Claude Warren  wrote:

> The LRU cache is just that: a cache, so yes things expire from the cache
> but they are not gone.  As long as a snapshot containing the PID is
> available the PID can be found and reloaded into the cache (which is
> exactly what I would expect it to do).
>
> The question of how long a PID is resolvable then becomes a question of how
> long snapshots are retained.
>
> There are, in my mind, several advantages:
>
>1. The in-memory cache can be smaller, reducing the memory footprint.
>This is not required but is possible.
>2. PIDs are never discarded because they are produced by slow
>producers.  They are discarded when the snapshots containing them
> expire.
>3. The length of time between when a PID is received by the server and
>when it is recorded to a snapshot is significantly reduced, significantly
>shrinking the window where PIDs can be lost.
>4. Throttling and other changes you wish to make to the cache are still
>possible.
>
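
For illustration, the snapshot-backed cache described above could look roughly
like this (a sketch, with `loadFromSnapshot` as a hypothetical stand-in for
the snapshot lookup):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class PidLruCache<K, V> {
    private final Map<K, V> lru;
    private final Function<K, Optional<V>> loadFromSnapshot;

    PidLruCache(int maxEntries, Function<K, Optional<V>> loadFromSnapshot) {
        this.loadFromSnapshot = loadFromSnapshot;
        // Access-ordered map; eviction is amortized into normal puts,
        // so no background expiry thread is needed.
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    Optional<V> get(K pid) {
        V cached = lru.get(pid);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Cache miss: the PID is not gone, it may still live in a retained
        // snapshot; reload it into the cache if found.
        Optional<V> loaded = loadFromSnapshot.apply(pid);
        loaded.ifPresent(v -> lru.put(pid, v));
        return loaded;
    }
}
{code}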
>
> On Mon, May 20, 2024 at 7:32 PM Justine Olshan
> 
> wrote:
>
> > My team has looked at it from a high level, but we haven't had the time
> to
> > come up with a full proposal.
> >
> > I'm not aware if others have worked on it.
> >
> > Justine
> >
> > On Mon, May 20, 2024 at 10:21 AM Omnia Ibrahim 
> > wrote:
> >
> > > Hi Justine are you aware of anyone looking into such new protocol at
> the
> > > moment?
> > >
> > > > On 20 May 2024, at 18:00, Justine Olshan
>  > >
> > > wrote:
> > > >
> > > > I would say I have first hand knowledge of this issue as someone who
> > > > responds to such incidents as part of my work at Confluent over the
> > past
> > > > couple years. :)
> > > >
> > > >> We only persist the information for the length of time we retain
> > > > snapshots.
> > > > This seems a bit contradictory to me. We are going to persist
> > > (potentially)
> > > > useless information if we have no signal if the producer is still
> > active.
> > > > This is the problem we have with old clients. We are always going to
> > have
> > > > to draw the line for how long we allow a producer to have a gap in
> > > > producing vs how long we allow filling up with short-lived producers
> > that
> > > > risk OOM.
> > > >
> > > > With an LRU cache, we run into the same problem, as we will expire
> all
> > > > "well-behaved" infrequent producers that last produced before the
> burst
> > > of
> > > > short-lived clients. The benefit is that we don't have a solid line
> in
> > > the
> > > > sand and we only expire when we need to, but we will still risk
> > expiring
> > > > active producers.
> > > >
> > > > I am willing to discuss some solutions that work with older clients,
> > but
> > > my
> > > > concern is spending too much time on a complicated solution and not
> > > > encouraging movement to newer and better clients.
> > > >
> > > > Justine
> > > >
> > > > On Mon, May 20, 2024 at 9:35 AM Claude Warren 
> > wrote:
> > > >
> > > >>>
> > > >>> Why should we persist useless information
> > > >>> for clients that are long gone and will never use it?
> > > >>
> > > >>
> > > >> We are not.  We only persist the information for the length of time
> we
> > > >> retain snapshots.   The change here is to make the snapshots work as
> > > longer
> > > >> term storage for infrequent producers and others would would be
> > > negatively
> > > >> affected by some of the solutions proposed.
> > > >>
> > > >> Your changes require changes in the clients.   Older clients will
> not
> > be
> > > >> able to participate.  My change does not require client change.
> > > >> There are issues outside of the ones discussed.  I was told of this
> > late
> > > >> last week.  I will endeavor to find someone with first hand
> knowledge
> > of
> > > >> the issue and have them report on this thread.
> > > >>
> > > >> In addition, the use of an LRU amortizes the cache cleanup so we
> don't
> > > need
> > > >> a thread to expire things.  You still have the cache, the point is
> > that
> > > it
> > > >> really is a cache, there is storage behind it.  Let the cache be a
> > > cache,
> > > >> let the snapshots be the storage backing behind the cache.
> > > >>
> > > >> On Fri, May 17, 2024 at 5:26 PM Justine Olshan
> > > >> 
> > > >> wrote:
> > > >>
> > > >>> Respectfully, I don't agree. Why should we persist useless
> > information
> > > >>> for cl

Permission to contribute to Apache Kafka

2024-05-21 Thread Harry Fallows
Hello,

I am following the [Getting Started guide for writing 
KIPs](https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals).
 Could someone give me the permissions to write a KIP please? My Wiki ID is 
harryfallows, my Jira ID is hfallows, and my email address is 
harryfall...@protonmail.com.

Thanks,
Harry

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-21 Thread Muralidhar Basani
@greg which trie library were you thinking of? I see there is one in
commons-collections.

On Fri, May 17, 2024 at 3:55 PM Claude Warren  wrote:

> >
> > This has implications for execution complexity: If we can't compute
> > whether two patterns overlap, then we need to run both of them on each
> > piece of input to test if they both match. Under the current
> > LITERAL/PREFIX system, we can optimize execution with a trie, but that
> > option wouldn't be available to us with MATCH.
> >
>
> If we consider the case of an asterisk representing 1 or more characters
> then determining if 2 patterns overlap can be fairly simple and very fast.
> Let's call the text from the ACL the target, and the text from the wildcard
> matcher the candidate.
>
> When a wildcard pattern, excluding '*',  is created:
>
>- the candidate text is broken into fragments separated by characters
>that are not Character.isLetterOrDigit() (See
>https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
>- fragments that contain 1 character are ignored.
>- fragments that contain 2 or more characters are scanned, and every
>pair of characters is used to create a Hasher (see
> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html);
>that hasher is added to a Bloom filter associated with the wildcard
> pattern.
>
> When a target is being evaluated and matching wildcard entries are to be
> located, split the target and create a bloom filter using the same strategy
> as for the wildcard patterns above.  These bloom filters will have had more
> pairs of characters added than for a matching wildcard pattern.
>
> Each filter contains the pattern for the unique pairs in the fragments of
> the original text:
>
> Now we can check the Bloom filters.
>
> To find potential matching patterns we can check to see if the Bloom filter
> for the pattern is contained within the bloom filter for the entry text.
> If so then we know that it is likely that the pairs of characters specified
> in the wild card (non-wildcard text) appear in the entry text.
>
> At this point we can evaluate the reduced number of patterns to see which
> ones actually match.
>
> Having reduced the candidates to the matching patterns we can then use a
> standard measure of similarity like the Levenshtein distance to determine
> which candidates require the fewest edits to evolve into the target.  The
> one with the fewest edits is the most specific and should be applied.
>
> Now if you want to know which patterns overlap you have a similar process.
>
> Each Bloom filter can calculate the estimated number of unique fragments
> that were added to it.  If the filters are sorted by this estimation, the
> ones with the highest counts may contain the ones with the lowest counts,
> but a lower count can never contain a higher count.  The calculation to
> perform the estimation can be skipped and the cardinality value of each
> Bloom filter can be used instead.  You can then check the smaller filters
> against the larger ones and find where one candidate is contained within
> another.
>
> If you want to know if they intersect the BloomFilter from
> commons-collections has an estimateIntersection as well as an estimateUnion
> method.  Quick tests can be made between filters to see if there is any
> overlap before more complex analysis is performed.
>
> The solution here does not make the final comparisons easier, it simply
> reduces the search space to find the items that need to be compared.
>
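
For illustration, the bigram prefilter described above could be sketched as
follows (using a plain java.util.BitSet with a single hash per character pair,
rather than the commons-collections Hasher/BloomFilter API):

{code:java}
import java.util.BitSet;

public class WildcardPrefilter {
    private static final int BITS = 1 << 12;

    // Fingerprint a string: split on non-alphanumerics, ignore 1-char
    // fragments, and set one bit per adjacent character pair.
    static BitSet fingerprint(String text) {
        BitSet bits = new BitSet(BITS);
        for (String frag : text.split("[^\\p{Alnum}]+")) {
            if (frag.length() < 2) {
                continue;
            }
            for (int i = 0; i + 1 < frag.length(); i++) {
                int pair = frag.charAt(i) * 31 + frag.charAt(i + 1);
                bits.set(Math.floorMod(pair * 0x9E3779B1, BITS));
            }
        }
        return bits;
    }

    // A pattern *may* match the target only if every bit of the pattern's
    // fingerprint (built from its non-wildcard text) is set in the target's;
    // survivors still need a real match, this only prunes the search space.
    static boolean mayMatch(BitSet pattern, BitSet target) {
        BitSet missing = (BitSet) pattern.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }
}
{code}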
>
> On Mon, May 6, 2024 at 9:03 PM Greg Harris 
> wrote:
>
> > Hi Murali,
> >
> > Thanks for the KIP!
> >
> > I think I understand the motivation for this KIP in situations where
> > there are a "cross product" of topics for two or more variables X and
> > Y, and want to write ACLs for each of the variable axes.
> > If you format your topics "X-Y-suffix", it's not easy to write rules
> > that apply to all "Y" topics, because you need to enumerate all of the
> > "X" values, and the problem persists even if you reorder the topic
> > name.
> >
> > In my recent work on KIP-986 I found it necessary to introduce
> > "namespaces" to group topics together, and I was going to replicate
> > the ACL system to specify those namespaces. This change to the ACL
> > system could increase the expressiveness and complexity of that
> > feature, if it is ever implemented.
> > One of the primitives I needed when specifying namespaces was the
> > ability to tell when two namespaces overlapped (i.e. does there exist
> > any topic which is present in both namespaces). This is trivial to do
> > with the current PREFIX and LITERAL system, as we can find the
> > maximum-length common prefix with just some length comparisons and
> > equality checks.
> > I considered specifying namespaces via regular expressions, and found
> > that it was computationally much more difficult. Computing the
> > intersection of two regexes

[jira] [Resolved] (KAFKA-16784) Migrate TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest to new test infra

2024-05-21 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16784.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Migrate TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest to new 
> test infra
> -
>
> Key: KAFKA-16784
> URL: https://issues.apache.org/jira/browse/KAFKA-16784
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
>  Labels: storage_test
> Fix For: 3.8.0
>
>
> as title



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Request for Authorization to Create KIP

2024-05-21 Thread 黃竣陽
I am writing to request authorization to create a KIP. 

Currently, I do not have the necessary permissions to access the 'Create KIP' 
function. My account email is s7133...@gmail.com. 

Could you please grant me the required permissions to create a KIP? Thanks for 
your help.



Re: [VOTE] KIP-932: Queues for Kafka

2024-05-21 Thread Andrew Schofield
Hi Jun,
All the client metrics are standard. None are required.

I’ve updated the KIP accordingly.

Thanks,
Andrew

> On 15 May 2024, at 21:36, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the update. Should we mark whether those metrics are
> standard/required for KIP-714?
> 
> Jun
> 
> On Tue, May 14, 2024 at 7:31 AM Andrew Schofield 
> wrote:
> 
>> Hi,
>> I have made a small update to the KIP as a result of testing the new
>> share consumer with client telemetry (KIP-714).
>> 
>> I’ve added telemetry metric names to the table of client metrics and
>> also updated the metric group names so that the resulting client metrics
>> sent to the broker have consistent names.
>> 
>> Thanks,
>> Andrew
>> 
>>> On 8 May 2024, at 12:51, Manikumar  wrote:
>>> 
>>> Hi Andrew,
>>> 
>>> Thanks for the KIP.  Great write-up!
>>> 
>>> +1 (binding)
>>> 
>>> Thanks,
>>> 
>>> On Wed, May 8, 2024 at 12:17 PM Satish Duggana 
>> wrote:
 
 Hi Andrew,
 Thanks for the nice KIP, it will allow other messaging use cases to be
 onboarded to Kafka.
 
 +1 from me.
 
 Satish.
 
 On Tue, 7 May 2024 at 03:41, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the KIP. +1
> 
> Jun
> 
> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar 
> wrote:
> 
>> Thanks Andrew,
>> 
>> +1 (binding)
>> 
>> Edo
>> 
>> On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
>>  wrote:
>>> 
>>> Hi Andrew
>>> 
>>> + 1 (Non-Binding)
>>> 
>>> This will be great addition to Kafka
>>> 
>>> On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal <
>> apoorvmitta...@gmail.com>
>>> wrote:
>>> 
 Hi Andrew,
 Thanks for writing the KIP. This is indeed going to be a valuable
>> addition
 to the Kafka, excited to see the KIP.
 
 + 1 (Non-Binding)
 
 Regards,
 Apoorv Mittal
 +44 7721681581
 
 
 On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
 andrew_schofield_j...@outlook.com> wrote:
 
> Hi,
> I’ve been working to complete KIP-932 over the past few months and
> discussions have quietened down.
> 
> I’d like to open the voting for KIP-932:
> 
> 
> 
 
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
> 
> Thanks,
> Andrew




[jira] [Resolved] (KAFKA-16654) Refactor kafka.test.annotation.Type and ClusterTestExtensions

2024-05-21 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16654.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Refactor kafka.test.annotation.Type and ClusterTestExtensions
> -
>
> Key: KAFKA-16654
> URL: https://issues.apache.org/jira/browse/KAFKA-16654
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: TaiJuWu
>Priority: Minor
> Fix For: 3.8.0
>
>
> It seems to me the refactor could include the following tasks.
> 1. change `invocationContexts`, the method invoked by `ClusterTemplate`, and 
> generate-related methods in `ClusterTestExtensions` to return a 
> java.util.Collection instead of accepting a `java.util.function.Consumer`. 
> That brings two benefits: 1) simpler production code: we don't need to 
> create a List and then pass a function to collect stuff; 2) easier to write 
> unit tests.
> 2. separate `provideTestTemplateInvocationContexts` into multiple methods, 
> one per annotation. That helps us write tests and makes the code more 
> readable.
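
Roughly, the first task is this kind of shape change (names hypothetical,
sketching the direction rather than the actual methods):

{code:java}
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

class InvocationContextsSketch {
    // Before: the caller must allocate a list and hand in a collector.
    void generateContexts(Consumer<String> collector) {
        collector.accept("ctx-a");
        collector.accept("ctx-b");
    }

    // After: simply return the collection -- simpler call sites and
    // far easier to assert on in unit tests.
    Collection<String> generateContexts() {
        return List.of("ctx-a", "ctx-b");
    }
}
{code}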



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2923

2024-05-21 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-1031: Control offset translation in MirrorSourceConnector

2024-05-21 Thread Omnia Ibrahim
> I have added comments to your PR (
> https://github.com/apache/kafka/pull/15999#pullrequestreview-2066823538)
> 
> in short, `sourcePartition` and `sourceOffset` are unused if
> emit.offset-syncs.enabled=false
I’ll have a look into the PR.
Also, regarding my previous comment on `sync.group.offsets.interval.seconds`: we 
don't need to check this if it is -1, as the only way for 
`sync.group.offsets.interval.seconds` or `emit.checkpoints.interval.seconds` to 
be -1 is if `emit.checkpoints.enabled` and `sync.group.offsets.enabled` are both 
false, which we check in MirrorCheckpointConnector.

Anyway, we can continue discussing this on the PR.

> BTW, I'm +1 to this KIP, and I noticed my previous comments are related to
> code. Hence, please feel free to open votes. We can have discussion about
> the code later.

It was voted in a few weeks ago.

Omnia

> On 21 May 2024, at 14:25, Chia-Ping Tsai  wrote:
> 
>> Which SourceRecord are you referring to here?
> 
> I have added comments to your PR (
> https://github.com/apache/kafka/pull/15999#pullrequestreview-2066823538)
> 
> in short, `sourcePartition` and `sourceOffset` are unused if
> emit.offset-syncs.enabled=false
> 
> BTW, I'm +1 to this KIP, and I noticed my previous comments are related to
> code. Hence, please feel free to open votes. We can have discussion about
> the code later.
> 
> 
> 
Omnia Ibrahim wrote on Tue, May 21, 2024 at 9:10 PM:
> 
>> Hi Chia-Ping
>>> It seems we can disable the sync of idle consumers by setting
>> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should
>> include sync.group.offsets.interval.seconds too. For another, maybe we
>> should do fail-fast for MirrorCheckpointConnector even though we don't have
>> this KIP
>> I don’t think we need to fail fast with
>> `sync.group.offsets.interval.seconds` to -1, as `MirrorCheckpointConnector`
>> runs two functionality based on offset-syncs topic that can run separately
>> 1. Write group offset to checkpoints internal topic can be disabled with
>> `emit.checkpoints.interval.seconds` -1
>> 2. Schedule syncing the group offset to __consumer_offsets later can be
>> disabled with  `sync.group.offsets.interval.seconds` to -1
>> 
>> So technically `MirrorCheckpointConnector` can run if only one of these
>> intervals is set to -1 however, if we want to fail fast we should check
>> both `sync.group.offsets.interval.seconds`  and
>> `emit.checkpoints.interval.seconds` not set to -1 as this would be useless.
>> 
>> 
>>> 2) Should we do similar fail-fast for MirrorSourceConnector if user set
>> custom producer configs with emit.offset-syncs.enabled=false? I assume the
>> producer which sending records to offset-syncs topic won't be created if
>> emit.offset-syncs.enabled=false
>> This is a good point I’ll update MirrorSourceConnector’s validate method
>> to address this. I think we should also address
>> `offset-syncs.topic.location` and `offset-syncs.topic.replication.factor`
>> as well as custom consumer, and admin client configs.
>> 
>> 
>>> 3) Should we simplify the SourceRecord if
>> emit.offset-syncs.enabled=false? Maybe that can get a bit performance
>> improvement.
>> Which SourceRecord are you referring to here?
>> 
>> Omnia
>>> On 20 May 2024, at 16:58, Chia-Ping Tsai  wrote:
>>> 
>>> Nice KIP. some minor comments/questions are listed below.
>>> 
>>> 1) It seems we can disable the sync of idle consumers by setting
>> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should
>> include sync.group.offsets.interval.seconds too. For another, maybe we
>> should do fail-fast for MirrorCheckpointConnector even though we don't have
>> this KIP
>>> 
>>> 2) Should we do similar fail-fast for MirrorSourceConnector if user set
>> custom producer configs with emit.offset-syncs.enabled=false? I assume the
>> producer which sending records to offset-syncs topic won't be created if
>> emit.offset-syncs.enabled=false
>>> 
>>> 3) Should we simplify the SourceRecord if
>> emit.offset-syncs.enabled=false? Maybe that can get a bit performance
>> improvement.
>>> 
>>> Best,
>>> Chia-Ping
>>> 
>>> On 2024/04/08 10:03:50 Omnia Ibrahim wrote:
 Hi Chris,
 Validation method is a good call. I updated the KIP to state that the
>> checkpoint connector will fail if the configs aren’t correct. And updated
>> the description of the new config to explain the impact of it on checkpoint
>> connector as well.
 
 If there is no any other feedback from anyone I would like to start the
>> voting thread in few days.
 Thanks
 Omnia
 
> On 8 Apr 2024, at 06:31, Chris Egerton 
>> wrote:
> 
> Hi Omnia,
> 
> Ah, good catch. I think failing to start the checkpoint connector if
>> offset
> syncs are disabled is fine. We'd probably want to do that via the
> Connector::validate [1] method in order to be able to catch invalid
>> configs
> during preflight validation, but it's not necessary to get that
>> specific in
> the KIP (especially since we may add other checks a

Re: [DISCUSS] KIP-1031: Control offset translation in MirrorSourceConnector

2024-05-21 Thread Chia-Ping Tsai
> Which SourceRecord are you referring to here?

I have added comments to your PR (
https://github.com/apache/kafka/pull/15999#pullrequestreview-2066823538)

in short, `sourcePartition` and `sourceOffset` are unused if
emit.offset-syncs.enabled=false

BTW, I'm +1 to this KIP, and I noticed my previous comments are related to
code. Hence, please feel free to open votes. We can have discussion about
the code later.



Omnia Ibrahim wrote on Tue, May 21, 2024 at 9:10 PM:

> Hi Chia-Ping
> >  It seems we can disable the sync of idle consumers by setting
> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should
> include sync.group.offsets.interval.seconds too. For another, maybe we
> should do fail-fast for MirrorCheckpointConnector even though we don't have
> this KIP
> I don’t think we need to fail fast with
> `sync.group.offsets.interval.seconds` to -1, as `MirrorCheckpointConnector`
> runs two functionality based on offset-syncs topic that can run separately
> 1. Write group offset to checkpoints internal topic can be disabled with
> `emit.checkpoints.interval.seconds` -1
> 2. Schedule syncing the group offset to __consumer_offsets later can be
> disabled with  `sync.group.offsets.interval.seconds` to -1
>
> So technically `MirrorCheckpointConnector` can run if only one of these
> intervals is set to -1 however, if we want to fail fast we should check
> both `sync.group.offsets.interval.seconds`  and
> `emit.checkpoints.interval.seconds` not set to -1 as this would be useless.
>
>
> > 2) Should we do similar fail-fast for MirrorSourceConnector if user set
> custom producer configs with emit.offset-syncs.enabled=false? I assume the
> producer which sending records to offset-syncs topic won't be created if
> emit.offset-syncs.enabled=false
> This is a good point I’ll update MirrorSourceConnector’s validate method
> to address this. I think we should also address
> `offset-syncs.topic.location` and `offset-syncs.topic.replication.factor`
> as well as custom consumer, and admin client configs.
>
>
> > 3) Should we simplify the SourceRecord if
> emit.offset-syncs.enabled=false? Maybe that can get a bit performance
> improvement.
> Which SourceRecord are you referring to here?
>
> Omnia
> > On 20 May 2024, at 16:58, Chia-Ping Tsai  wrote:
> >
> > Nice KIP. some minor comments/questions are listed below.
> >
> > 1) It seems we can disable the sync of idle consumers by setting
> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should
> include sync.group.offsets.interval.seconds too. For another, maybe we
> should do fail-fast for MirrorCheckpointConnector even though we don't have
> this KIP
> >
> > 2) Should we do similar fail-fast for MirrorSourceConnector if user set
> custom producer configs with emit.offset-syncs.enabled=false? I assume the
> producer which sending records to offset-syncs topic won't be created if
> emit.offset-syncs.enabled=false
> >
> > 3) Should we simplify the SourceRecord if
> emit.offset-syncs.enabled=false? Maybe that can get a bit performance
> improvement.
> >
> > Best,
> > Chia-Ping
> >
> > On 2024/04/08 10:03:50 Omnia Ibrahim wrote:
> >> Hi Chris,
> >> Validation method is a good call. I updated the KIP to state that the
> checkpoint connector will fail if the configs aren’t correct. And updated
> the description of the new config to explain the impact of it on checkpoint
> connector as well.
> >>
> >> If there is no any other feedback from anyone I would like to start the
> voting thread in few days.
> >> Thanks
> >> Omnia
> >>
> >>> On 8 Apr 2024, at 06:31, Chris Egerton 
> wrote:
> >>>
> >>> Hi Omnia,
> >>>
> >>> Ah, good catch. I think failing to start the checkpoint connector if
> offset
> >>> syncs are disabled is fine. We'd probably want to do that via the
> >>> Connector::validate [1] method in order to be able to catch invalid
> configs
> >>> during preflight validation, but it's not necessary to get that
> specific in
> >>> the KIP (especially since we may add other checks as well).
> >>>
> >>> [1] -
> >>>
> https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/connector/Connector.html#validate(java.util.Map)
> >>>
> >>> Cheers,
> >>>
> >>> Chris
> >>>
> >>> On Thu, Apr 4, 2024 at 8:07 PM Omnia Ibrahim 
> >>> wrote:
> >>>
>  Thanks Chris for the feedback
> > 1. It'd be nice to mention that increasing the max offset lag to
> INT_MAX
> > could work as a partial workaround for users on existing versions
> (though
> > of course this wouldn't prevent creation of the syncs topic).
>  I updated the KIP
> 
> > 2. Will it be illegal to disable offset syncs if other features that
> rely
> > on offset syncs are explicitly enabled in the connector config? If
>  they're
> > not explicitly enabled then it should probably be fine to silently
>  disable
> > them, but I'd be interested in your thoughts.
>  The rest of the features that relays on this is controlled by
>  emit.checkpoints.enabled (enabled by defau

Re: [DISCUSS] KIP-1031: Control offset translation in MirrorSourceConnector

2024-05-21 Thread Omnia Ibrahim
Hi Chia-Ping 
>  It seems we can disable the sync of idle consumers by setting 
> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should include 
> sync.group.offsets.interval.seconds too. For another, maybe we should do 
> fail-fast for MirrorCheckpointConnector even though we don't have this KIP
I don’t think we need to fail fast with `sync.group.offsets.interval.seconds` 
set to -1, as `MirrorCheckpointConnector` runs two functions based on the 
offset-syncs topic that can run separately:
1. Writing group offsets to the internal checkpoints topic, which can be 
disabled by setting `emit.checkpoints.interval.seconds` to -1
2. Scheduling the later sync of group offsets to __consumer_offsets, which can 
be disabled by setting `sync.group.offsets.interval.seconds` to -1

So technically `MirrorCheckpointConnector` can run if only one of these 
intervals is set to -1; however, if we want to fail fast, we should check that 
both `sync.group.offsets.interval.seconds` and 
`emit.checkpoints.interval.seconds` are not set to -1, as that combination 
would be useless.

 
> 2) Should we do similar fail-fast for MirrorSourceConnector if user set 
> custom producer configs with emit.offset-syncs.enabled=false? I assume the 
> producer which sending records to offset-syncs topic won't be created if 
> emit.offset-syncs.enabled=false
This is a good point; I’ll update MirrorSourceConnector’s validate method to 
address this. I think we should also address `offset-syncs.topic.location` and 
`offset-syncs.topic.replication.factor`, as well as custom consumer and admin 
client configs.
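
Something along these lines, as a rough sketch of the fail-fast shape via
Connector::validate (not the actual MirrorSourceConnector change; the config
keys checked are illustrative):

{code:java}
import java.util.Map;
import org.apache.kafka.common.config.Config;
import org.apache.kafka.common.config.ConfigValue;
import org.apache.kafka.connect.mirror.MirrorSourceConnector;

public class FailFastMirrorSourceConnector extends MirrorSourceConnector {
    @Override
    public Config validate(Map<String, String> props) {
        Config config = super.validate(props);
        boolean syncsEnabled = Boolean.parseBoolean(
                props.getOrDefault("emit.offset-syncs.enabled", "true"));
        if (!syncsEnabled) {
            for (ConfigValue value : config.configValues()) {
                // Offset-syncs topic settings (and any client overrides for
                // it) have no effect when the topic is never written.
                if (value.name().startsWith("offset-syncs.")) {
                    value.addErrorMessage(
                            "Setting has no effect because emit.offset-syncs.enabled=false");
                }
            }
        }
        return config;
    }
}
{code}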


> 3) Should we simplify the SourceRecord if emit.offset-syncs.enabled=false? 
> Maybe that can get a bit performance improvement.
Which SourceRecord are you referring to here? 

Omnia
> On 20 May 2024, at 16:58, Chia-Ping Tsai  wrote:
> 
> Nice KIP. some minor comments/questions are listed below.
> 
> 1) It seems we can disable the sync of idle consumers by setting 
> `sync.group.offsets.interval.seconds` to -1, so the fail-fast should include 
> sync.group.offsets.interval.seconds too. For another, maybe we should do 
> fail-fast for MirrorCheckpointConnector even though we don't have this KIP
> 
> 2) Should we do similar fail-fast for MirrorSourceConnector if user set 
> custom producer configs with emit.offset-syncs.enabled=false? I assume the 
> producer which sending records to offset-syncs topic won't be created if 
> emit.offset-syncs.enabled=false
> 
> 3) Should we simplify the SourceRecord if emit.offset-syncs.enabled=false? 
> Maybe that can get a bit performance improvement.
> 
> Best,
> Chia-Ping
> 
> On 2024/04/08 10:03:50 Omnia Ibrahim wrote:
>> Hi Chris, 
>> Validation method is a good call. I updated the KIP to state that the 
>> checkpoint connector will fail if the configs aren’t correct. And updated 
>> the description of the new config to explain the impact of it on checkpoint 
>> connector as well. 
>> 
>> If there is no any other feedback from anyone I would like to start the 
>> voting thread in few days. 
>> Thanks 
>> Omnia
>> 
>>> On 8 Apr 2024, at 06:31, Chris Egerton  wrote:
>>> 
>>> Hi Omnia,
>>> 
>>> Ah, good catch. I think failing to start the checkpoint connector if offset
>>> syncs are disabled is fine. We'd probably want to do that via the
>>> Connector::validate [1] method in order to be able to catch invalid configs
>>> during preflight validation, but it's not necessary to get that specific in
>>> the KIP (especially since we may add other checks as well).
>>> 
>>> [1] -
>>> https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/connector/Connector.html#validate(java.util.Map)
>>> 
>>> Cheers,
>>> 
>>> Chris
>>> 
>>> On Thu, Apr 4, 2024 at 8:07 PM Omnia Ibrahim 
>>> wrote:
>>> 
 Thanks Chris for the feedback
> 1. It'd be nice to mention that increasing the max offset lag to INT_MAX
> could work as a partial workaround for users on existing versions (though
> of course this wouldn't prevent creation of the syncs topic).
 I updated the KIP
 
> 2. Will it be illegal to disable offset syncs if other features that rely
> on offset syncs are explicitly enabled in the connector config? If
 they're
> not explicitly enabled then it should probably be fine to silently
 disable
> them, but I'd be interested in your thoughts.
 The rest of the features that relays on this is controlled by
 emit.checkpoints.enabled (enabled by default) and
 sync.group.offsets.enabled (disabled by default) which are part of
 MirrorCheckpointConnector config not MirrorSourceConnector, I was thinking
 that MirrorCheckpointConnector should fail to start if
 emit.offset-syncs.enabled is disabled while emit.checkpoints.enabled and/or
 sync.group.offsets.enabled are enabled as no point of creating this
 connector if the main part is disabled. WDYT?
 
 Thanks
 Omnia
 
> On 3 Apr 2024, at 12:45, Chris Egerton  wrote:
> 
> Hi Omnia,
> 
> Thanks for the KIP! Two small things 

[jira] [Created] (KAFKA-16808) Consumer join Group requests response contains 2 different members

2024-05-21 Thread Badhusha Muhammed (Jira)
Badhusha Muhammed created KAFKA-16808:
-

 Summary: Consumer join Group requests response contains 2 
different members
 Key: KAFKA-16808
 URL: https://issues.apache.org/jira/browse/KAFKA-16808
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 2.8.0
Reporter: Badhusha Muhammed
 Fix For: 2.8.0


Even though there is only one consumer running for a group.id, on group 
(re-)join we get two different members in the response. This causes the 
assignor to split the partitions between two members, so only half of the 
partitions are processed.


Logs for the group join and partition assignment:
{code:java}
 24/05/13 10:26:28 WARN ProcessingTimeExecutor: Current batch is falling 
behind. The trigger interval is 155000 milliseconds, but spent 391883 
milliseconds
24/05/13 10:26:28 INFO ConsumerCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Giving away all assigned partitions as lost since generation has been 
reset,indicating that consumer is no longer part of the group
24/05/13 10:26:28 INFO ConsumerCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Lost previously assigned partitions topic-0 topic-1 topic-2 topic-3 topic-4 
topic-5 topic-6 topic-7
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 (Re-)joining group
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Group coordinator va2kafka014.va2.pubmatic.local:6667 (id: 2147482646 rack: 
null) is unavailable or invalid due to cause: null.isDisconnected: true. 
Rediscovery will be attempted.
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Rebalance failed.
org.apache.kafka.common.errors.DisconnectException
24/05/13 10:26:28 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling 
executor 436704.
24/05/13 10:26:28 INFO DAGScheduler: Executor lost: 436704 (epoch 0)
24/05/13 10:26:28 INFO BlockManagerMaster: Removed 436704 successfully in 
removeExecutor
24/05/13 10:26:28 INFO YarnClusterScheduler: Executor 436704 on 
va2aggr2503.va2.pubmatic.local killed by driver.
24/05/13 10:26:28 INFO ExecutorMonitor: Executor 436704 is removed. Remove 
reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, 
driver killed: 436456, unexpectedly exited: 399).
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Discovered group coordinator va2kafka014.va2.pubmatic.local:6667 (id: 
2147482646 rack: null)
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 (Re-)joining group
24/05/13 10:26:31 INFO ConsumerCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Finished assignment for group at generation 6: 
{consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3-d4448c3e-8f23-490b-b800-be15a14efd32=Assignment(partitions=[topic-4,
 topic-5, topic-6, topic-7]), 
consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3-0daf3497-feac-4eee-a5ca-596e2b2e1649=Assignment(partitions=[topic-0,
 topic-1, topic-2, topic-3])}
24/05/13 10:26:31 INFO ConsumerCoordinator: [Consumer 
clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3,
 
groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0]
 Adding newly assigned partitions: topic-0 topic-1 topic-2 topic-3{code}
 

Could this be due to the generation reset done by the rebalancing code in 
2.8.0, which was eventually fixed in 2.8.1 
(https://issues.apache.org/jira/browse/KAFKA-13214)?
{code:java}
 else {
final RuntimeException 

Re: DescribeLogDirs in Kafka 3.3.1 returns all topics instead of one provided in the request. Bug or "bad user error"?

2024-05-21 Thread Chia-Ping Tsai
Dear all,

I filed https://issues.apache.org/jira/browse/KAFKA-16807 to track the fix.

Thanks to Maxim for this nice finding, and to Gaurav for quickly digging in.

Cheers,
Chia-Ping

Gaurav Narula  於 2024年5月21日 週二 下午2:56寫道:

> Hello Maxim,
>
> Thanks for sharing this.
>
> I had a look and it seems like the behaviour on the wire changed with
> KAFKA-9435. I believe this change [0] in ReplicaManager causes all topics in
> online log dirs to be a part of the response inadvertently. We do however set
> partition information only for the queried topic. I'd suggest creating a JIRA
> for this.
>
> As for an option to specify ALL_PARTITIONS, I reckon that would require a KIP
> since it would be a change to the public interface.
>
> [0]:
>
> https://github.com/apache/kafka/pull/7972/files#diff-78812e247ffeae6f8c49b1b22506434701b1e1bafe7f92ef8f8708059e292bf0R674
>
> Regards,
> Gaurav
>
> On Mon, May 20, 2024 at 10:35 PM Maxim Senin 
> wrote:
>
> > Hello.
> >
> > I’m having a problem with the Kafka protocol API.
> >
> > Requests:
> > DescribeLogDirs Request (Version: 0) => [topics]
> >   topics => topic [partitions]
> > topic => STRING
> > partitions => INT32
> >
> > My request contains `[{topic: “blah”, partitions:
> > [0,1,2,3,4,5,6,7,8,9]}]`, but the result
> >
> > Responses:
> > DescribeLogDirs Response (Version: 0) => throttle_time_ms [results]
> >   throttle_time_ms => INT32
> >   results => error_code log_dir [topics]
> > error_code => INT16
> > log_dir => STRING
> > topics => name [partitions]
> >   name => STRING
> >   partitions => partition_index partition_size offset_lag
> is_future_key
> > partition_index => INT32
> > partition_size => INT64
> > offset_lag => INT64
> > is_future_key => BOOLEAN
> >
> >
> >
> >  contains entries for *all* topics. My workaround had been to filter the
> > returned list by topic name to find the one I was requesting the data for,
> > but I don’t understand why it’s not limiting the results to just the topic
> > I requested in the first place.
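> >
> > For reference, the equivalent lookup-and-filter with the Java AdminClient
> > would look roughly like this (a sketch only: broker id 0 and topic "blah"
> > are assumptions, and the API names are from org.apache.kafka.clients.admin):
> >
> >     try (Admin admin = Admin.create(Map.of(
> >             AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
> >         // describeLogDirs is keyed by broker id; fetch broker 0's log dirs
> >         // (checked exceptions from get() are elided in this sketch)
> >         Map<String, LogDirDescription> dirs =
> >                 admin.describeLogDirs(List.of(0)).descriptions().get(0).get();
> >         dirs.forEach((dir, desc) ->
> >                 desc.replicaInfos().forEach((tp, info) -> {
> >                     if (tp.topic().equals("blah")) // keep only the queried topic
> >                         System.out.printf("%s %s size=%d lag=%d%n",
> >                                 dir, tp, info.size(), info.offsetLag());
> >                 }));
> >     }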
> >
> > Also, I think there should be an option to just specify ALL_PARTITIONS
> > because that would save me from having to retrieve topic metadata from the
> > broker to count the number of partitions. Kafka server would probably have
> > means to do that more efficiently.
> >
> > Is this a bug or am I doing something wrong?
> >
> > Thanks,
> > Maxim
> >
> > 
> >
> > COGILITY SOFTWARE CORPORATION LEGAL DISCLAIMER: The information in this
> > email is confidential and is intended solely for the addressee. Access to
> > this email by anyone else is unauthorized. If you are not the intended
> > recipient, any disclosure, copying, distribution or any action taken or
> > omitted to be taken in reliance on it, is prohibited and may be unlawful.
> >
>


[jira] [Created] (KAFKA-16807) DescribeLogDirsResponseData#results#topics have unexpected topics having empty partitions

2024-05-21 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16807:
--

 Summary: DescribeLogDirsResponseData#results#topics have 
unexpected topics having empty partitions
 Key: KAFKA-16807
 URL: https://issues.apache.org/jira/browse/KAFKA-16807
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


ReplicaManager [0] can generate a response containing unexpected topics that 
have empty partition lists. The root cause is that it always creates the topic 
collection even when a topic has no matched partitions.

This is not an issue for the Java Kafka clients, since they loop over the 
"partitions" field to fill all the futures in the response [1]. Hence, those 
unexpected topics won't appear in the final results.

However, it can be an issue for users who implement a Kafka client directly 
against the Kafka protocol [2].


[0] 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1252
[1] 
https://github.com/apache/kafka/blob/b5a013e4564ad93026b9c61431e4573a39bec766/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3145
[2] https://lists.apache.org/thread/lp7ktmm17pbg7nqk7p4s904lcn3mrvhy
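
A client implemented directly against the protocol can defend against this by 
dropping topics whose partition list is empty. A minimal sketch using Kafka's 
generated protocol classes (accessor names assumed from 
DescribeLogDirsResponse.json):
{code:java}
import org.apache.kafka.common.message.DescribeLogDirsResponseData;

public final class DescribeLogDirsWorkaround {
    // Drop the spurious empty-partition topics before reading the response.
    public static void pruneEmptyTopics(DescribeLogDirsResponseData response) {
        response.results().forEach(result ->
            result.topics().removeIf(topic -> topic.partitions().isEmpty()));
    }
}
{code}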



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request for filing KIP

2024-05-21 Thread Mickael Maison
Hi,

I granted you permissions in the wiki. You should now be able to create a KIP.

Thanks,
Mickael

On Tue, May 21, 2024 at 5:11 AM Harsh Panchal  wrote:
>
> Dear Apache Kafka Team,
>
> As instructed, I would like to write a KIP for PR -
> https://github.com/apache/kafka/pull/15905.
>
> I see that I don't have access to the "Create KIP" button on Confluence. I
> kindly request you to grant me access to write a KIP. My username is: bootmgr
>
> Best Regards,
> Harsh Panchal


[jira] [Resolved] (KAFKA-16794) Can't open videos in streams documentation

2024-05-21 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16794.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Can't open videos in streams documentation
> --
>
> Key: KAFKA-16794
> URL: https://issues.apache.org/jira/browse/KAFKA-16794
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: Kuan Po Tseng
>Assignee: 黃竣陽
>Priority: Minor
> Fix For: 3.8.0
>
> Attachments: IMG_4445.png, image.png
>
>
> Can't open videos in page [https://kafka.apache.org/documentation/streams/]
> Open console in chrome browser and it shows error message:
> {{Refused to frame 'https://www.youtube.com/' because it violates the 
> following Content Security Policy directive: "frame-src 'self'".}}
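
The fix (kafka-site PR #602, threads below) moves the CSP header into 
.htaccess with an explicit frame-src allow-list along these lines (see the 
merged change for the final wording):
{code}
Header set Content-Security-Policy "frame-src https://youtube.com https://www.youtube.com"
{code}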





Re: [PR] KAFKA-16794 Move CSP header from _header.htm to .htaccess [kafka-site]

2024-05-21 Thread via GitHub


chia7712 commented on PR #602:
URL: https://github.com/apache/kafka-site/pull/602#issuecomment-2122101896

   @brandboat @raboof @FrankYang0529 thanks for all your reviews! And 
@m1a2st, thanks for the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16794 Move CSP header from _header.htm to .htaccess [kafka-site]

2024-05-21 Thread via GitHub


chia7712 merged PR #602:
URL: https://github.com/apache/kafka-site/pull/602





Re: [PR] MINOR - Move CSP header from _header.htm to .htaccess [kafka-site]

2024-05-21 Thread via GitHub


m1a2st commented on code in PR #602:
URL: https://github.com/apache/kafka-site/pull/602#discussion_r1607766854


##
.htaccess:
##
@@ -10,3 +10,4 @@ RewriteRule ^/?(\d+)/javadoc - [S=2]
 RewriteRule ^/?(\d+)/images/ - [S=1]
 RewriteCond $2 !=protocol
 RewriteRule ^/?(\d+)/([a-z]+)(\.html)? /$1/documentation#$2 [R=302,L,NE]
+Header set Content-Security-Policy "frame-src youtube.com www.youtube.com"

Review Comment:
   Thanks for your comment, I changed it to use https.






Re: [PR] MINOR - Move CSP header from _header.htm to .htaccess [kafka-site]

2024-05-21 Thread via GitHub


raboof commented on code in PR #602:
URL: https://github.com/apache/kafka-site/pull/602#discussion_r1607763188


##
.htaccess:
##
@@ -10,3 +10,4 @@ RewriteRule ^/?(\d+)/javadoc - [S=2]
 RewriteRule ^/?(\d+)/images/ - [S=1]
 RewriteCond $2 !=protocol
 RewriteRule ^/?(\d+)/([a-z]+)(\.html)? /$1/documentation#$2 [R=302,L,NE]
+Header set Content-Security-Policy "frame-src youtube.com www.youtube.com"

Review Comment:
   Maybe restrict it to https:
   
   ```suggestion
   Header set Content-Security-Policy "frame-src https://youtube.com https://www.youtube.com"
   ```
   
   ... but seems reasonable to me.


