Re: request permissions to contribute to Kafka

2024-05-06 Thread Luke Chen
Hi Zhisheng,

I've granted you permission.

Thank you.
Luke

On Tue, May 7, 2024 at 10:25 AM Zhisheng Zhang <31791909...@gmail.com>
wrote:

> Hi
>
> I'd like to request permissions to contribute to Kafka to propose a KIP
>
Wiki ID: zhangzhisheng
Jira ID: zhangzhisheng
>
> Thank you
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2877

2024-05-06 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16677) Replace ClusterType#ALL and ClusterType#DEFAULT by Array

2024-05-06 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16677:
--

 Summary: Replace ClusterType#ALL and ClusterType#DEFAULT by Array
 Key: KAFKA-16677
 URL: https://issues.apache.org/jira/browse/KAFKA-16677
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


Both ClusterType#ALL and ClusterType#DEFAULT are a kind of "tag" rather than a 
true "type". It seems to me they can be replaced by using an array. For example:

ClusterType#ALL -> {Type.ZK, Type.KRAFT, Type.CO_KRAFT}

ClusterType#DEFAULT -> {}

There are two benefits:

1. It is more readable for the "ALL" case.
2. We no longer need to throw the awkward "exception" when seeing "DEFAULT".
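
A minimal sketch of the idea (illustrative only; the Type values come from the mapping above, 
while the class and method names here are hypothetical, not the actual test-framework API):

import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; the Type values are taken from the mapping above.
enum Type { ZK, KRAFT, CO_KRAFT }

class ClusterTypeArraySketch {
    // ClusterType#ALL becomes an explicit array of concrete types ...
    static final Type[] ALL = { Type.ZK, Type.KRAFT, Type.CO_KRAFT };
    // ... and ClusterType#DEFAULT becomes the empty array.
    static final Type[] DEFAULT = {};

    // An empty array simply means "use the framework default", so no special-case
    // branch (and no awkward exception) is needed for DEFAULT.
    static List<Type> effectiveTypes(Type[] requested, Type[] frameworkDefault) {
        return requested.length == 0 ? Arrays.asList(frameworkDefault) : Arrays.asList(requested);
    }

    public static void main(String[] args) {
        System.out.println(effectiveTypes(ALL, new Type[] { Type.KRAFT }));     // [ZK, KRAFT, CO_KRAFT]
        System.out.println(effectiveTypes(DEFAULT, new Type[] { Type.KRAFT })); // [KRAFT]
    }
}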



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


request permissions to contribute to Kafka

2024-05-06 Thread Zhisheng Zhang
Hi

I'd like to request permissions to contribute to Kafka to propose a KIP

Wiki ID: zhangzhisheng
Jira ID: zhangzhisheng

Thank you


[jira] [Resolved] (KAFKA-16470) kafka-dump-log --offsets-decoder should support new records

2024-05-06 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16470.

Fix Version/s: 3.8.0
   Resolution: Fixed

> kafka-dump-log --offsets-decoder should support new records
> ---
>
> Key: KAFKA-16470
> URL: https://issues.apache.org/jira/browse/KAFKA-16470
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16608) AsyncKafkaConsumer doesn't honor interrupted thread status on KafkaConsumer.poll(Duration)

2024-05-06 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16608.

Fix Version/s: 3.8.0
   Resolution: Fixed

> AsyncKafkaConsumer doesn't honor interrupted thread status on 
> KafkaConsumer.poll(Duration)
> --
>
> Key: KAFKA-16608
> URL: https://issues.apache.org/jira/browse/KAFKA-16608
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.8.0
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Minor
> Fix For: 3.8.0
>
>
> The expected behaviour for KafkaConsumer.poll(Duration) when the calling thread 
> is in the interrupted state is to throw InterruptException. The AsyncKafkaConsumer 
> doesn't do this. It only throws that exception if the interruption occurs 
> while it is waiting.
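
A minimal sketch of the expected contract (illustrative only; the class below is a 
hypothetical stand-in, not the real consumer code, and a plain RuntimeException stands 
in for Kafka's InterruptException):

import java.time.Duration;

class PollContractSketch {
    void poll(Duration timeout) {
        // Expected contract: if the calling thread is already interrupted when poll()
        // is entered, fail immediately rather than only when blocking.
        if (Thread.currentThread().isInterrupted()) {
            throw new RuntimeException("thread interrupted before poll()");
        }
        // ... fetch records, honouring the timeout ...
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();   // simulate an already-interrupted caller
        try {
            new PollContractSketch().poll(Duration.ofMillis(100));
        } catch (RuntimeException e) {
            System.out.println("got expected failure: " + e.getMessage());
        }
    }
}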



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16356) Remove class-name dispatch in RemoteLogMetadataSerde

2024-05-06 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-16356.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Remove class-name dispatch in RemoteLogMetadataSerde
> 
>
> Key: KAFKA-16356
> URL: https://issues.apache.org/jira/browse/KAFKA-16356
> Project: Kafka
>  Issue Type: Task
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Greg Harris
>Assignee: Linu Shibu
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.8.0
>
>
> The RemoteLogMetadataSerde#serialize receives a RemoteLogMetadata object, and 
> has to dispatch to one of four serializers depending on its type. This is 
> done by taking the class name of the RemoteLogMetadata and looking it up in 
> maps to find the corresponding serializer for that class.
> This later requires an unchecked cast, because the RemoteLogMetadataTransform 
> is generic. This is all type-unsafe, and can be replaced with type-safe 
> if-elseif-else statements that may also be faster than the double-indirect 
> map lookups.
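
A minimal sketch of the proposed type-safe dispatch (illustrative only; the metadata 
subtypes below are hypothetical stand-ins for the real RemoteLogMetadata hierarchy):

abstract class Metadata {}
final class SegmentMetadata extends Metadata {}
final class DeleteMetadata extends Metadata {}

final class MetadataSerdeSketch {
    byte[] serialize(Metadata metadata) {
        // Each branch is checked by the compiler: no class-name map lookup and
        // no unchecked cast of a generic transform.
        if (metadata instanceof SegmentMetadata) {
            return serializeSegment((SegmentMetadata) metadata);
        } else if (metadata instanceof DeleteMetadata) {
            return serializeDelete((DeleteMetadata) metadata);
        } else {
            throw new IllegalArgumentException("Unknown metadata type: " + metadata.getClass());
        }
    }

    private byte[] serializeSegment(SegmentMetadata m) { return new byte[] { 1 }; }
    private byte[] serializeDelete(DeleteMetadata m) { return new byte[] { 2 }; }

    public static void main(String[] args) {
        System.out.println(new MetadataSerdeSketch().serialize(new SegmentMetadata()).length); // 1
    }
}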



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-932: Queues for Kafka

2024-05-06 Thread Jun Rao
Hi, Andrew,

Thanks for the KIP. +1

Jun

On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar 
wrote:

> Thanks Andrew,
>
> +1 (binding)
>
> Edo
>
> On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
>  wrote:
> >
> > Hi Andrew
> >
> > + 1 (Non-Binding)
> >
> > This will be a great addition to Kafka
> >
> > On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal 
> > wrote:
> >
> > > Hi Andrew,
> > > Thanks for writing the KIP. This is indeed going to be a valuable
> > > addition to Kafka, excited to see the KIP.
> > >
> > > + 1 (Non-Binding)
> > >
> > > Regards,
> > > Apoorv Mittal
> > > +44 7721681581
> > >
> > >
> > > On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
> > > andrew_schofield_j...@outlook.com> wrote:
> > >
> > > > Hi,
> > > > I’ve been working to complete KIP-932 over the past few months and
> > > > discussions have quietened down.
> > > >
> > > > I’d like to open the voting for KIP-932:
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
> > > >
> > > > Thanks,
> > > > Andrew
> > >
>


Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-06 Thread Jun Rao
Hi, Andrew,

Thanks for addressing all the comments. The KIP looks good to me now.

Jun

On Mon, May 6, 2024 at 2:15 PM Andrew Schofield 
wrote:

> Hi Jun,
> I have removed AdminClient.listGroups and the associated classes and
> interfaces.
>
> Version 6 of the ListGroups RPC remains because it adds support for share
> groups.
> This is needed to list share groups for the admin client and the
> command-line
> tools.
>
> Thanks,
> Andrew
>
> > On 6 May 2024, at 19:26, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> > Removing AdminClient.listGroups() and the LisGroups RPC for now sounds
> good
> > to me.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, May 6, 2024 at 11:10 AM Andrew Schofield <
> andrew_schofi...@live.com>
> > wrote:
> >
> >> Hi Jun,
> >> Thanks for the reply.
> >>
> >> 162. I’ve mentioned before that I plan another KIP for administration
> >> of groups. I think this is heading into that territory. I would like
> >> listGroups() to do a comprehensive job of returning all of the groups,
> >> and that does include groups which aren’t currently covered by
> GroupType.
> >> There’s a single namespace for all groups, and if someone is to make
> sense
> >> of the groups in a cluster, I think they need to be able to see them
> all.
> >> This is really just putting an API on top of the ListGroups RPC that
> >> already
> >> exists.
> >>
> >> I don’t think it would be desirable for connect-based groups to
> >> have GroupType.UNKNOWN. There are other custom group types.
> >> This all needs sorting out, I think.
> >>
> >> I propose to remove AdminClient.listGroups() from this KIP, and put
> >> it in the administration KIP.
> >>
> >> Let me know what you think.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>> On 6 May 2024, at 18:04, Jun Rao  wrote:
> >>>
> >>> Hi, Andrew,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> 162. It's fine to start with just the group type. Since ListGroups()
> is a
> >>> generic API, I want to make sure that it covers all existing groups.
> >>> Currently, GroupType only has "classic" and "consumer", both of which
> >> seem
> >>> to be related to groups formed by consumers since it's part of
> >>> ConsumerGroupDescription. Does ListGroup() return connect based groups
> >> and
> >>> if so, what's the GroupType? If ListGroup() doesn't cover all groups,
> >>> should we name it more accurately?
> >>>
> >>> Jun
> >>>
> >>>
> >>> On Fri, May 3, 2024 at 7:51 PM Andrew Schofield <
> >> andrew_schofi...@live.com>
> >>> wrote:
> >>>
>  Hi Jun,
>  Thanks for your reply.
> 
>  161. ShareGroupListing and ShareGroupDescription are using
>  the same pattern as ConsumerGroupListing and
>  ConsumerGroupDescription. I have gone for consistency which
>  I think is probably best here. It’s what I would expect if I had
> >> previously
>  used the admin API for consumer groups and was looking to use it for
>  share groups. I agree it’s a bit weird.
> 
>  162. GroupListing contains the only information which is properly
>  in common between a ConsumerGroupListing and a ShareGroupListing.
>  ListGroupsResponse.ProtocolType is interpreted to provide the
>  group type. I know that the ListGroups RPC also includes the group
>  state, but that’s as a string and there’s no common enum for the
> states
>  of all types of group. As a result, I have exposed group type but not
>  state on this API.
> 
>  Previously in the discussion for this KIP, I mentioned that I would
>  create another KIP for the administration of groups, in particular
>  how the administrator can ensure that particular group IDs
>  are used for the group type they desire. At the moment, I think
>  keeping ListGroups in this KIP is a good idea. If we actually want
>  to make it more sophisticated, perhaps that would be better with
>  the group administration KIP.
> 
>  163. It will be one higher than the latest version at the time we are
>  ready to deliver this feature for real. When we are on the cusp of
>  delivery, I’ll update the KIP with the final value.
> 
>  164. KRaft only. All the RPCs are “broker” only. None of the code will
>  be merged until after 3.8 has branched.
> 
>  Thanks,
>  Andrew
> 
> > On 4 May 2024, at 00:12, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> > Thanks for the reply. A few more comments.
> >
> > 161. ShareGroupListing.state() returns an optional, but
> > ShareGroupDescription.state() does not. Should we make them
> consistent?
> > Also, it seems a bit weird to return optional with an UNKNOWN state.
> >
> > 162. Should GroupListing include ProtocolType and GroupState too?
> >
> > 163. What is the value of group.version to gate the queuing feature?
> >
> > 164. Is the queueing feature only supported on KRaft clusters? For
>  example,
> > the feature tool seems to be built only for the KRaft cluster.
> >
> > 

Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-05-06 Thread Sophie Blee-Goldman
Thanks guys. Updated the error codes in both the code and the explanation
under "Public Changes". To sum up, here are the error codes listed in the
KIP:

enum AssignmentError {
NONE,
ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES,
ACTIVE_AND_STANDBY_TASK_ASSIGNED_TO_SAME_KAFKASTREAMS,
INVALID_STANDBY_TASK,
UNKNOWN_PROCESS_ID,
UNKNOWN_TASK_ID
}

Anything missing?

(also updated all the code block headings, thanks for noticing that Bruno)
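
To illustrate what a couple of these codes mean in practice, here is a toy validation 
pass over a proposed assignment (illustrative only; task and process IDs are plain 
Strings here, whereas the actual KIP-924 interfaces use richer types):

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AssignmentValidationSketch {

    enum AssignmentError { NONE, ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES, UNKNOWN_TASK_ID }

    static AssignmentError validate(Map<String, List<String>> activeTasksByProcess,
                                    Set<String> knownTaskIds) {
        Set<String> seen = new HashSet<>();
        for (List<String> tasks : activeTasksByProcess.values()) {
            for (String task : tasks) {
                if (!knownTaskIds.contains(task)) {
                    return AssignmentError.UNKNOWN_TASK_ID;
                }
                if (!seen.add(task)) {
                    // The same active task appears on two KafkaStreams clients.
                    return AssignmentError.ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES;
                }
            }
        }
        return AssignmentError.NONE;
    }

    public static void main(String[] args) {
        System.out.println(validate(
            Map.of("processA", List.of("0_0"), "processB", List.of("0_0")),
            Set.of("0_0", "0_1"))); // ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES
    }
}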

On Fri, May 3, 2024 at 9:33 AM Matthias J. Sax  wrote:

> 117f: Good point by Bruno. We should check for this, and could have an
> additional `INVALID_STANDBY_TASK` error code?
>
>
> -Matthias
>
> On 5/3/24 5:52 AM, Guozhang Wang wrote:
> > Hi Sophie,
> >
> > Re: As for the return type of the TaskAssignmentUtils, I think that
> > makes sense. LGTM.
> >
> > On Fri, May 3, 2024 at 2:26 AM Bruno Cadonna  wrote:
> >>
> >> Hi Sophie,
> >>
> >> 117f:
> >> I think, removing the STATEFUL and STATELESS types is not enough to
> >> avoid the error Guozhang mentioned. The StreamsPartitionAssignor passes
> >> the information whether a task is stateless or stateful into the task
> >> assignor. However, the task assignor can return a standby task for a
> >> stateless task which is inconsistent.
> >>
> >> Echoing Matthias' statement about the missing UNKNOWN_TASK_ID error.
> >>
> >> nit:
> >> The titles of some code blocks in the KIP are not consistent with their
> >> content, e.g., KafkaStreamsState <-> NodeState
> >>
> >>
> >> Best,
> >> Bruno
> >>
> >> On 5/3/24 2:43 AM, Matthias J. Sax wrote:
> >>> Thanks Sophie. My bad. You are of course right about `TaskAssignment`
> >>> and the StreamsPartitionAssignor's responsibility to map tasks of an
> >>> instance to consumers. When I wrote my reply, I forgot about this
> detail.
> >>>
> >>> Seems you did not add `UNKNOWN_TASK_ID` error yet as proposed by
> Guozhang?
> >>>
> >>> Otherwise LGTM.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 5/2/24 4:20 PM, Sophie Blee-Goldman wrote:
>  Guozhang:
> 
>  117. All three additions make sense to me. However, while thinking
> about
>  how users would actually produce an assignment, I realized that it
> seems
>  silly to make it their responsibility to distinguish between a
> stateless
>  and stateful task when they return the assignment. The
>  StreamsPartitionAssignor already knows which tasks are stateful vs
>  stateless, so there's no need to add this extra step for users to
>  figure it
>  out themselves, and potentially make a mistake.
> 
>  117f: So, rather than add a new error type for "inconsistent task
> types",
>  I'm proposing to just flatten the AssignedTask.Type enum to only
> "ACTIVE"
>  and "STANDBY", and remove the "STATEFUL" and "STATELESS" types
>  altogether.
>  Any objections?
> 
>  -
> 
>  -Thanks, fixed the indentation of headers under "User APIs" and
>  "Read-Only
>  APIs"
> 
>  -As for the return type of the TaskAssignmentUtils methods, I don't
>  personally feel too strongly about this, but the reason for the return
>  type
>  being a Map rather than a
>  TaskAssignment
>  is because they are meant to be used iteratively/to create a part of
> the
>  full assignment, and not necessarily a full assignment for each.
> Notice
>  that they all have an input parameter of the same type: Map<ProcessId,
>  KafkaStreamsAssignment>. The idea is you can take the output of any of
>  these and pass it in to another to generate or optimize another piece
> of
>  the overall assignment. For example, if you want to perform the
>  rack-aware
>  optimization on both active and standby tasks, you would need to call
>  #optimizeRackAwareActiveTasks and then forward the output to
>  #optimizeRackAwareStandbyTasks to get the final assignment. If we
>  return a
>  TaskAssignment, it will usually need to be unwrapped right away.
> Perhaps
>  more importantly, I worry that returning a TaskAssignment will make it
>  seem
>  like each of these utility methods return a "full" and final
> assignment
>  that can just be returned as-is from the TaskAssignor's #assign
> method.
>  Whereas they are each just a single step in the full assignment
> process,
>  and not the final product. Does that make sense?
> 
>  On Thu, May 2, 2024 at 3:50 PM Sophie Blee-Goldman
>  
>  wrote:
> 
> > Matthias:
> >
> > Thanks for the naming suggestions for the error codes. I was
> > definitely not happy with my original naming but couldn't think of
> > anything
> > better.  I like your proposals though, will update the KIP names.
> > I'll also
> > add a "NONE" option as well -- much better than just passing in null
> > for no
> > error.
> >
> >> OVERLAPPING_CLIENT : multiple KafkaStreams clients assigned with the
> >> same active task
> >
> >Would also be an error 

Re: [DISCUSS] KIP-936 Throttle number of active PIDs

2024-05-06 Thread Justine Olshan
Hi Claude,

I can clarify my comments.

Just to clarify -- my understanding is that we don't intend to throttle any
new producer IDs at the beginning. I believe this amount is specified by
`producer_ids_rate`, but you can see this as a number of producer IDs per
hour.

So consider a case where there is a storm for a given principal. We could
have a large mass of short lived producers in addition to some
"well-behaved" ones. My understanding is that if the "well-behaved" one
doesn't produce as frequently, i.e. less than once per hour, it will also get
throttled when a storm of short-lived producers leads the principal to hit
the given rate necessary for throttling. The idea of the filter is that we
don't throttle existing producers, but in this case, we will.

Note -- one thing that wasn't totally clear from the KIP was whether we
throttle all new produce requests from the client or just the ones with
unseen IDs. If we throttle them all, perhaps this point isn't a huge deal.

The other concern that I brought up is that when we throttle, we will
likely continue to throttle until the storm stops. This is because we will
have to wait 1 day or so for IDs to expire, and we will likely replace them
at a pretty fast rate. This can be acceptable if we believe that it is
helpful to getting the behavior to stop, but I just wanted to call out that
the user will likely not be able to start clients in the meantime.

Justine

On Sun, May 5, 2024 at 6:35 AM Claude Warren  wrote:

> Justine,
>
> I am new here so please excuse the ignorance.
>
> When you talk about "seen" producers I assume you mean the PIDs that the
> Bloom filter has seen.
> When you say "producer produces every 2 hours" do you mean the producer writes
> to a topic every 2 hours and uses the same PID?
> When you say "hitting the limit" what limit is reached?
>
> Given the default setup, A producer that produces a PID every 2 hours,
> regardless of whether or not it is a new PID, will be reported as a new PID
> being seen.  But I would expect the throttling system to accept that as a
> new PID for the producer and look at the frequency of PIDs and accept
> without throttling.
>
> If the actual question is "how many PIDs did this Principal produce in the
> last hour"  Or "Has this Principal produced more than X PIDs in the last
> hour", there are probably cleaner ways to do this.  If this is the
> question, I would use CPC from Apache Data Sketches [1] and keep multiple
> CPC (say every 15 minutes -- to match the KIP-936 proposal) for each
> Principal.  You could then do a quick check on the current CPC to see if it
> exceeds hour-limit / 4 and if so check the hour rate (by summing the 4
> 15-minute CPCs).  Then the code could simply notify when to throttle and
> when to stop throttling.
>
> Claude
>
>
> https://datasketches.apache.org/docs/CPC/CpcPerformance.html
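
A minimal sketch of the CPC approach described above, assuming the CpcSketch/CpcUnion 
classes from datasketches-java; the windowing here is simplified to four fixed 
15-minute buckets per principal:

import org.apache.datasketches.cpc.CpcSketch;
import org.apache.datasketches.cpc.CpcUnion;

class PidRateSketch {

    private static final int LG_K = 10;          // sketch accuracy/size trade-off
    private final CpcSketch[] quarterHours = new CpcSketch[4];
    private int currentBucket = 0;

    PidRateSketch() {
        for (int i = 0; i < quarterHours.length; i++) quarterHours[i] = new CpcSketch(LG_K);
    }

    void rotateBucket() {                        // call every 15 minutes
        currentBucket = (currentBucket + 1) % quarterHours.length;
        quarterHours[currentBucket] = new CpcSketch(LG_K);
    }

    void recordPid(long producerId) {
        quarterHours[currentBucket].update(producerId);
    }

    boolean shouldThrottle(double hourLimit) {
        // Cheap check first: only merge all four buckets if the current one already
        // exceeds a quarter of the hourly budget.
        if (quarterHours[currentBucket].getEstimate() < hourLimit / 4) return false;
        CpcUnion union = new CpcUnion(LG_K);
        for (CpcSketch s : quarterHours) union.update(s);
        return union.getResult().getEstimate() > hourLimit;
    }
}
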
>
> On Fri, May 3, 2024 at 4:21 PM Justine Olshan  >
> wrote:
>
> > Hey folks,
> >
> > I shared this with Omnia offline:
> > One concern I have is with the length of time we keep "seen" producer
> IDs.
> > It seems like the default is 1 hour. If a producer produces every 2 hours
> > or so, and we are hitting the limit, it seems like we will throttle it
> even
> > though we've seen it before and have state for it on the server. Then, it
> > seems like we will have to wait for the natural expiration of producer
> ids
> > (via producer.id.expiration.ms) before we allow new or idle producers to
> > join again without throttling. I think this proposal is a step in the
> right
> > direction when it comes to throttling the "right" clients, but I want to
> > make sure we have reasonable defaults. Keep in mind that idempotent
> > producers are the default, so most folks won't be tuning these values out
> > of the box.
> >
> > As for Igor's questions about InitProducerId -- I think the main reason
> we
> > have avoided that solution is that there is no state stored for
> idempotent
> > producers when grabbing an ID. My concern there is either storing too
> much
> > state to track this or throttling before we need to.
> >
> > Justine
> >
> > On Thu, May 2, 2024 at 2:36 PM Claude Warren, Jr
> >  wrote:
> >
> > > There is some question about whether or not we need the configuration
> > > options.  My take on them is as follows:
> > >
> > > producer.id.quota.window.num  No opinion.  I don't know what this is
> used
> > > for, but I suspect that there is a good reason to have it.  It is not
> > used
> > > within the Bloom filter caching mechanism
> > > producer.id.quota.window.size.seconds Leave it as it is one of the most
> > > effective ways to tune the filter and determines how long a PID is
> > > recognized.
> > > producer.id.quota.cache.cleanup.scheduler.interval.ms  Remove it
> unless
> > > there is another use for it.   We can get a better calculation for
> > > internals.
> > > producer.id.quota.cache.layer.count Leave it as it is one of the most
> > > effective ways to tune the filter.
> > > 

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-06 Thread Greg Harris
Hi Murali,

Thanks for the KIP!

I think I understand the motivation for this KIP in situations where
there is a "cross product" of topics over two or more variables X and
Y, and you want to write ACLs for each of the variable axes.
If you format your topics "X-Y-suffix", it's not easy to write rules
that apply to all "Y" topics, because you need to enumerate all of the
"X" values, and the problem persists even if you reorder the topic
name.

In my recent work on KIP-986 I found it necessary to introduce
"namespaces" to group topics together, and I was going to replicate
the ACL system to specify those namespaces. This change to the ACL
system could increase the expressiveness and complexity of that
feature, if it is ever implemented.
One of the primitives I needed when specifying namespaces was the
ability to tell when two namespaces overlapped (i.e. does there exist
any topic which is present in both namespaces). This is trivial to do
with the current PREFIX and LITERAL system, as we can find the
maximum-length common prefix with just some length comparisons and
equality checks.
I considered specifying namespaces via regular expressions, and found
that it was computationally much more difficult. Computing the
intersection of two regexes appears to be exponential in the length of
the regexes, leading me to avoid adding it.
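
For reference, the overlap primitive for PREFIX patterns really is just a couple of
checks (a minimal sketch):

final class PrefixOverlap {

    // Two prefixes can match a common topic if and only if one is a prefix of the other.
    static boolean prefixesOverlap(String a, String b) {
        return a.startsWith(b) || b.startsWith(a);
    }

    public static void main(String[] args) {
        System.out.println(prefixesOverlap("nl-accounts-", "nl-"));          // true
        System.out.println(prefixesOverlap("nl-accounts-", "de-accounts-")); // false
        // With MATCH-style wildcards there is no such cheap check: for example
        // "*-accounts-*" and "*localtopic" overlap regardless of their literal parts.
    }
}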

I understand that you're not suggesting full REGEX support, and that
"namespaces" don't need to support MATCH, but I think MATCH may run
into related difficulties. Any MATCH can overlap with any other MATCH
or PREFIX if it includes a sufficient number of wildcards. For
example:
MATCH *-accounts-* has overlap with PREFIX nl as they can both match
"nl-accounts-localtopic", but that isn't sensitive to the contents
"nl", it is true for any PREFIX.
MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
both match "nl-accounts-localtopic", but that isn't actually sensitive
to the contents "localtopic", it's true for any MATCH which includes a
wildcard at the beginning.

This has implications for execution complexity: If we can't compute
whether two patterns overlap, then we need to run both of them on each
piece of input to test if they both match. Under the current
LITERAL/PREFIX system, we can optimize execution with a trie, but that
option wouldn't be available to us with MATCH.

The current system makes users evaluate a trade-off:
1. Optimize the number of ACLs by organizing topics according to
prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts",
PREFIX "accounts-localtopic")
2. Use less-structured topic names, with a corresponding ACL scheme
that has more individual rules.
The system currently informs users of this tradeoff by making them
write multiple ACLs, and making them think "there has got to be a
better way!". Perhaps we can find a better way to surface this best
practice, or better inform users about it.

I understand that there are going to be situations more complex than
your example, where multiple individual rules will always be necessary
with only PREFIX evaluation. I think even in those situations, a
number of efficient-to-evaluate rules is preferable to just one
expensive-to-evaluate rule.

One alternative that I thought of could be "PARAMETERIZED" ACLs which
are like PREFIXED, but allow some parameter substitution. For example
PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
but this isn't actually a regex, and wouldn't allow arbitrary numbers
of characters, or the * or + operators.
In the background it could evaluate exactly like the 3 individual
PREFIX rules, but be easier to evaluate on the backend, and support
the intersection query I mentioned earlier. It could also support
[a-zA-Z] notation in case the parameter values aren't known ahead of
time, but have a fixed length.

Thanks,
Greg

On Mon, May 6, 2024 at 11:17 AM Claude Warren  wrote:
>
> I have an idea for how to reduce the time for ACL lookups in general and
> particularly where wildcards are involved using sequence
> characterization techniques from bioinformatics.  But I need a set of ACL
> patterns and associated topics to test with.
>
> On Fri, May 3, 2024 at 2:45 PM Haruki Okada  wrote:
>
> > Hi, Murali.
> >
> > First, could you add the KIP-1042 to the index (
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > )
> > as well so that everyone can find it easily?
> >
> > I took a look at the KIP, then I have 2 questions:
> >
> > 1. Though the new MATCH resource pattern type may reduce the effort of
> > adding ACLs in some cases, do you have any concrete use case you are in
> > mind? (When prefixed ACL was introduced in KIP-290, there was a use-case
> > that using it for implementing multi-tenancy)
> >
> > 2. As you may know, ACL lookup is in the hot-path which the performance is
> > very important. (
> >
> > 

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-06 Thread Jun Rao
Hi, Andrew,

Removing AdminClient.listGroups() and the LisGroups RPC for now sounds good
to me.

Thanks,

Jun

On Mon, May 6, 2024 at 11:10 AM Andrew Schofield 
wrote:

> Hi Jun,
> Thanks for the reply.
>
> 162. I’ve mentioned before that I plan another KIP for administration
> of groups. I think this is heading into that territory. I would like
> listGroups() to do a comprehensive job of returning all of the groups,
> and that does include groups which aren’t currently covered by GroupType.
> There’s a single namespace for all groups, and if someone is to make sense
> of the groups in a cluster, I think they need to be able to see them all.
> This is really just putting an API on top of the ListGroups RPC that
> already
> exists.
>
> I don’t think it would be desirable for connect-based groups to
> have GroupType.UNKNOWN. There are other custom group types.
> This all needs sorting out, I think.
>
> I propose to remove AdminClient.listGroups() from this KIP, and put
> it in the administration KIP.
>
> Let me know what you think.
>
> Thanks,
> Andrew
>
>
> > On 6 May 2024, at 18:04, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> > Thanks for the reply.
> >
> > 162. It's fine to start with just the group type. Since ListGroups() is a
> > generic API, I want to make sure that it covers all existing groups.
> > Currently, GroupType only has "classic" and "consumer", both of which
> seem
> > to be related to groups formed by consumers since it's part of
> > ConsumerGroupDescription. Does ListGroup() return connect based groups
> and
> > if so, what's the GroupType? If ListGroup() doesn't cover all groups,
> > should we name it more accurately?
> >
> > Jun
> >
> >
> > On Fri, May 3, 2024 at 7:51 PM Andrew Schofield <
> andrew_schofi...@live.com>
> > wrote:
> >
> >> Hi Jun,
> >> Thanks for your reply.
> >>
> >> 161. ShareGroupListing and ShareGroupDescription are using
> >> the same pattern as ConsumerGroupListing and
> >> ConsumerGroupDescription. I have gone for consistency which
> >> I think is probably best here. It’s what I would expect if I had
> previously
> >> used the admin API for consumer groups and was looking to use it for
> >> share groups. I agree it’s a bit weird.
> >>
> >> 162. GroupListing contains the only information which is properly
> >> in common between a ConsumerGroupListing and a ShareGroupListing.
> >> ListGroupsResponse.ProtocolType is interpreted to provide the
> >> group type. I know that the ListGroups RPC also includes the group
> >> state, but that’s as a string and there’s no common enum for the states
> >> of all types of group. As a result, I have exposed group type but not
> >> state on this API.
> >>
> >> Previously in the discussion for this KIP, I mentioned that I would
> >> create another KIP for the administration of groups, in particular
> >> how the administrator can ensure that particular group IDs
> >> are used for the group type they desire. At the moment, I think
> >> keeping ListGroups in this KIP is a good idea. If we actually want
> >> to make it more sophisticated, perhaps that would be better with
> >> the group administration KIP.
> >>
> >> 163. It will be one higher than the latest version at the time we are
> >> ready to deliver this feature for real. When we are on the cusp of
> >> delivery, I’ll update the KIP with the final value.
> >>
> >> 164. KRaft only. All the RPCs are “broker” only. None of the code will
> >> be merged until after 3.8 has branched.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 4 May 2024, at 00:12, Jun Rao  wrote:
> >>>
> >>> Hi, Andrew,
> >>>
> >>> Thanks for the reply. A few more comments.
> >>>
> >>> 161. ShareGroupListing.state() returns an optional, but
> >>> ShareGroupDescription.state() does not. Should we make them consistent?
> >>> Also, it seems a bit weird to return optional with an UNKNOWN state.
> >>>
> >>> 162. Should GroupListing include ProtocolType and GroupState too?
> >>>
> >>> 163. What is the value of group.version to gate the queuing feature?
> >>>
> >>> 164. Is the queueing feature only supported on KRaft clusters? For
> >> example,
> >>> the feature tool seems to be built only for the KRaft cluster.
> >>>
> >>> Jun
> >>>
> >>> On Fri, May 3, 2024 at 10:32 AM Andrew Schofield <
> >> andrew_schofi...@live.com>
> >>> wrote:
> >>>
>  Hi Jun,
>  Thanks for your reply.
> 
>  147. Yes, I see what you mean. The rebalance latency will indeed
>  be very short by comparison. I have removed the rebalance latency
>  metrics from the client and retained the rebalance count and rate.
> 
>  150. Yes, I think so. I have tweaked the text so that the simple
>  assignor will take into account existing assignment information when
>  it has it, which would just minimise unnecessary churn of (b).
> 
>  158. I’ve changed it to ReadShareGroupStateSummary.
> 
>  Thanks,
>  Andrew
> 
> 
> > On 3 May 2024, at 22:17, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> 

[jira] [Created] (KAFKA-16676) Security docs missing RPCs from KIP-714 and KIP-1000

2024-05-06 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16676:


 Summary: Security docs missing RPCs from KIP-714 and KIP-1000
 Key: KAFKA-16676
 URL: https://issues.apache.org/jira/browse/KAFKA-16676
 Project: Kafka
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.7.0, 3.8.0
Reporter: Andrew Schofield
Assignee: Andrew Schofield
 Fix For: 3.8.0


KIPs 714 and 1000 introduced 3 new RPCs to do with client metrics. None of them 
was added to the list of RPCs in the security documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-06 Thread Claude Warren
I have an idea for how to reduce the time for ACL lookups in general and
particularly where wildcards are involved using sequence
characterization techniques from bioinformatics.  But I need a set of ACL
patterns and associated topics to test with.

On Fri, May 3, 2024 at 2:45 PM Haruki Okada  wrote:

> Hi, Murali.
>
> First, could you add the KIP-1042 to the index (
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> )
> as well so that everyone can find it easily?
>
> I took a look at the KIP, then I have 2 questions:
>
> 1. Though the new MATCH resource pattern type may reduce the effort of
> adding ACLs in some cases, do you have any concrete use case you are in
> mind? (When prefixed ACL was introduced in KIP-290, there was a use-case
> that using it for implementing multi-tenancy)
>
> 2. As you may know, ACL lookup is in the hot-path which the performance is
> very important. (
>
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> ).
> Do you have ideas how do we update `matchingAcls` to support MATCH-type ACL
> without introducing performance issue?
>
>
> Thanks,
>
> 2024年5月3日(金) 19:51 Claude Warren, Jr :
>
> > As I wrote in [1], the ACL evaluation algorithm needs to be specified
> with
> > respect to the specificity of the pattern so that we know exactly whether
> > *-accounts-* takes precedence over nl-accounts-* or vice versa.
> >
> > I think that we should spell out that precedence is evaluated as follows:
> >
> > 1. Remove patterns that do not match
> > 2. More specific patterns take precedence over less specific patterns
> > 3. for patterns of the same precedence DENY overrides ALLOW
> >
> > Determining specificity:
> >
> > Specificity is based on the Levenshtein distance between the pattern and
> > the text being evaluated. The lower the distance the more specific the
> > rule.
> > Using the topic name: nl-accounts-localtopic we can evaluate the
> > Levenshtein distance for various patterns.
> > nl-accounts-localtopic = 0
> > *-accounts-localtopic = 2
> > nl-accounts-local* = 5
> > *-accounts-local* = 7
> > nl-accounts-* = 10
> > *-accounts-* = 12
> >
> > In the special case of matching principals, User matches are more specific
> > than Group matches.
> >
> > I don't know if this should be added to KIP-1042 or presented as a new
> KIP.
> >
> > Claude
> >
> > [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > >
> >
> > On Fri, May 3, 2024 at 12:18 PM Claude Warren  wrote:
> >
> > > Took me awhile to find it but the link to the KIP is
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > >
> > > On Fri, May 3, 2024 at 10:13 AM Murali Basani  >
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I'd like to propose a suggestion to our resource patterns in Kafka
> > ACLs.
> > > >
> > > > Currently, when adding new ACLs in Kafka, we have two types of
> resource
> > > > patterns for topics:
> > > >
> > > >- LITERAL
> > > >- PREFIXED
> > > >
> > > > However, when it comes to listing or removing ACLs, we have a couple
> > more
> > > > options:
> > > >
> > > >- MATCH
> > > >- ANY (will match any pattern type)
> > > >
> > > >
> > > > If we can extend creating acls as well with 'MATCH' pattern type, it
> > > would
> > > > be very beneficial. Even though this kind of acl should be created
> with
> > > > utmost care, it will help organizations streamline their ACL
> management
> > > > processes.
> > > >
> > > > Example scenarios :
> > > >
> > > > Let's say we need to create ACLs for the following six topics:
> > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > de-accounts-localtopic,
> > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > cz-accounts-remotetopic
> > > >
> > > > Currently, we achieve this using existing functionality by creating
> > three
> > > > prefixed ACLs as shown below:
> > > >
> > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > --add \
> > > > > --allow-principal
> > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > \
> > > > > --producer \
> > > > > --topic nl-accounts- \
> > > > > --resource-pattern-type prefixed
> > > >
> > > >
> > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > --add \
> > > > > --allow-principal
> > > > >
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > \
> > > > > --producer \
> > > > > --topic de-accounts- \
> > > > > --resource-pattern-type prefixed
> > > >
> > > >
> > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > --add \
> > > > > --allow-principal
> > > > >
> > > >
> > >
> >
> 

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-06 Thread Jun Rao
Hi, Andrew,

Thanks for the reply.

162. It's fine to start with just the group type. Since ListGroups() is a
generic API, I want to make sure that it covers all existing groups.
Currently, GroupType only has "classic" and "consumer", both of which seem
to be related to groups formed by consumers since it's part of
ConsumerGroupDescription. Does ListGroup() return connect based groups and
if so, what's the GroupType? If ListGroup() doesn't cover all groups,
should we name it more accurately?

Jun


On Fri, May 3, 2024 at 7:51 PM Andrew Schofield 
wrote:

> Hi Jun,
> Thanks for your reply.
>
> 161. ShareGroupListing and ShareGroupDescription are using
> the same pattern as ConsumerGroupListing and
> ConsumerGroupDescription. I have gone for consistency which
> I think is probably best here. It’s what I would expect if I had previously
> used the admin API for consumer groups and was looking to use it for
> share groups. I agree it’s a bit weird.
>
> 162. GroupListing contains the only information which is properly
> in common between a ConsumerGroupListing and a ShareGroupListing.
> ListGroupsResponse.ProtocolType is interpreted to provide the
> group type. I know that the ListGroups RPC also includes the group
> state, but that’s as a string and there’s no common enum for the states
> of all types of group. As a result, I have exposed group type but not
> state on this API.
>
> Previously in the discussion for this KIP, I mentioned that I would
> create another KIP for the administration of groups, in particular
> how the administrator can ensure that particular group IDs
> are used for the group type they desire. At the moment, I think
> keeping ListGroups in this KIP is a good idea. If we actually want
> to make it more sophisticated, perhaps that would be better with
> the group administration KIP.
>
> 163. It will be one higher than the latest version at the time we are
> ready to deliver this feature for real. When we are on the cusp of
> delivery, I’ll update the KIP with the final value.
>
> 164. KRaft only. All the RPCs are “broker” only. None of the code will
> be merged until after 3.8 has branched.
>
> Thanks,
> Andrew
>
> > On 4 May 2024, at 00:12, Jun Rao  wrote:
> >
> > Hi, Andrew,
> >
> > Thanks for the reply. A few more comments.
> >
> > 161. ShareGroupListing.state() returns an optional, but
> > ShareGroupDescription.state() does not. Should we make them consistent?
> > Also, it seems a bit weird to return optional with an UNKNOWN state.
> >
> > 162. Should GroupListing include ProtocolType and GroupState too?
> >
> > 163. What is the value of group.version to gate the queuing feature?
> >
> > 164. Is the queueing feature only supported on KRaft clusters? For
> example,
> > the feature tool seems to be built only for the KRaft cluster.
> >
> > Jun
> >
> > On Fri, May 3, 2024 at 10:32 AM Andrew Schofield <
> andrew_schofi...@live.com>
> > wrote:
> >
> >> Hi Jun,
> >> Thanks for your reply.
> >>
> >> 147. Yes, I see what you mean. The rebalance latency will indeed
> >> be very short by comparison. I have removed the rebalance latency
> >> metrics from the client and retained the rebalance count and rate.
> >>
> >> 150. Yes, I think so. I have tweaked the text so that the simple
> >> assignor will take into account existing assignment information when
> >> it has it, which would just minimise unnecessary churn of (b).
> >>
> >> 158. I’ve changed it to ReadShareGroupStateSummary.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>> On 3 May 2024, at 22:17, Jun Rao  wrote:
> >>>
> >>> Hi, Andrew,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> 147. There seems to be some difference between consumer groups and
> share
> >>> groups. In the consumer groups, if a client receives a heartbeat
> response
> >>> to revoke some partitions, it may have to commit offsets before
> revoking
> >>> partitions or it may have to call the rebalance callbacks provided by
> the
> >>> user. This may take some time and can be reflected in the rebalance
> time
> >>> metric. In the share groups, none of that exists. If a client receives
> >> some
> >>> added/revoked partitions, it accepts them immediately, right? So, does
> >> that
> >>> practically make the rebalance time always 0?
> >>>
> >>> 150. I guess in the common case, there will be many more members than
> >>> partitions. So the need for (b) will be less common. We can probably
> >> leave
> >>> the persisting of the assignment out for now.
> >>>
> >>> 158. The new name sounds good to me.
> >>>
> >>> Jun
> >>>
> >>> On Thu, May 2, 2024 at 10:21 PM Andrew Schofield <
> >> andrew_schofi...@live.com>
> >>> wrote:
> >>>
>  Hi Jun,
>  Thanks for the response.
> 
>  147. I am trying to get a correspondence between the concepts and
>  metrics of consumer groups and share groups. In both cases,
>  the client doesn’t strictly know when the rebalance starts. All it
> knows
>  is when it has work to do in order to perform its part of a rebalance.
> 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2875

2024-05-06 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16675) Move rebalance callback test for positions to callbacks test file

2024-05-06 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16675:
--

 Summary: Move rebalance callback test for positions to callbacks 
test file
 Key: KAFKA-16675
 URL: https://issues.apache.org/jira/browse/KAFKA-16675
 Project: Kafka
  Issue Type: Task
  Components: consumer
Reporter: Lianet Magrans
Assignee: Lianet Magrans


Integration test 
testGetPositionOfNewlyAssignedPartitionFromPartitionsAssignedCallback was added 
to the PlaintextConsumerTest.scala in this PR 
https://github.com/apache/kafka/pull/15856, as there was no specific file for 
testing callbacks at the time. Another PR is in-flight, adding the file for 
callback-related tests, https://github.com/apache/kafka/pull/15408. Once 15408 
gets merged, we should move 
testGetPositionOfNewlyAssignedPartitionFromPartitionsAssignedCallback to it.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16674) Adjust classicGroupJoinToConsumerGroup to add subscription model

2024-05-06 Thread Dongnuo Lyu (Jira)
Dongnuo Lyu created KAFKA-16674:
---

 Summary: Adjust classicGroupJoinToConsumerGroup to add 
subscription model
 Key: KAFKA-16674
 URL: https://issues.apache.org/jira/browse/KAFKA-16674
 Project: Kafka
  Issue Type: Sub-task
Reporter: Dongnuo Lyu


[https://github.com/apache/kafka/pull/15785] adds subscription model to the 
group state that affects `classicGroupJoinToConsumerGroup`. We'll need to make 
adjustments to comply with the change once #15785 is merged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16673) Optimize toTopicPartitions with ConsumerProtocolSubscription

2024-05-06 Thread Dongnuo Lyu (Jira)
Dongnuo Lyu created KAFKA-16673:
---

 Summary: Optimize toTopicPartitions with 
ConsumerProtocolSubscription
 Key: KAFKA-16673
 URL: https://issues.apache.org/jira/browse/KAFKA-16673
 Project: Kafka
  Issue Type: Sub-task
Reporter: Dongnuo Lyu


https://github.com/apache/kafka/pull/15798#discussion_r1582981154



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1035: StateStore managed changelog offsets

2024-05-06 Thread Bruno Cadonna

Hi Matthias,

I see what you mean.

To sum up:

With this KIP the .checkpoint file is written when the store closes. 
That is when:

1. a task moves away from the Kafka Streams client
2. the Kafka Streams client shuts down

A Kafka Streams client needs the information in the .checkpoint file:
1. on startup, because it does not have any open stores yet.
2. during rebalances, for non-empty state directories of tasks that are 
not assigned to the Kafka Streams client.


With hard crashes, i.e., when the Streams client is not able to close 
its state stores and write the .checkpoint file, the .checkpoint file 
might be quite stale. That influences the next rebalance after failover 
negatively.



My conclusion is that Kafka Streams either needs to open the state 
stores at start up or we write the checkpoint file more often.


Writing the .checkpoint file during processing more often without 
controlling the flush to disk would work. However, Kafka Streams would 
checkpoint offsets that are not yet persisted on disk by the state 
store. That is, with a hard crash, the offsets in the .checkpoint file 
might be larger than the offsets checkpointed in the state store. That 
might not be a problem if Kafka Streams uses the .checkpoint file only 
to compute the task lag. The downside is that it makes the managing of 
checkpoints more complex because now we have to maintain two 
checkpoints: one for restoration and one for computing the task lag.
I think we should explore the option where Kafka Streams opens the state 
stores at start up to get the offsets.


I also checked when Kafka Streams needs the checkpointed offsets to 
compute the task lag during a rebalance. Turns out Kafka Streams needs 
them before sending the join request. Now, I am wondering whether opening the 
state stores of unassigned tasks whose state directory exists locally is 
really such a big issue, since the expected extra latency is incurred 
before the Kafka Streams client joins the rebalance.
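
A minimal sketch of the lag computation that needs those offsets before the join
request (illustrative only; partitions are plain Strings here, and the offsets would
come either from the .checkpoint file or from the opened stores):

import java.util.HashMap;
import java.util.Map;

final class TaskLagSketch {

    static Map<String, Long> computeLag(Map<String, Long> checkpointedOffsets,
                                        Map<String, Long> changelogEndOffsets) {
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> entry : changelogEndOffsets.entrySet()) {
            // A missing checkpoint means the full changelog still has to be restored.
            long checkpointed = checkpointedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - checkpointed));
        }
        return lag;
    }

    public static void main(String[] args) {
        System.out.println(computeLag(
            Map.of("store-changelog-0", 950L),
            Map.of("store-changelog-0", 1000L, "store-changelog-1", 500L)));
        // lag: 50 for store-changelog-0, 500 for store-changelog-1 (no local checkpoint)
    }
}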


Best,
Bruno







On 5/4/24 12:05 AM, Matthias J. Sax wrote:
Those are good questions... I could think of a few approaches, but I admit 
it might all be a little bit tricky to code up...


However if we don't solve this problem, I think this KIP does not really 
solve the core issue we are facing? In the end, if we rely on the 
`.checkpoint` file to compute a task assignment, but the `.checkpoint` 
file can be arbitrary stale after a crash because we only write it on a 
clean close, there would be still a huge gap that this KIP does not close?


For the case in which we keep the checkpoint file, this KIP would still 
help for "soft errors" in which KS can recover, and roll back the store. 
A significant win for sure. -- But hard crashes would still be a 
problem? We might assign tasks to the "wrong" instance, i.e., one that is 
not the most up to date, as the checkpoint information could be very outdated? 
Would we end up with a half-baked solution? Would this be good enough to 
justify the introduced complexity? In the end, for soft failures it's still 
a win. Just want to make sure we understand the limitations and make an 
educated decision.


Or do I miss something?


-Matthias

On 5/3/24 10:20 AM, Bruno Cadonna wrote:

Hi Matthias,


200:
I like the idea in general. However, it is not clear to me how the 
behavior should be with multiple stream threads in the same Kafka 
Streams client. What stream thread opens which store? How can a stream 
thread pass an open store to another stream thread that got the 
corresponding task assigned? How does a stream thread know that a task 
was not assigned to any of the stream threads of the Kafka Streams 
client? I have the feeling we should just keep the .checkpoint file on 
close for now to unblock this KIP and try to find a solution to get 
rid of it entirely later.



Best,
Bruno



On 5/3/24 6:29 PM, Matthias J. Sax wrote:
101: Yes, but what I am saying is, that we don't need to flush the 
.position file to disk periodically, but only maintain it in main 
memory, and only write it to disk on close() to preserve it across 
restarts. This way, it would never be ahead, but might only lag? But 
with my better understanding about (102) it might be moot anyway...



102: Thanks for clarifying. Looked into the code now. Makes sense. 
Might be something worth calling out explicitly in the KIP 
writeup. -- Now that I realize that the position is tracked inside 
the store (not outside as the changelog offsets) it makes much more 
sense to pull position into RocksDB itself. In the end, it's actually 
a "store implementation" detail how it tracks the position (and kinda 
leaky abstraction currently, that we re-use the checkpoint file 
mechanism to track it and flush to disk).



200: I was thinking about this a little bit more, and maybe it's not 
too bad? When KS starts up, we could open all stores we find on local 
disk pro-actively, and keep them all open until the first rebalance 
finishes: For tasks we get assigned, we 

[jira] [Resolved] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly

2024-05-06 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16393.

Fix Version/s: 3.8.0
   Resolution: Fixed

> SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly
> ---
>
> Key: KAFKA-16393
> URL: https://issues.apache.org/jira/browse/KAFKA-16393
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
> Fix For: 3.8.0
>
>
> As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is 
> implemented like below:
> {code:java}
> public long write(ByteBuffer[] srcs, int offset, int length) throws 
> IOException {
> ...
> int i = offset;
> while (i < length) {
> if (srcs[i].hasRemaining() || hasPendingWrites()) {
> 
> {code}
> The loop index starts at `offset` and ends with `length`.
> However this isn't correct because end-index should be `offset + length`.
> Let's say we have the array of ByteBuffer with length = 5 and try calling 
> this method with offset = 3, length = 1.
> In current code, `write(srcs, 3, 1)` doesn't attempt any write because the 
> loop condition is immediately false.
> For now, it seems this method is only called with args offset = 0, length = 
> srcs.length in the Kafka code base, so it's not causing any problem. We should 
> still fix this because it could introduce a subtle bug if this method is used with 
> different args in the future.
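
A minimal sketch of the bound issue (illustrative only, not the actual patch): with
offset = 3 and length = 1, the buggy end-index `length` visits nothing, while the
corrected end-index `offset + length` visits exactly srcs[3]:

import java.nio.ByteBuffer;

final class WriteLoopBoundSketch {

    static int buffersVisited(ByteBuffer[] srcs, int offset, int length, boolean fixed) {
        int visited = 0;
        int end = fixed ? offset + length : length;   // fixed vs. buggy end-index
        for (int i = offset; i < end; i++) {
            if (srcs[i].hasRemaining()) visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        ByteBuffer[] srcs = new ByteBuffer[5];
        for (int i = 0; i < srcs.length; i++) srcs[i] = ByteBuffer.allocate(8); // all have remaining
        System.out.println(buffersVisited(srcs, 3, 1, false)); // 0 -> buggy: nothing written
        System.out.println(buffersVisited(srcs, 3, 1, true));  // 1 -> fixed: srcs[3] written
    }
}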



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-1018: Introduce max remote fetch timeout config

2024-05-06 Thread Kamal Chandraprakash
Hi all,

We would like to start a voting thread for KIP-1018: Introduce
max remote fetch timeout config for DelayedRemoteFetch requests.

The KIP is available on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests

If you have any suggestions, feel free to participate in the discussion
thread:
https://lists.apache.org/thread/9x21hzpxzmrt7xo4vozl17d70fkg3chk

--
Kamal


Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-05-06 Thread Kamal Chandraprakash
Hi all,

If there are no more comments, I'll open a VOTE thread.

--
Kamal

On Sat, May 4, 2024 at 8:39 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Luke,
>
> Thanks for the review!
>
> DelayedFetch and DelayedRemoteFetch are orthogonal calls.
> Only one of them will be active in a given FETCH request.
>
> The fetch request might take more than 500 ms when the time
> taken to read data from remote storage exceeds 500 ms and
> `remote.fetch.max.wait.ms` is configured higher than 500 ms.
>
> --
> Kamal
>
>
> On Fri, May 3, 2024 at 1:55 PM Luke Chen  wrote:
>
>> Hi Kamal,
>>
>> Thanks for the KIP!
>> Sorry for the late review.
>>
>> Overall LGTM! Just 1 question:
>>
>> If one fetch request contains 2 partitions: [p1, p2]
>> fetch.max.wait.ms: 500, remote.fetch.max.wait.ms: 1000
>>
>> And now, p1 fetch offset is the log end offset and has no new data coming,
>> and p2 fetch offset is to fetch from remote storage.
>> And suppose the fetch from remote storage takes 1000ms.
>> So, question:
>> Will this fetch request return in 500ms or 1000ms?
>> And what will be returned?
>>
>> I think before this change, it'll return within 500ms, right?
>> But it's not clear what behavior it will be after this KIP.
>>
>> Thank you.
>> Luke
>>
>>
>> On Fri, May 3, 2024 at 1:56 PM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>> > Christo,
>> >
>> > We have localTimeMs, remoteTimeMs, and totalTimeMs as part of the
>> > FetchConsumer request metric.
>> >
>> >
>> >
>> kafka.network:type=RequestMetrics,name={LocalTimeMs|RemoteTimeMs|TotalTimeMs},request={Produce|FetchConsumer|FetchFollower}
>> >
>> > RemoteTimeMs refers to the amount of time spent in the purgatory for
>> normal
>> > fetch requests
>> > and amount of time spent in reading the remote data for remote-fetch
>> > requests. Do we want
>> > to have a separate `TieredStorageTimeMs` to capture the time spent in
>> > remote-read requests?
>> >
>> > With per-broker level timer metrics combined with the request level
>> > metrics, the user will have
>> > sufficient information.
>> >
>> > Metric name =
>> >
>> >
>> kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs
>> >
>> > --
>> > Kamal
>> >
>> > On Mon, Apr 29, 2024 at 1:38 PM Christo Lolov 
>> > wrote:
>> >
>> > > Heya!
>> > >
>> > > Is it difficult to instead add the metric at
>> > > kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other
>> > > name=*)? Alternatively, if it is difficult to add it there, is it
>> > possible
>> > > to add 2 metrics, one at the RequestMetrics level (even if it is
>> > > total-time-ms - (all other times)) and one at what you are proposing?
>> As
>> > an
>> > > operator I would find it strange to not see the metric in the
>> > > RequestMetrics.
>> > >
>> > > Your thoughts?
>> > >
>> > > Best,
>> > > Christo
>> > >
>> > > On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash <
>> > > kamal.chandraprak...@gmail.com> wrote:
>> > >
>> > > > Christo,
>> > > >
>> > > > Updated the KIP with the remote fetch latency metric. Please take
>> > another
>> > > > look!
>> > > >
>> > > > --
>> > > > Kamal
>> > > >
>> > > > On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash <
>> > > > kamal.chandraprak...@gmail.com> wrote:
>> > > >
>> > > > > Hi Federico,
>> > > > >
>> > > > > Thanks for the suggestion! Updated the config name to "
>> > > > > remote.fetch.max.wait.ms".
>> > > > >
>> > > > > Christo,
>> > > > >
>> > > > > Good point. We don't have the remote-read latency metrics to
>> measure
>> > > the
>> > > > > performance of the remote read requests. I'll update the KIP to
>> emit
>> > > this
>> > > > > metric.
>> > > > >
>> > > > > --
>> > > > > Kamal
>> > > > >
>> > > > >
>> > > > > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri <
>> > fedeval...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi Kamal, it looks like all TS configurations starts with
>> "remote."
>> > > > >> prefix, so I was wondering if we should name it
>> > > > >> "remote.fetch.max.wait.ms".
>> > > > >>
>> > > > >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash
>> > > > >>  wrote:
>> > > > >> >
>> > > > >> > Hi all,
>> > > > >> >
>> > > > >> > If there are no more comments, I'll start a vote thread by
>> > tomorrow.
>> > > > >> > Please review the KIP.
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> > Kamal
>> > > > >> >
>> > > > >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash <
>> > > > >> > kamal.chandraprak...@gmail.com> wrote:
>> > > > >> >
>> > > > >> > > Hi all,
>> > > > >> > >
>> > > > >> > > Bumping the thread. Please review this KIP. Thanks!
>> > > > >> > >
>> > > > >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash <
>> > > > >> > > kamal.chandraprak...@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > >> Hi Jorge,
>> > > > >> > >>
>> > > > >> > >> Thanks for the review! Added your suggestions to the 

[jira] [Created] (KAFKA-16672) Fix flaky DedicatedMirrorIntegrationTest.testMultiNodeCluster

2024-05-06 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16672:
--

 Summary: Fix flaky 
DedicatedMirrorIntegrationTest.testMultiNodeCluster
 Key: KAFKA-16672
 URL: https://issues.apache.org/jira/browse/KAFKA-16672
 Project: Kafka
  Issue Type: Test
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


It is flaky on my jenkins, and sometimes it fails in Kafka 
[CI|https://ge.apache.org/scans/tests?search.buildOutcome=failure=gradle=P28D=kafka=Asia%2FTaipei=org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest=testMultiNodeCluster()]

The error happens because of a race condition. `KafkaBasedLog` loads records 
from the topic via a thread, so `RebalanceNeededException` will be thrown if we check 
the task configs too soon. It seems to me `RebalanceNeededException` is a 
temporary exception, so we should treat it as a retryable exception while waiting.
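
A minimal sketch of treating it as retryable while waiting (illustrative only; the
exception class here is a stand-in and the helper is hypothetical, not Kafka's test
utilities):

import java.time.Duration;
import java.util.function.Supplier;

final class RetryableWaitSketch {

    static class RebalanceNeededException extends RuntimeException { }

    // Poll the action until it succeeds, retrying on RebalanceNeededException
    // instead of failing the test immediately, until the timeout is reached.
    static <T> T waitFor(Supplier<T> action, Duration timeout) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeout.toMillis();
        while (true) {
            try {
                return action.get();
            } catch (RebalanceNeededException e) {
                if (System.currentTimeMillis() >= deadline) throw e;   // give up at the deadline
                Thread.sleep(100);                                     // KafkaBasedLog may still be loading
            }
        }
    }
}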



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16671) Revisit SessionedProtocolIntegrationTest.ensureInternalEndpointIsSecured

2024-05-06 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16671:
--

 Summary: Revisit 
SessionedProtocolIntegrationTest.ensureInternalEndpointIsSecured
 Key: KAFKA-16671
 URL: https://issues.apache.org/jira/browse/KAFKA-16671
 Project: Kafka
  Issue Type: Test
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


Looped 1000 times on my local machine, and all passed. Let's enable the test to see what 
happens in our CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)