Re: [VOTE] KIP-971: Expose replication-offset-lag MirrorMaker2 metric

2024-01-08 Thread Dániel Urbán
Hi Elxan,
+1 (non-binding)
Thanks for the KIP, this will be a very useful metric for MM!
Daniel

Elxan Eminov  wrote (on Sun, 7 Jan 2024 at 2:17):

> Hi all,
> Bumping this for visibility
>
> On Wed, 3 Jan 2024 at 18:13, Elxan Eminov  wrote:
>
> > Hi All,
> > I'd like to initiate a vote for KIP-971.
> > This KIP is about adding a new metric to the MirrorSourceTask that tracks
> > the offset lag between a source and a target partition.
> >
> > KIP link:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric
> >
> > Discussion thread:
> > https://lists.apache.org/thread/gwq9jd75dnm8htmpqkn17bnks6h3wqwp
> >
> > Thanks!
> >
>
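
For illustration, here is a minimal sketch of how such a per-partition lag value could be derived; the class and method names below are illustrative assumptions, not the KIP's actual implementation.

    // Illustrative sketch only (hypothetical names): the metric value is the number of
    // source records that a MirrorSourceTask-style class has not yet replicated for one partition.
    public final class ReplicationOffsetLagExample {

        // sourceLogEndOffset: latest log-end offset of the source partition
        // lastReplicatedOffset: last source offset already written to the target
        static long replicationOffsetLag(long sourceLogEndOffset, long lastReplicatedOffset) {
            // The log-end offset points one past the last record, so the lag is the gap between
            // the next offset still to be replicated and the end of the source log.
            return Math.max(0L, sourceLogEndOffset - (lastReplicatedOffset + 1));
        }

        public static void main(String[] args) {
            // The source partition has records up to offset 99 (log-end offset 100) and
            // offset 89 was the last one mirrored, so 10 records are still to be copied.
            System.out.println(replicationOffsetLag(100L, 89L)); // prints 10
        }
    }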


[VOTE] KIP-916: MM2 distributed mode flow log context

2023-07-10 Thread Dániel Urbán
Hello everyone,

I would like to start a vote on KIP-916: MM2 distributed mode flow log
context.
The KIP aims to improve the logging of MM2 distributed mode. It is a
relatively small change, but it has a big impact, as the current logs are
very hard to decipher. (The current logs are based on the Kafka Connect logging
model, which assumes that connector names are unique, an assumption that does
not hold in MM2 distributed mode.)

Thanks in advance,
Daniel


Re: [DISCUSS] KIP-932: Queues for Kafka

2023-07-10 Thread Dániel Urbán
Yes, I think it's clear now, thank you.
I agree that allowing reading beyond the LSO would require more work on the
broker side (we would need one more state for the messages, and a transition
when the LSO moves forward), but I don't see the extra complexity on the
consumer side. Based on the KIP so far, brokers will be able to return
specific batches/messages to queue consumers - consumers will need to be
able to skip messages in case another consumer of the same group has
already acquired/acked those. If we have this logic present in the protocol
and the clients, consumers could skip pending messages using the same
mechanism, and only the broker would need to know *why* exactly a specific
record/batch is skipped.

I don't think that this feature would be all that important, but compared to the
complexity of the KIP, one more state doesn't seem too complicated to me.

Thanks,
Daniel

Matthias J. Sax  wrote (on Mon, 10 Jul 2023 at 7:22):

> Daniel, sure.
>
> To allow the client to filter aborted messages, the broker currently
> attaches metadata that tells the client which records were aborted. But
> the first message after the LSO is a message in pending state, i.e., it
> was neither committed nor aborted yet, so it's not possible to filter or
> deliver it. Thus, the broker cannot provide this metadata (not sure if
> the client could filter without this metadata?)
>
> The main reason why this happens broker side is to avoid the client having
> to buffer pending messages "indefinitely" until the TX might
> eventually commit or abort, and thus put a lot of memory pressure on the
> client. For the "classic" case, the situation is more severe as we
> guarantee ordered delivery, and thus, the client would need to buffer
> everything after the LSO. -- While it's relaxed for queuing as we might
> not guarantee order (ie, instead of buffering everything, only pending
> messages must be buffered), it would still imply a huge additional
> burden on tracking metadata (for both the broker and the consumer), and
> the wire protocol, and I am already worried about the metadata we might
> need to track for queuing in general.
>
> Does this make sense?
>
>
> -Matthias
>
>
>
> On 7/7/23 01:35, Dániel Urbán wrote:
> > Hi Matthias,
> > Can you please elaborate on this: "First, you need to understand that
> > aborted records are filtered client side, and thus for "read-committed"
> we
> > can never read beyond the LSO, and the same seems to apply for queuing."
> > I don't understand the connection here - what does skipping aborted
> records
> > have to do with the LSO? As you said, aborted message filtering is done
> on
> > the client side (in consumers, yes, but not sure if it has to be the same
> > for queues), but being blocked on the LSO is the responsibility of the
> > broker, isn't it? My thought was that the broker could act differently
> when
> > working with queues and read_committed isolation.
> > Thanks,
> > Daniel
> >
> > On Thu, Jul 6, 2023 at 7:26 PM Matthias J. Sax  wrote:
> >
> >> Thanks for the KIP.
> >>
> >> It seems we are at a very early stage, and some very important sections in
> >> the KIP are still marked as TODO. In particular, I am curious about the
> >> protocol changes, how the "queuing state" will be represented and made
> >> durable, and all the error edge cases / fail-over / fencing
> >> (broker/clients) handling that we need to put in place.
> >>
> >>
> >> A few other comments/question from my side:
> >>
> >> (1) Fetch from follower: this was already touched on, but the point is
> >> really that the consumer does not decide about it, but the broker does.
> >> When a consumer sends its first fetch request it will always go to the
> >> leader, and the broker would reply to the consumer "go and fetch from
> >> this other broker". -- I think it's ok to exclude fetch from follower in
> >> the first version of the KIP, but it would need a broker change such
> >> that the broker knows it's a "queue fetch" request. -- It would also be
> >> worth exploring how fetch from follower could work in the future and
> >> ensuring that our initial design allows for it and is future-proof.
> >>
> >>
> >> (2) Why do we not allow pattern subscription and what happens if
> >> different consumers subscribe to different topics? It's not fully
> >> explained in the KIP.
> >>
> >>
> >> (3) auto.offset.reset and SPSO/SPEO -- I don't understand why we would
> >> not allow auto.offset.reset? In the discussion, you mentio
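
To make the extra state discussed at the top of this message concrete, here is a sketch of the in-flight record states with the additional "pending" state; nothing like this is defined by the KIP, and the state names beyond those mentioned in the thread are illustrative.

    // Illustrative sketch of the one extra state discussed above (not part of the KIP).
    public enum InFlightRecordState {
        PENDING,      // beyond the LSO, transaction outcome unknown; never offered to consumers
        AVAILABLE,    // may be acquired by a share-group consumer
        ACQUIRED,     // delivered to a consumer and awaiting acknowledgement
        ACKNOWLEDGED, // successfully processed
        ARCHIVED;     // aborted or otherwise permanently skipped

        // When the LSO moves forward, PENDING records would transition to AVAILABLE on commit
        // or to ARCHIVED on abort; consumers skip them with the same mechanism they already use
        // for records acquired or acknowledged by other members of the group.
    }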

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-07-07 Thread Dániel Urbán
y. If there is no
> good reason, should we ship with a good default but no hard-coded limit
> on what the user can configure?
>
> (9E) Should we have a client and a broker config for all new configs (i.e.,
> the broker sets a max/min, but the client can pick within those bounds)?
>
>
>
> (10) Couple of question about certain statements:
>
> > If any records in the batch were not acknowledged, they remain acquired
> and will be presented to the application in response to a future poll.
>
> Do the un-acked records stay in the consumer buffer? Or would
> `ConsumerRecords` be "purged" for this case?
>
>
> > the share-partition leader guarantees that acknowledgements for the
> records in a batch are performed atomically.
>
> How are acks for partial batches handled? And do you want to ensure
> atomicity?
>
>
> > and the Acquired state is not persisted. This minimises the amount of
> share-partition state that has to be logged.
>
> Is there any concern about excessive re-delivery in case of error and
> a large "window"? What are the tradeoffs with regard to re-delivery vs
> maintaining too much metadata?
>
>
> Thanks for reading all this...
>
>
>
> -Matthias
>
>
>
> On 7/1/23 4:42 AM, Kamal Chandraprakash wrote:
> > Hi Andrew,
> >
> > Thank you for the KIP -- interesting read. I have some questions:
> >
> > 101. "The calls to KafkaConsumer.acknowledge(ConsumerRecord,
> > AcknowledgeType) must be
> > issued in the order in which the records appear in the ConsumerRecords
> > object, which will
> > be in order of increasing offset for each share-partition"
> >
> > If the share-consumer uses a thread pool internally and acknowledges the
> > records in an out-of-order fashion, will this use case be supported?
> > The "Managing durable share-partition state" section has transitions where
> > the records are acked out of order, so I want to confirm this.
> >
> > 102. Will the configs be maintained at a fine-grained,
> per-topic-per-share-group level?
> > Some share-consumer groups
> > may want to increase the "record.lock.duration.ms" dynamically if record
> > processing is taking longer
> > than usual during an external system outage/downtime.
> >
> > 103. Can we also define whether all the consumer configs are eligible for
> > share-consumer groups? E.g.,
> > `max.poll.interval.ms` defaults to 5 minutes. Will this config have any
> effect
> > on the share consumers?
> >
> > 104. How will the consumer quota work? Will it be similar to the existing
> > consumer quota mechanism?
> >
> > --
> > Kamal
> >
> > On Wed, Jun 7, 2023 at 9:17 PM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> >> Hi Daniel,
> >> True, I see your point. It’s analogous to a KafkaConsumer fetching
> >> uncommitted records but not delivering them to the application.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 7 Jun 2023, at 16:38, Dániel Urbán  wrote:
> >>>
> >>> Hi Andrew,
> >>>
> >>> I think the "pending" state could be the solution for reading beyond
> the
> >>> LSO. Pending could indicate that a message is not yet available for
> >>> consumption (so it won't be offered to consumers), but once its
> >> transaction
> >>> ends, it can become "available". With a pending state, records
> >> wouldn't
> >>> "disappear", they would simply not show up until they become available
> on
> >>> commit, or archived on abort.
> >>>
> >>> Regardless, I understand that this might be some extra, unwanted
> >>> complexity; I just thought that with the message ordering guarantee
> gone,
> >>> it would be a cool feature for share-groups. I've seen use-cases where
> >> the
> >>> LSO being blocked for an extended period of time caused huge lag for
> >>> traditional read_committed consumers, which could be completely avoided
> >> by
> >>> share-groups.
> >>>
> >>> Thanks,
> >>> Daniel
> >>>
> >>> Andrew Schofield  wrote (on
> >>> Wed, 7 Jun 2023 at 17:28):
> >>>
> >>>> Hi Daniel,
> >>>> Kind of. I don’t want a transaction abort to cause disappearance of
> >>>> records which are already in-flight. A “pending” state doesn’t seem
> >>>> helpful for read_committed. There’s no such disappea
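
To illustrate the ordering rule quoted in question 101 above, here is a sketch of a processing loop written against the API proposed in the KIP draft. Note that acknowledge(ConsumerRecord, AcknowledgeType) is the draft's proposal and not a released Kafka API, the AcknowledgeType constants and the process() helper are hypothetical, and the snippet will not compile against current kafka-clients.

    // Sketch only: targets the draft's proposed acknowledge(ConsumerRecord, AcknowledgeType).
    // Acknowledgements are issued in the order records appear in ConsumerRecords, i.e. in
    // increasing offset order per share-partition, even if processing itself is parallelised.
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
            try {
                process(record);                                        // hypothetical business logic
                consumer.acknowledge(record, AcknowledgeType.ACCEPT);   // constant name is illustrative
            } catch (Exception e) {
                consumer.acknowledge(record, AcknowledgeType.RELEASE);  // hand back for re-delivery
            }
        }
    }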

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-06-28 Thread Dániel Urbán
If there are no further comments, I will kick off a vote soon for the KIP.

Dániel Urbán  wrote (on Mon, 12 Jun 2023 at 11:27):

> Updated the KIP with a few example log lines before/after the change.
> Daniel
>
> Dániel Urbán  wrote (on Wed, 7 Jun 2023 at 13:59):
>
>> Hi Chris,
>>
>> Thank you for your comments! I updated the KIP. I still need to add the
>> example before/after log lines, will do that soon, but I addressed all the
>> other points.
>> 1. Added more details on thread renaming under Public Interfaces, removed
>> pseudo code.
>> 2. Removed the stale header - originally, client.id related changes were
>> part of the KIP, and I failed to remove all the leftovers of that version.
>> 3. Threads listed under Public Interfaces with current/proposed names.
>> 4. Added a comment in the connect-log4j.properties, similar to the one
>> introduced in KIP-449. We don't have a dedicated MM2 log4j config; I'm not
>> sure if we should introduce one here.
>> 5. Clarified testing section - I think thread names should not be tested
>> (they never were), but testing will focus on the new MDC context value.
>>
>> Thanks,
>> Daniel
>>
>> Chris Egerton  wrote (on Mon, 5 Jun 2023 at 16:46):
>>
>>> Hi Daniel,
>>>
>>> Thanks for the updates! A few more thoughts:
>>>
>>> 1. The "Proposed Changes" section seems a bit like a pseudocode
>>> implementation of the KIP. We don't really need this level of detail;
>>> most
>>> of the time, we're just looking for implementation details that don't
>>> directly affect the user-facing changes proposed in the "Public
>>> Interfaces"
>>> section but are worth mentioning because they (1) demonstrate how the
>>> user-facing changes are made possible, (2) indirectly affect user-facing
>>> behavior, or (3) go into more detail (like providing examples) about the
>>> user-facing behavior. For this KIP, I think examples of user-facing
>>> behavior (like before/after of thread names and log messages) and
>>> possibly
>>> an official description of the scope of the changes (which threads are
>>> going to be renamed and/or include the new MDC key, and which aren't?)
>>> are
>>> all that we'd really need in this section; everything else is fairly
>>> clear
>>> IMO. FWIW, the reason we want to discourage going into too much detail
>>> with
>>> KIPs is that it can quickly devolve into code review over mailing list,
>>> which can hold KIPs up for longer than necessary when the core design
>>> changes they contain are already basically accepted by everyone.
>>>
>>> 2. The "MM2 distributed mode client.id and log change" header seems
>>> like it
>>> may no longer be accurate; the contents do not mention any changes to
>>> client IDs. I might be missing something though; please correct me if I
>>> am.
>>>
>>> 3. Can you provide some before/after examples of what thread names and
>>> log
>>> messages will look like? I'm wondering about the thread that the
>>> distributed herder runs on, threads for connectors and tasks, and threads
>>> for polling internal topics (which we currently manage with the
>>> KafkaBasedLog class). It's fine if some of these are unchanged, I just
>>> want
>>> to better understand the scope of the proposed changes and get an idea of
>>> how they may appear to users.
>>>
>>> 4. There's no mention of changes to the default Log4j config files that
>>> we
>>> ship. Is this intentional? I feel like we need some way for users to
>>> easily
>>> discover this feature; if we're not going to touch our default Log4j
>>> config
>>> files, is there another way that we can expect users to find out about
>>> the
>>> new MDC key?
>>>
>>> 5. RE the "Test Plan" section: can you provide a little more detail of
>>> which cases we'll be covering with the proposed unit tests? Will we be
>>> verifying that the MDC context is set in various places? If so, which
>>> places? And the same with thread names? (There doesn't have to be a ton
>>> of
>>> detail, but a little more than "unit tests" would be nice )
>>>
>>> Cheers,
>>>
>>> Chris
>>>
>>> On Mon, Jun 5, 2023 at 9:45 AM Dániel Urbán 
>>> wrote:
>>>
>>> > I updated the KIP accordingly. Tried to come up with more generic

Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-06-28 Thread Dániel Urbán
Hi Chris,

Thank you for your comments, and sorry for the late reply.

1. Having a single, connector-reported metric for the topics, and another
one for the groups sounds good to me. The only issue I see here is that I'm
not familiar with any non-primitive metrics in the Kafka codebase, and
don't know if introducing a map/list type metric value will be a problem.
2. The intention is to provide the full set each time. A delta based
approach could be possible, but I think it would be an unnecessary
complication. If we go with a metric instead, we should just stick to the
full set.

I will update the KIP with the metric based approach.

Thanks,
Daniel

Chris Egerton  wrote (on Tue, 6 Jun 2023 at 16:32):

> Hi Daniel,
>
> Thanks for the KIP! I see the value in exposing information on replicated
> topics and groups. For one, it matches a similar feature added to Kafka
> Connect in KIP-558 [1], where we started tracking the set of topics that
> connectors interacted with over their lifetime. And there's also the use
> case you provided about identifying the provenance of topics replicated
> with the identity replication policy (or really, any policy that doesn't
> preserve information about the source cluster). Finally, it seems like a
> decent debugging aid for prototyping and initially standing up MM2
> clusters, and a liveness check for existing ones.
>
> Here are my thoughts so far:
>
> 1. I know that MM2 has a lot of pluggable interfaces already but I'm always
> a little hesitant to introduce another one. One alternative could be to add
> new metrics for the sets of replicated topics and groups. Users can already
> implement pluggable metrics reporters [2], which could be a substitute for
> the listener interface proposed in the KIP.
>
> 2. Is the intention to provide the listener with the total current set of
> replicated groups and topics every time that set is computed? Or is the
> listener given the complete set the first time and a delta other times?
> Based on the type signatures of the interface methods I'm guessing it's the
> former, but based on the names (which use the word "changed") it seems like
> the latter. If we use complete sets, I think "refreshed" or "computed" may
> be better as suffixes, or we could possibly use "replicated" as a prefix
> ("replicatedTopics", "replicatedGroups").
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-558%3A+Track+the+set+of+actively+used+topics+by+connectors+in+Kafka+Connect
> [2] -
>
> https://kafka.apache.org/34/javadoc/org/apache/kafka/common/metrics/MetricsReporter.html
>
> Cheers,
>
> Chris
>
> On Fri, Apr 21, 2023 at 9:00 AM Dániel Urbán 
> wrote:
>
> > Thanks for the comments Viktor.
> > 1. My original motivation was IdentityReplicationPolicy based monitoring.
> > The current MirrorClient implementation cannot list the replica topics of
> > the target cluster. I think relying on the topic-partition level metrics
> is
> > a complex solution. Instead, I would like to make it simple to collect
> all
> > the replicated topics of a flow, without relying on the name of the
> topics.
> > Then, I simply tried to generalize the approach.
> > 2. Checkpoint metrics are reported per (group, topic, partition), which
> means
> > that there is no metric associated with a group. If a filter picks up a
> > group, but the group doesn't have committed offsets for any of the
> > replicated partitions, there is no metric to be eagerly registered. This
> is
> > a difference between how topic replication and group checkpointing work
> -
> > empty topics are still picked up for partition creation and to consume
> from
> > them. Groups are only picked up if they have committed offsets already.
> > 3. Not exactly sure what the added value of listing all
> > topic-partitions is, but that information is available where the filtering
> > happens. For groups, we don't have anything else besides the group name,
> so
> > we cannot really provide more info at that point without significantly
> > changing the refresh group logic.
> >
> > Thanks,
> > Daniel
> >
> > Viktor Somogyi-Vass  wrote
> > (on Fri, 21 Apr 2023 at 11:43):
> >
> > > Hi all,
> > >
> > > A couple of comments:
> > > 1) Regarding the motivation: is the motivation simply monitoring
> related
> > or
> > > are there any other reasons for this?
> > > 2) Can we change monitoring to be identical to filters, so that what is
> > > actively filtered, we monitor exactly those topics and groups? (So
> group
> > > metrics aren't added lazily when
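
For reference, here is a sketch of what the pluggable listener under discussion could look like. The interface and method names are hypothetical (the thread suggests the "replicated" prefix), and the KIP may end up exposing the same information as a metric instead.

    import java.util.Set;

    // Hypothetical shape only; not the KIP's final interface.
    public interface ReplicationFlowListener {
        // The complete current set is reported on every refresh, not a delta.
        void replicatedTopics(Set<String> topics);
        void replicatedGroups(Set<String> groups);
    }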

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-06-12 Thread Dániel Urbán
Updated the KIP with a few example log lines before/after the change.
Daniel

Dániel Urbán  wrote (on Wed, 7 Jun 2023 at 13:59):

> Hi Chris,
>
> Thank you for your comments! I updated the KIP. I still need to add the
> example before/after log lines, will do that soon, but I addressed all the
> other points.
> 1. Added more details on thread renaming under Public Interfaces, removed
> pseudo code.
> 2. Removed the stale header - originally, client.id related changes were
> part of the KIP, and I failed to remove all the leftovers of that version.
> 3. Threads listed under Public Interfaces with current/proposed names.
> 4. Added a comment in the connect-log4j.properties, similar to the one
> introduced in KIP-449. We don't have a dedicated MM2 log4j config; I'm not
> sure if we should introduce one here.
> 5. Clarified testing section - I think thread names should not be tested
> (they never were), but testing will focus on the new MDC context value.
>
> Thanks,
> Daniel
>
> Chris Egerton  wrote (on Mon, 5 Jun 2023 at 16:46):
>
>> Hi Daniel,
>>
>> Thanks for the updates! A few more thoughts:
>>
>> 1. The "Proposed Changes" section seems a bit like a pseudocode
>> implementation of the KIP. We don't really need this level of detail; most
>> of the time, we're just looking for implementation details that don't
>> directly affect the user-facing changes proposed in the "Public
>> Interfaces"
>> section but are worth mentioning because they (1) demonstrate how the
>> user-facing changes are made possible, (2) indirectly affect user-facing
>> behavior, or (3) go into more detail (like providing examples) about the
>> user-facing behavior. For this KIP, I think examples of user-facing
>> behavior (like before/after of thread names and log messages) and possibly
>> an official description of the scope of the changes (which threads are
>> going to be renamed and/or include the new MDC key, and which aren't?) are
>> all that we'd really need in this section; everything else is fairly clear
>> IMO. FWIW, the reason we want to discourage going into too much detail
>> with
>> KIPs is that it can quickly devolve into code review over mailing list,
>> which can hold KIPs up for longer than necessary when the core design
>> changes they contain are already basically accepted by everyone.
>>
>> 2. The "MM2 distributed mode client.id and log change" header seems like
>> it
>> may no longer be accurate; the contents do not mention any changes to
>> client IDs. I might be missing something though; please correct me if I
>> am.
>>
>> 3. Can you provide some before/after examples of what thread names and log
>> messages will look like? I'm wondering about the thread that the
>> distributed herder runs on, threads for connectors and tasks, and threads
>> for polling internal topics (which we currently manage with the
>> KafkaBasedLog class). It's fine if some of these are unchanged, I just
>> want
>> to better understand the scope of the proposed changes and get an idea of
>> how they may appear to users.
>>
>> 4. There's no mention of changes to the default Log4j config files that we
>> ship. Is this intentional? I feel like we need some way for users to
>> easily
>> discover this feature; if we're not going to touch our default Log4j
>> config
>> files, is there another way that we can expect users to find out about the
>> new MDC key?
>>
>> 5. RE the "Test Plan" section: can you provide a little more detail of
>> which cases we'll be covering with the proposed unit tests? Will we be
>> verifying that the MDC context is set in various places? If so, which
>> places? And the same with thread names? (There doesn't have to be a ton of
>> detail, but a little more than "unit tests" would be nice )
>>
>> Cheers,
>>
>> Chris
>>
>> On Mon, Jun 5, 2023 at 9:45 AM Dániel Urbán 
>> wrote:
>>
>> > I updated the KIP accordingly. Tried to come up with more generic terms
>> in
>> > the Connect code instead of referring to "flow" anywhere.
>> > Daniel
>> >
>> > Dániel Urbán  wrote (on Mon, 5 Jun 2023 at 14:49):
>> >
>> > > Hi Chris,
>> > >
>> > > Thank you for your comments!
>> > >
>> > > I agree that the toString based logging is not ideal, and I believe
>> all
>> > > occurrences are within a proper logging context, so they can be
>> ignored.
>> > > If

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-06-07 Thread Dániel Urbán
Hi Andrew,

I think the "pending" state could be the solution for reading beyond the
LSO. Pending could indicate that a message is not yet available for
consumption (so it won't be offered to consumers), but once its transaction
ends, it can become "available". With a pending state, records wouldn't
"disappear", they would simply not show up until they become available on
commit, or archived on abort.

Regardless, I understand that this might be some extra, unwanted
complexity; I just thought that with the message ordering guarantee gone,
it would be a cool feature for share-groups. I've seen use-cases where the
LSO being blocked for an extended period of time caused huge lag for
traditional read_committed consumers, which could be completely avoided by
share-groups.

Thanks,
Daniel

Andrew Schofield  wrote (on Wed, 7 Jun 2023 at 17:28):

> Hi Daniel,
> Kind of. I don’t want a transaction abort to cause disappearance of
> records which are already in-flight. A “pending” state doesn’t seem
> helpful for read_committed. There’s no such disappearance problem
> for read_uncommitted.
>
> Thanks,
> Andrew
>
> > On 7 Jun 2023, at 16:19, Dániel Urbán  wrote:
> >
> > Hi Andrew,
> >
> > I agree with having a single isolation.level for the whole group; it
> makes
> > sense.
> > As for:
> > "b) The default isolation level for a share group is read_committed, in
> > which case
> > the SPSO and SPEO cannot move past the LSO."
> >
> > With this limitation (SPEO not moving beyond LSO), are you trying to
> avoid
> > handling the complexity of some kind of a "pending" state for the
> > uncommitted in-flight messages?
> >
> > Thanks,
> > Daniel
> >
> > Andrew Schofield  wrote (on
> > Wed, 7 Jun 2023 at 16:52):
> >
> >> Hi Daniel,
> >> I’ve been thinking about this question and I think this area is a bit
> >> tricky.
> >>
> >> If there are some consumers in a share group with isolation level
> >> read_uncommitted
> >> and other consumers with read_committed, they have different
> expectations
> >> with
> >> regards to which messages are visible when EOS comes into the picture.
> >> It seems to me that this is not necessarily a good thing.
> >>
> >> One option would be to support just read_committed in KIP-932. This
> means
> >> it is unambiguous which records are in-flight, because they’re only
> >> committed
> >> ones.
> >>
> >> Another option would be to have the entire share group have an isolation
> >> level,
> >> which again gives an unambiguous set of in-flight records but without
> the
> >> restriction of permitting just read_committed behaviour.
> >>
> >> So, my preference is for the following:
> >> a) A share group has an isolation level that applies to all consumers in
> >> the group.
> >> b) The default isolation level for a share group is read_committed, in
> >> which case
> >> the SPSO and SPEO cannot move past the LSO.
> >> c) For a share group with read_uncommitted isolation level, the SPSO and
> >> SPEO
> >> can move past the LSO.
> >> d) The kafka-configs.sh tool or the AdminClient can be used to set a
> >> non-default
> >> value for the isolation level for a share group. The value is applied
> when
> >> the first
> >> member joins.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 2 Jun 2023, at 10:02, Dániel Urbán  wrote:
> >>>
> >>> Hi Andrew,
> >>> Thank you for the clarification. One follow-up to read_committed mode:
> >>> Taking the change in message ordering guarantees into account, does
> this
> >>> mean that in queues, share-group consumers will be able to consume
> >>> committed records AFTER the LSO?
> >>> Thanks,
> >>> Daniel
> >>>
> >>> Andrew Schofield  wrote (on
> >>> Fri, 2 Jun 2023 at 10:39):
> >>>
> >>>> Hi Daniel,
> >>>> Thanks for your questions.
> >>>>
> >>>> 1) Yes, read_committed fetch will still be possible.
> >>>>
> >>>> 2) You weren’t wrong that this is a broad question :)
> >>>>
> >>>> Broadly speaking, I can see two ways of managing the in-flight
> records:
> >>>> the share-partition leader does it, or the share-group coordinator
> does
> >> it.
> >>>> I want to choose
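
As a small illustration of options (b) and (c) quoted above (purely illustrative, not KIP wording): under read_committed the share-partition end offset is capped at the LSO, while under read_uncommitted it may move past it.

    // Illustrative only: how the SPEO could be bounded per the isolation-level options above.
    public final class ShareEndOffsetExample {
        static long nextShareEndOffset(long candidateSpeo, long lso, boolean readCommitted) {
            return readCommitted ? Math.min(candidateSpeo, lso) : candidateSpeo;
        }

        public static void main(String[] args) {
            long lso = 70L; // e.g. an open transaction starts at offset 70
            System.out.println(nextShareEndOffset(100L, lso, true));  // 70: cannot pass the LSO
            System.out.println(nextShareEndOffset(100L, lso, false)); // 100: may pass the LSO
        }
    }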

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-06-07 Thread Dániel Urbán
Hi Andrew,

I agree with having a single isolation.level for the whole group; it makes
sense.
As for:
"b) The default isolation level for a share group is read_committed, in
which case
the SPSO and SPEO cannot move past the LSO."

With this limitation (SPEO not moving beyond LSO), are you trying to avoid
handling the complexity of some kind of a "pending" state for the
uncommitted in-flight messages?

Thanks,
Daniel

Andrew Schofield  wrote (on Wed, 7 Jun 2023 at 16:52):

> Hi Daniel,
> I’ve been thinking about this question and I think this area is a bit
> tricky.
>
> If there are some consumers in a share group with isolation level
> read_uncommitted
> and other consumers with read_committed, they have different expectations
> with
> regards to which messages are visible when EOS comes into the picture.
> It seems to me that this is not necessarily a good thing.
>
> One option would be to support just read_committed in KIP-932. This means
> it is unambiguous which records are in-flight, because they’re only
> committed
> ones.
>
> Another option would be to have the entire share group have an isolation
> level,
> which again gives an unambiguous set of in-flight records but without the
> restriction of permitting just read_committed behaviour.
>
> So, my preference is for the following:
> a) A share group has an isolation level that applies to all consumers in
> the group.
> b) The default isolation level for a share group is read_committed, in
> which case
> the SPSO and SPEO cannot move past the LSO.
> c) For a share group with read_uncommitted isolation level, the SPSO and
> SPEO
> can move past the LSO.
> d) The kafka-configs.sh tool or the AdminClient can be used to set a
> non-default
> value for the isolation level for a share group. The value is applied when
> the first
> member joins.
>
> Thanks,
> Andrew
>
> > On 2 Jun 2023, at 10:02, Dániel Urbán  wrote:
> >
> > Hi Andrew,
> > Thank you for the clarification. One follow-up to read_committed mode:
> > Taking the change in message ordering guarantees into account, does this
> > mean that in queues, share-group consumers will be able to consume
> > committed records AFTER the LSO?
> > Thanks,
> > Daniel
> >
> > Andrew Schofield  wrote (on
> > Fri, 2 Jun 2023 at 10:39):
> >
> >> Hi Daniel,
> >> Thanks for your questions.
> >>
> >> 1) Yes, read_committed fetch will still be possible.
> >>
> >> 2) You weren’t wrong that this is a broad question :)
> >>
> >> Broadly speaking, I can see two ways of managing the in-flight records:
> >> the share-partition leader does it, or the share-group coordinator does
> it.
> >> I want to choose what works best, and I happen to have started with
> trying
> >> the share-partition leader doing it. This is just a whiteboard exercise
> at
> >> the
> >> moment, looking at the potential protocol flows and how well it all
> hangs
> >> together. When I have something coherent and understandable and worth
> >> reviewing, I’ll update the KIP with a proposal.
> >>
> >> I think it’s probably worth doing a similar exercise for the share-group
> >> coordinator way too. There are bound to be pros and cons, and I don’t
> >> really
> >> mind which way prevails.
> >>
> >> If the share-group coordinator does it, I already have experience of
> >> efficient
> >> storage of in-flight record state in a way that scales and is
> >> space-efficient.
> >> If the share-partition leader does it, storage of in-flight state is a
> bit
> >> more
> >> tricky.
> >>
> >> I think it’s worth thinking ahead to how EOS will work and also another
> >> couple of enhancements (key-based ordering and acquisition lock
> >> extension) so it’s somewhat future-proof.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 1 Jun 2023, at 11:51, Dániel Urbán  wrote:
> >>>
> >>> Hi Andrew,
> >>>
> >>> Thank you for the KIP, exciting work you are doing :)
> >>> I have 2 questions:
> >>> 1. I understand that EOS won't be supported for share-groups (yet), but
> >>> read_committed fetch will still be possible, correct?
> >>>
> >>> 2. I have a very broad question about the proposed solution: why not
> let
> >>> the share-group coordinator manage the states of the in-flight records?
> >>> I'm asking this because it seems to me that using the same pattern as
> the

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-06-07 Thread Dániel Urbán
Hi Chris,

Thank you for your comments! I updated the KIP. I still need to add the
example before/after log lines, will do that soon, but I addressed all the
other points.
1. Added more details on thread renaming under Public Interfaces, removed
pseudo code.
2. Removed the stale header - originally, client.id related changes were
part of the KIP, and I failed to remove all the leftovers of that version.
3. Threads listed under Public Interfaces with current/proposed names.
4. Added a comment in the connect-log4j.properties, similar to the one
introduced in KIP-449. We don't have a dedicated MM2 log4j config; I'm not
sure if we should introduce one here.
5. Clarified testing section - I think thread names should not be tested
(they never were), but testing will focus on the new MDC context value.

Thanks,
Daniel

Chris Egerton  wrote (on Mon, 5 Jun 2023 at 16:46):

> Hi Daniel,
>
> Thanks for the updates! A few more thoughts:
>
> 1. The "Proposed Changes" section seems a bit like a pseudocode
> implementation of the KIP. We don't really need this level of detail; most
> of the time, we're just looking for implementation details that don't
> directly affect the user-facing changes proposed in the "Public Interfaces"
> section but are worth mentioning because they (1) demonstrate how the
> user-facing changes are made possible, (2) indirectly affect user-facing
> behavior, or (3) go into more detail (like providing examples) about the
> user-facing behavior. For this KIP, I think examples of user-facing
> behavior (like before/after of thread names and log messages) and possibly
> an official description of the scope of the changes (which threads are
> going to be renamed and/or include the new MDC key, and which aren't?) are
> all that we'd really need in this section; everything else is fairly clear
> IMO. FWIW, the reason we want to discourage going into too much detail with
> KIPs is that it can quickly devolve into code review over mailing list,
> which can hold KIPs up for longer than necessary when the core design
> changes they contain are already basically accepted by everyone.
>
> 2. The "MM2 distributed mode client.id and log change" header seems like
> it
> may no longer be accurate; the contents do not mention any changes to
> client IDs. I might be missing something though; please correct me if I am.
>
> 3. Can you provide some before/after examples of what thread names and log
> messages will look like? I'm wondering about the thread that the
> distributed herder runs on, threads for connectors and tasks, and threads
> for polling internal topics (which we currently manage with the
> KafkaBasedLog class). It's fine if some of these are unchanged, I just want
> to better understand the scope of the proposed changes and get an idea of
> how they may appear to users.
>
> 4. There's no mention of changes to the default Log4j config files that we
> ship. Is this intentional? I feel like we need some way for users to easily
> discover this feature; if we're not going to touch our default Log4j config
> files, is there another way that we can expect users to find out about the
> new MDC key?
>
> 5. RE the "Test Plan" section: can you provide a little more detail of
> which cases we'll be covering with the proposed unit tests? Will we be
> verifying that the MDC context is set in various places? If so, which
> places? And the same with thread names? (There doesn't have to be a ton of
> detail, but a little more than "unit tests" would be nice )
>
> Cheers,
>
> Chris
>
> On Mon, Jun 5, 2023 at 9:45 AM Dániel Urbán  wrote:
>
> > I updated the KIP accordingly. Tried to come up with more generic terms
> in
> > the Connect code instead of referring to "flow" anywhere.
> > Daniel
> >
> > Dániel Urbán  wrote (on Mon, 5 Jun 2023 at 14:49):
> >
> > > Hi Chris,
> > >
> > > Thank you for your comments!
> > >
> > > I agree that the toString based logging is not ideal, and I believe all
> > > occurrences are within a proper logging context, so they can be
> ignored.
> > > If thread names can be changed unconditionally, I agree, using a new
> MDC
> > > key is the ideal solution.
> > >
> > > Will update the KIP accordingly.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > Chris Egerton  wrote (on Fri, 2 Jun 2023 at 22:23):
> > >
> > >> Hi Daniel,
> > >>
> > >> Are there any cases of Object::toString being used by internal classes
> > for
> > >> logging context that can't be covered by MDC contexts? For example,

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-06-05 Thread Dániel Urbán
I updated the KIP accordingly. Tried to come up with more generic terms in
the Connect code instead of referring to "flow" anywhere.
Daniel

Dániel Urbán  wrote (on Mon, 5 Jun 2023 at 14:49):

> Hi Chris,
>
> Thank you for your comments!
>
> I agree that the toString based logging is not ideal, and I believe all
> occurrences are within a proper logging context, so they can be ignored.
> If thread names can be changed unconditionally, I agree, using a new MDC
> key is the ideal solution.
>
> Will update the KIP accordingly.
>
> Thanks,
> Daniel
>
> Chris Egerton  wrote (on Fri, 2 Jun 2023 at 22:23):
>
>> Hi Daniel,
>>
>> Are there any cases of Object::toString being used by internal classes for
>> logging context that can't be covered by MDC contexts? For example,
>> anything logged by WorkerSinkTask (or any WorkerTask subclass) already has
>> the MDC set for the task [1]. Since the Object::toString-prefixed style of
>> logging is a bit obsolete after the introduction of MDC contexts it'd feel
>> a little strange to continue trying to accommodate it, especially if the
>> changes from this KIP are going to be opt-in regardless.
>>
>> As far as thread names go: unlike log statements, I don't believe changing
>> them requires a KIP, and would be fine with merging a PR for that without
>> worrying about compatibility.
>>
>> With that in mind, I'm still wondering if we can add a separate MDC key
>> for
>> MM2 replication flows (perhaps "mirror.maker.flow") and unconditionally
>> add
>> that to the MDC contexts for every thread that gets spun up by each
>> DistributedHerder instance that MM2 creates. This would be different from
>> the draft PR and KIP, which involve altering the content of the existing
>> "connector.context" MDC key. Users would opt in to the change by altering
>> their Log4j config instead of altering their MM2 config, which matches
>> precedent set with the introduction of the "connector.context" key.
>>
>> Thoughts?
>>
>> [1] -
>>
>> https://github.com/apache/kafka/blob/146a6976aed0d9f90c70b6f21dca8b887cc34e71/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L255
>>
>> Cheers,
>>
>> Chris
>>
>> On Tue, May 23, 2023 at 11:36 AM Dániel Urbán 
>> wrote:
>>
>> > Hi Chris,
>> >
>> > Thank you for your comments!
>> > 1. This approach is OK for me. I thought that the "sample" configs in
>> the
>> > repo do not fall into the same category as the default of a new config.
>> > Adding a commented line instead should be ok, and the future opt-out
>> change
>> > sounds good to me.
>> >
>> > 2. Besides the MDC, there are 2 other places where the flow context
>> > information could be added:
>> > - Some of the Connect internal classes use their .toString methods when
>> > logging (e.g. WorkerSinkTask has a line like this: log.info("{}
>> Executing
>> > sink task", this);). These toString implementations contain the
>> connector
>> > name, and nothing else, so in MM2 dedicated mode, adding the flow would
>> > make these lines unique.
>> > - Connect worker thread names. Currently, they contain the connector
>> > name/task ID, but to make them unique, the flow should be added in MM2
>> > distributed mode.
>> > In my draft PR, I changed both of these, but for the sake of backward
>> > compatibility, I made the new toString/thread name dependent on the new,
>> > suggested configuration flag.
>> > If we go with a new MDC key, but we still want to change the toString
>> > methods and the thread names, we still might need an extra flag to turn
>> on
>> > the new behavior.
>> > AFAIK the toString calls are all made inside a proper logging context,
>> so
>> > changing the toString methods doesn't add much value. I think that the
>> thread
>> > name changes are useful, giving more context for log lines written
>> outside
>> > of a log context.
>> >
>> > In short, I think that MDC + thread name changes are necessary to make
>> MM2
>> > dedicated logs useful for diagnostics. If we keep both, then maybe
>> having a
>> > single config to control both at the same time is better than having a
>> new
>> > MDC key (configured separately in the log pattern) and a config flag
>> (set
>> > separately in the properties file) for the thread name change.
>> >
>> > Thanks,
>> > Daniel
>> >
>>
>
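
To make the alternative discussed in this thread concrete, here is a sketch of setting a dedicated MDC key around work executed for one replication flow. The key name "mirror.maker.flow" is the suggestion from this thread rather than a committed name, the flow format is an assumption, and users would opt in by adding %X{mirror.maker.flow} to their Log4j conversion pattern, as was done for "connector.context" in KIP-449.

    import org.slf4j.MDC;

    // Sketch only; key name and flow format are assumptions based on this thread.
    public final class FlowMdcExample {
        private static final String FLOW_KEY = "mirror.maker.flow";

        public static void runWithFlowContext(String sourceAlias, String targetAlias, Runnable work) {
            String previous = MDC.get(FLOW_KEY);
            MDC.put(FLOW_KEY, "[" + sourceAlias + "->" + targetAlias + "] ");
            try {
                work.run(); // every log line emitted here can carry the flow via %X{mirror.maker.flow}
            } finally {
                if (previous == null) {
                    MDC.remove(FLOW_KEY);
                } else {
                    MDC.put(FLOW_KEY, previous);
                }
            }
        }
    }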


Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-06-05 Thread Dániel Urbán
Hi Chris,

Thank you for your comments!

I agree that the toString based logging is not ideal, and I believe all
occurrences are within a proper logging context, so they can be ignored.
If thread names can be changed unconditionally, I agree, using a new MDC
key is the ideal solution.

Will update the KIP accordingly.

Thanks,
Daniel

Chris Egerton  wrote (on Fri, 2 Jun 2023 at 22:23):

> Hi Daniel,
>
> Are there any cases of Object::toString being used by internal classes for
> logging context that can't be covered by MDC contexts? For example,
> anything logged by WorkerSinkTask (or any WorkerTask subclass) already has
> the MDC set for the task [1]. Since the Object::toString-prefixed style of
> logging is a bit obsolete after the introduction of MDC contexts it'd feel
> a little strange to continue trying to accommodate it, especially if the
> changes from this KIP are going to be opt-in regardless.
>
> As far as thread names go: unlike log statements, I don't believe changing
> them requires a KIP, and would be fine with merging a PR for that without
> worrying about compatibility.
>
> With that in mind, I'm still wondering if we can add a separate MDC key for
> MM2 replication flows (perhaps "mirror.maker.flow") and unconditionally add
> that to the MDC contexts for every thread that gets spun up by each
> DistributedHerder instance that MM2 creates. This would be different from
> the draft PR and KIP, which involve altering the content of the existing
> "connector.context" MDC key. Users would opt in to the change by altering
> their Log4j config instead of altering their MM2 config, which matches
> precedent set with the introduction of the "connector.context" key.
>
> Thoughts?
>
> [1] -
>
> https://github.com/apache/kafka/blob/146a6976aed0d9f90c70b6f21dca8b887cc34e71/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L255
>
> Cheers,
>
> Chris
>
> On Tue, May 23, 2023 at 11:36 AM Dániel Urbán 
> wrote:
>
> > Hi Chris,
> >
> > Thank you for your comments!
> > 1. This approach is OK for me. I thought that the "sample" configs in the
> > repo do not fall into the same category as the default of a new config.
> > Adding a commented line instead should be ok, and the future opt-out
> change
> > sounds good to me.
> >
> > 2. Besides the MDC, there are 2 other places where the flow context
> > information could be added:
> > - Some of the Connect internal classes use their .toString methods when
> > logging (e.g. WorkerSinkTask has a line like this: log.info("{}
> Executing
> > sink task", this);). These toString implementations contain the connector
> > name, and nothing else, so in MM2 dedicated mode, adding the flow would
> > make these lines unique.
> > - Connect worker thread names. Currently, they contain the connector
> > name/task ID, but to make them unique, the flow should be added in MM2
> > distributed mode.
> > In my draft PR, I changed both of these, but for the sake of backward
> > compatibility, I made the new toString/thread name dependent on the new,
> > suggested configuration flag.
> > If we go with a new MDC key, but we still want to change the toString
> > methods and the thread names, we still might need an extra flag to turn
> on
> > the new behavior.
> > AFAIK the toString calls are all made inside a proper logging context, so
> > changing the toString methods doesn't add much value. I think that the
> thread
> > name changes are useful, giving more context for log lines written
> outside
> > of a log context.
> >
> > In short, I think that MDC + thread name changes are necessary to make
> MM2
> > dedicated logs useful for diagnostics. If we keep both, then maybe
> having a
> > single config to control both at the same time is better than having a
> new
> > MDC key (configured separately in the log pattern) and a config flag (set
> > separately in the properties file) for the thread name change.
> >
> > Thanks,
> > Daniel
> >
>


Re: [DISCUSS] KIP-932: Queues for Kafka

2023-06-02 Thread Dániel Urbán
Hi Andrew,
Thank you for the clarification. One follow-up to read_committed mode:
Taking the change in message ordering guarantees into account, does this
mean that in queues, share-group consumers will be able to consume
committed records AFTER the LSO?
Thanks,
Daniel

Andrew Schofield  wrote (on Fri, 2 Jun 2023 at 10:39):

> Hi Daniel,
> Thanks for your questions.
>
> 1) Yes, read_committed fetch will still be possible.
>
> 2) You weren’t wrong that this is a broad question :)
>
> Broadly speaking, I can see two ways of managing the in-flight records:
> the share-partition leader does it, or the share-group coordinator does it.
> I want to choose what works best, and I happen to have started with trying
> the share-partition leader doing it. This is just a whiteboard exercise at
> the
> moment, looking at the potential protocol flows and how well it all hangs
> together. When I have something coherent and understandable and worth
> reviewing, I’ll update the KIP with a proposal.
>
> I think it’s probably worth doing a similar exercise for the share-group
> coordinator way too. There are bound to be pros and cons, and I don’t
> really
> mind which way prevails.
>
> If the share-group coordinator does it, I already have experience of
> efficient
> storage of in-flight record state in a way that scales and is
> space-efficient.
> If the share-partition leader does it, storage of in-flight state is a bit
> more
> tricky.
>
> I think it’s worth thinking ahead to how EOS will work and also another
> couple of enhancements (key-based ordering and acquisition lock
> extension) so it’s somewhat future-proof.
>
> Thanks,
> Andrew
>
> > On 1 Jun 2023, at 11:51, Dániel Urbán  wrote:
> >
> > Hi Andrew,
> >
> > Thank you for the KIP, exciting work you are doing :)
> > I have 2 questions:
> > 1. I understand that EOS won't be supported for share-groups (yet), but
> > read_committed fetch will still be possible, correct?
> >
> > 2. I have a very broad question about the proposed solution: why not let
> > the share-group coordinator manage the states of the in-flight records?
> > I'm asking this because it seems to me that using the same pattern as the
> > existing group coordinator would
> > a, solve the durability of the message state storage (same method as the
> > one used by the current group coordinator)
> >
> > b, pave the way for EOS with share-groups (same method as the one used by
> > the current group coordinator)
> >
> > c, allow follower-fetching
> > I saw your point about this: "FFF gives freedom to fetch records from a
> > nearby broker, but it does not also give the ability to commit offsets
> to a
> > nearby broker"
> > But does it matter if message acknowledgement is not "local"? Supposedly,
> > fetching is the actual hard work which benefits from follower fetching,
> not
> > the group related requests.
> >
> > The only problem I see with the share-group coordinator managing the
> > in-flight message state is that the coordinator is not aware of the exact
> > available offsets of a partition, nor how the messages are batched. For
> > this problem, maybe the share group coordinator could use some form of
> > "logical" addresses, such as "the next 2 batches after offset X", or
> "after
> > offset X, skip 2 batches, fetch next 2". Acknowledgements always contain
> > the exact offset, but for the "unknown" sections of a partition, these
> > logical addresses would be used. The coordinator could keep track of
> > message states with a mix of offsets and these batch based addresses. The
> > partition leader could support "skip X, fetch Y batches" fetch requests.
> > This solution would need changes in the Fetch API to allow such batch
> based
> > addresses, but I assume that fetch protocol changes will be needed
> > regardless of the specific solution.
> >
> > Thanks,
> > Daniel
> >
> > Andrew Schofield  wrote (on
> > Tue, 30 May 2023 at 18:15):
> >
> >> Yes, that’s it. I imagine something similar to KIP-848 for managing the
> >> share group
> >> membership, and consumers that fetch records from their assigned
> >> partitions and
> >> acknowledge when delivery completes.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 30 May 2023, at 16:52, Adam Warski  wrote:
> >>>
> >>> Thanks for the explanation!
> >>>
> >>> So effectively, a share group is subscribed to each partition - but the

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-06-01 Thread Dániel Urbán
Hi Andrew,

Thank you for the KIP, exciting work you are doing :)
I have 2 questions:
1. I understand that EOS won't be supported for share-groups (yet), but
read_committed fetch will still be possible, correct?

2. I have a very broad question about the proposed solution: why not let
the share-group coordinator manage the states of the in-flight records?
I'm asking this because it seems to me that using the same pattern as the
existing group coordinator would
a, solve the durability of the message state storage (same method as the
one used by the current group coordinator)

b, pave the way for EOS with share-groups (same method as the one used by
the current group coordinator)

c, allow follower-fetching
I saw your point about this: "FFF gives freedom to fetch records from a
nearby broker, but it does not also give the ability to commit offsets to a
nearby broker"
But does it matter if message acknowledgement is not "local"? Supposedly,
fetching is the actual hard work which benefits from follower fetching, not
the group related requests.

The only problem I see with the share-group coordinator managing the
in-flight message state is that the coordinator is not aware of the exact
available offsets of a partition, nor how the messages are batched. For
this problem, maybe the share group coordinator could use some form of
"logical" addresses, such as "the next 2 batches after offset X", or "after
offset X, skip 2 batches, fetch next 2". Acknowledgements always contain
the exact offset, but for the "unknown" sections of a partition, these
logical addresses would be used. The coordinator could keep track of
message states with a mix of offsets and these batch based addresses. The
partition leader could support "skip X, fetch Y batches" fetch requests.
This solution would need changes in the Fetch API to allow such batch based
addresses, but I assume that fetch protocol changes will be needed
regardless of the specific solution.

Thanks,
Daniel

Andrew Schofield  wrote (on Tue, 30 May 2023 at 18:15):

> Yes, that’s it. I imagine something similar to KIP-848 for managing the
> share group
> membership, and consumers that fetch records from their assigned
> partitions and
> acknowledge when delivery completes.
>
> Thanks,
> Andrew
>
> > On 30 May 2023, at 16:52, Adam Warski  wrote:
> >
> > Thanks for the explanation!
> >
> > So effectively, a share group is subscribed to each partition - but the
> data is not pushed to the consumer, but only sent on demand. And when
> demand is signalled, a batch of messages is sent?
> > Hence it would be up to the consumer to prefetch a sufficient number of
> batches to ensure that it will never be "bored"?
> >
> > Adam
> >
> >> On 30 May 2023, at 15:25, Andrew Schofield 
> wrote:
> >>
> >> Hi Adam,
> >> Thanks for your question.
> >>
> >> With a share group, each fetch is able to grab available records from
> any partition. So, it alleviates
> >> the “head-of-line” blocking problem where a slow consumer gets in the
> way. There’s no actual
> >> stealing from a slow consumer, but it can be overtaken and must
> complete its processing within
> >> the timeout.
> >>
> >> The way I see this working is that when a consumer joins a share group,
> it receives a set of
> >> assigned share-partitions. To start with, every consumer will be
> assigned all partitions. We
> >> can be smarter than that, but I think that’s really a question of
> writing a smarter assignor
> >> just as has occurred over the years with consumer groups.
> >>
> >> Only a small proportion of Kafka workloads are super high throughput.
> Share groups would
> >> struggle with those I’m sure. Share groups do not diminish the value of
> consumer groups
> >> for streaming. They just give another option for situations where a
> different style of
> >> consumption is more appropriate.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 29 May 2023, at 17:18, Adam Warski  wrote:
> >>>
> >>> Hello,
> >>>
> >>> thank you for the proposal! A very interesting read.
> >>>
> >>> I do have one question, though. When you subscribe to a topic using
> consumer groups, it might happen that one consumer has processed all
> messages from its partitions, while another one still has a lot of work to
> do (this might be due to unbalanced partitioning, long processing times
> etc.). In a message-queue approach, it would be great to solve this problem
> - so that a consumer that is free can steal work from other consumers. Is
> this somehow covered by share groups?
> >>>
> >>> Maybe this is planned as "further work", as indicated here:
> >>>
> >>> "
> >>> It manages the topic-partition assignments for the share-group
> members. An initial, trivial implementation would be to give each member
> the list of all topic-partitions which matches its subscriptions and then
> use the pull-based protocol to fetch records from all partitions. A more
> sophisticated implementation could use topic-partition load and lag metrics
> to distribute 
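
A tiny sketch of the "logical address" idea described in this message (purely illustrative; nothing like this exists in the KIP or the Fetch API): the coordinator would refer to unseen sections of the partition by batch counts relative to a known offset, while acknowledged records are tracked by exact offset.

    // Illustrative only: "after offset X, skip S batches, fetch N batches".
    public final class LogicalBatchAddress {
        final long afterOffset;  // last offset the coordinator knows exactly
        final int skipBatches;   // batches to skip past that offset
        final int fetchBatches;  // batches to fetch after the skipped ones

        LogicalBatchAddress(long afterOffset, int skipBatches, int fetchBatches) {
            this.afterOffset = afterOffset;
            this.skipBatches = skipBatches;
            this.fetchBatches = fetchBatches;
        }

        @Override
        public String toString() {
            return "after offset " + afterOffset + ", skip " + skipBatches
                    + " batches, fetch " + fetchBatches + " batches";
        }
    }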

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-05-23 Thread Dániel Urbán
Hi Chris,

Thank you for your comments!
1. This approach is OK for me. I thought that the "sample" configs in the
repo do not fall into the same category as the default of a new config.
Adding a commented line instead should be ok, and the future opt-out change
sounds good to me.

2. Besides the MDC, there are 2 other places where the flow context
information could be added:
- Some of the Connect internal classes use their .toString methods when
logging (e.g. WorkerSinkTask has a line like this: log.info("{} Executing
sink task", this);). These toString implementations contain the connector
name, and nothing else, so in MM2 dedicated mode, adding the flow would
make these lines unique.
- Connect worker thread names. Currently, they contain the connector
name/task ID, but to make them unique, the flow should be added in MM2
distributed mode.
In my draft PR, I changed both of these, but for the sake of backward
compatibility, I made the new toString/thread name dependent on the new,
suggested configuration flag.
If we go with a new MDC key, but we still want to change the toString
methods and the thread names, we still might need an extra flag to turn on
the new behavior.
AFAIK the toString calls are all made inside a proper logging context, so
changing the toString methods doesn't add much value. I think that the thread
name changes are useful, giving more context for log lines written outside
of a log context.

In short, I think that MDC + thread name changes are necessary to make MM2
dedicated logs useful for diagnostics. If we keep both, then maybe having a
single config to control both at the same time is better than having a new
MDC key (configured separately in the log pattern) and a config flag (set
separately in the properties file) for the thread name change.

Thanks,
Daniel
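
For illustration, the thread-name part of the change could look roughly like this; the flow and base-name formats are assumptions, not the KIP's final format.

    // Sketch only: prefix the Connect worker thread name with the replication flow so the
    // name stays unique when several flows run inside one MM2 process.
    public final class FlowThreadNameExample {
        public static void main(String[] args) {
            String flow = "us-east->us-west";                     // hypothetical flow (source->target aliases)
            String base = "task-thread-MirrorSourceConnector-0";  // Connect-style thread name today
            Thread.currentThread().setName(flow + "|" + base);
            System.out.println(Thread.currentThread().getName());
        }
    }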


Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-05-11 Thread Dániel Urbán
Hello everyone,
I would like to bump this thread; it is a pretty straightforward KIP, and it
helps a lot with MM2 dedicated mode diagnostics.
Thanks in advance,
Daniel

Dániel Urbán  wrote (on Thu, 4 May 2023 at 14:08):

> Hi Viktor,
>
> Thank you for your comments. I agree that this change is low-risk - the
> default false feature flag shouldn't cause any problems to existing users.
>
> As for the rejected alternative of adding an attribute to the MDC - some
> of the internal worker classes (such as the different WorkerTask
> subclasses) have a toString implementation which is used in the log
> message. When those toString calls are used in logging, we don't always
> have a logging context with MDC attributes. In my current PR, I changed
> those toString methods in a backward compatible way, so if the feature flag
> is set to false, the method will return the same string as it used to. Even
> if we went with the extra MDC attribute, we would still probably need an
> extra config to make the toString methods backward compatible. Because of
> this, it might be easier to have a single flag to control the full feature.
>
> Daniel
>
> Viktor Somogyi-Vass  ezt írta
> (időpont: 2023. ápr. 21., P, 15:07):
>
>> Hi Daniel,
>>
>> I think this is a useful addition, it helps resolving issues and
>> escalations, and improves overall traceability.
>> Changing the logging context may imply the risk of making certain log
>> parsers unable to work on new logs. As I see we by default disable this
>> feature which solves this problem, however I also think that by disabling
>> it by default it isn't much of a help because users may not know about
>> this
>> configuration and would not benefit from these when they face problems. So
>> overall I'd like to go with default=true and wanted to put this out here
>> for the community to discuss whether it's a problem.
>> Also, what was the reasoning behind rejecting the second alternative? As I
>> see that would be a viable option and maybe a bit more idiomatic to the
>> logging framework.
>>
>> A minor note: please update the JIRA link in the KIP to point to the right
>> one.
>>
>> Best,
>> Viktor
>>
>> On Thu, Apr 13, 2023 at 2:19 PM Dániel Urbán 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to bump this thread. I think this would be very useful for
>> any
>> > MM2 users, as the current logs with certain architectures (e.g. fan-out)
>> > are impossible to decipher.
>> > I already submitted a PR to demonstrate the proposed solution:
>> > https://github.com/apache/kafka/pull/13475
>> >
>> > Thanks for your comments in advance,
>> > Daniel
>> >
>> > Dániel Urbán  ezt írta (időpont: 2023. márc.
>> 30.,
>> > Cs, 18:24):
>> >
>> > > Hello everyone,
>> > >
>> > > I would like to kick off a discussion about KIP-916:
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-916%3A+MM2+distributed+mode+flow+log+context
>> > >
>> > > The KIP aims to enhance the diagnostic information for MM2 distributed
>> > > mode. MM2 relies on multiple Connect worker instances nested in a
>> single
>> > > process. In Connect, Connector names are guaranteed to be unique in a
>> > > single process, but in MM2, this is not true. Because of this, the
>> > > diagnostics provided by Connect (client.ids, log context) do not
>> ensure
>> > > that logs are distinguishable for different flows (Connect workers)
>> > inside
>> > > an MM2 process.
>> > >
>> > > Thanks for all you input in advance,
>> > > Daniel
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-05-04 Thread Dániel Urbán
Hi Viktor,

Thank you for your comments. I agree that this change is low-risk - the
default false feature flag shouldn't cause any problems to existing users.

As for the rejected alternative of adding an attribute to the MDC - some of
the internal worker classes (such as the different WorkerTask subclasses)
have a toString implementation which is used in the log message. When those
toString calls are used in logging, we don't always have a logging context
with MDC attributes. In my current PR, I changed those toString methods in
a backward compatible way, so if the feature flag is set to false, the
method will return the same string as it used to. Even if we went with the
extra MDC attribute, we would still probably need an extra config to make
the toString methods backward compatible. Because of this, it might be
easier to have a single flag to control the full feature.
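
As a rough illustration of what I mean by backward compatible (the class and
field names are made up, this is not the actual PR code):

    @Override
    public String toString() {
        // With the flag off, return exactly the old string, so existing log
        // parsers and readers are unaffected.
        return addFlowToLogContext
                ? "WorkerSourceTask{id=" + id + ", flow=" + flow + "}"
                : "WorkerSourceTask{id=" + id + "}";
    }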

Daniel

Viktor Somogyi-Vass  ezt írta
(időpont: 2023. ápr. 21., P, 15:07):

> Hi Daniel,
>
> I think this is a useful addition, it helps resolving issues and
> escalations, and improves overall traceability.
> Changing the logging context may imply the risk of making certain log
> parsers unable to work on new logs. As I see we by default disable this
> feature which solves this problem, however I also think that by disabling
> it by default it isn't much of a help because users may not know about this
> configuration and would not benefit from these when they face problems. So
> overall I'd like to go with default=true and wanted to put this out here
> for the community to discuss whether it's a problem.
> Also, what was the reasoning behind rejecting the second alternative? As I
> see that would be a viable option and maybe a bit more idiomatic to the
> logging framework.
>
> A minor note: please update the JIRA link in the KIP to point to the right
> one.
>
> Best,
> Viktor
>
> On Thu, Apr 13, 2023 at 2:19 PM Dániel Urbán 
> wrote:
>
> > Hi everyone,
> >
> > I would like to bump this thread. I think this would be very useful for
> any
> > MM2 users, as the current logs with certain architectures (e.g. fan-out)
> > are impossible to decipher.
> > I already submitted a PR to demonstrate the proposed solution:
> > https://github.com/apache/kafka/pull/13475
> >
> > Thanks for your comments in advance,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2023. márc. 30.,
> > Cs, 18:24):
> >
> > > Hello everyone,
> > >
> > > I would like to kick off a discussion about KIP-916:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-916%3A+MM2+distributed+mode+flow+log+context
> > >
> > > The KIP aims to enhance the diagnostic information for MM2 distributed
> > > mode. MM2 relies on multiple Connect worker instances nested in a
> single
> > > process. In Connect, Connector names are guaranteed to be unique in a
> > > single process, but in MM2, this is not true. Because of this, the
> > > diagnostics provided by Connect (client.ids, log context) do not ensure
> > > that logs are distinguishable for different flows (Connect workers)
> > inside
> > > an MM2 process.
> > >
> > > Thanks for all you input in advance,
> > > Daniel
> > >
> >
>


Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-04-21 Thread Dániel Urbán
Thanks for the comments Viktor.
1. My original motivation was IdentityReplicationPolicy based monitoring.
The current MirrorClient implementation cannot list the replica topics of
the target cluster. I think relying on the topic-partition level metrics is
a complex solution. Instead, I would like to make it simple to collect all
the replicated topics of a flow, without relying on the name of the topics.
Then, I simply tried to generalize the approach.
2. Checkpoint metrics are reported per (group, topic, partition), which means
that there is no metric associated with a group. If a filter picks up a
group, but the group doesn't have committed offsets for any of the
replicated partitions, there is no metric to be eagerly registered. This is
a difference between how topic replication and group checkpointing works -
empty topics are still picked up for partition creation and to consume from
them. Groups are only picked up if they have committed offsets already.
3. I'm not exactly sure what the added value of listing all
topic-partitions would be, but that information is available where the filtering
happens. For groups, we don't have anything else besides the group name, so
we cannot really provide more info at that point without significantly
changing the refresh group logic.

Thanks,
Daniel

Viktor Somogyi-Vass  ezt írta
(időpont: 2023. ápr. 21., P, 11:43):

> Hi all,
>
> A couple of comments:
> 1) Regarding the motivation: is the motivation simply monitoring related or
> are there any other reasons to this?
> 2) Can we change monitoring to be identical to filters, so that what is
> actively filtered, we monitor exactly those topics and groups? (So group
> metrics aren't added lazily when a checkpoint is created but when the
> filter is changed.)
> 3) Not sure if we want to widen the scope but since these are interfaces
> I'd use TopicPartition and some kind of GroupDescription classes (not sure
> if the latter exists) instead of Strings. If later on we'll need extra
> properties for these then it can be added on easier.
>
> Best,
> Viktor
>
> On Wed, Apr 19, 2023 at 1:42 PM Dániel Urbán 
> wrote:
>
> > I wouldn't really include a non-existent group (same as we don't care
> about
> > a non-existent topic), that doesn't really matter.
> > I think having an existing group which doesn't have an offset to
> checkpoint
> > is equivalent to a topic having no records to replicate from the
> monitoring
> > perspective.
> >
> > I think the precise way to put it is to monitor the topics and groups
> > picked up by the filtering logic of MM2. "The list currently replicated"
> is
> > not a good definition, as an empty topic would still be interesting for
> > monitoring purposes, even if there is no message to replicate.
> > I think the core motivation is to capture the output of the
> > TopicFilter/GroupFilter + the extra, built-in logic of MM2 related to
> > filtering (e.g. internal topics are never replicated, the heartbeats
> topics
> > are always replicated, and so on). This logic is too complex to reproduce
> > in an external monitoring system, as it would need to use the exact same
> > TopicFilter/GroupFilter configs as MM2 is using, and then implement the
> > additional built-in logic of MM2 to finally get the topics and groups
> > picked up by the replication.
> >
> > I think this would be useful in any replication setups (finding the
> > effective list of filtered topics and groups), but especially useful when
> > using the IdentityReplicationPolicy. One gap related to the identity
> policy
> > is that we cannot find the replica topics of a specific flow, even when
> > using MirrorClient, or having access to the source and target Kafka
> > clusters, as the "traditional" way of finding replica topics is based on
> > topic naming and the ReplicationPolicy.
> >
> > Thanks,
> > Daniel
> >
> > hudeqi <16120...@bjtu.edu.cn> ezt írta (időpont: 2023. ápr. 19., Sze,
> > 10:58):
> >
> > > Thanks for your reply, Daniel.
> > > Regarding the group list, do you mean that if the group of the source
> > > cluster has not committed an offset (the group does not exist or the
> > group
> > > has not committed an offset to the topic being replicated), then the
> > > current metric cannot be collected? Then this involves the question of
> > > motivation: Do we want to monitor the topic list and group list we
> > > configured, or the topic list and group list that are currently being
> > > replicated? If it is the latter, shouldn't it be detected for a group
> > that
> > > has not committed an offset? I don't know if I understand correctly.
> > >
> > > best,
> > > hudeqi
> > >
> > >
> > >  -Original Message-
> > >  From: "Dániel Urbán" 
> > >  Sent: 2023-04-19 15:50:01 (Wednesday)
> > >  To: dev@kafka.apache.org
> > >  Cc:
> > >  Subject: Re: Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener
> > > 
> > > 
> >
>


Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-04-19 Thread Dániel Urbán
I wouldn't really include a non-existent group (the same way we don't care
about a non-existent topic); that doesn't really matter.
I think having an existing group which doesn't have an offset to checkpoint
is equivalent to a topic having no records to replicate from the monitoring
perspective.

I think the precise way to put it is to monitor the topics and groups
picked up by the filtering logic of MM2. "The list currently replicated" is
not a good definition, as an empty topic would still be interesting for
monitoring purposes, even if there is no message to replicate.
I think the core motivation is to capture the output of the
TopicFilter/GroupFilter + the extra, built-in logic of MM2 related to
filtering (e.g. internal topics are never replicated, the heartbeats topics
are always replicated, and so on). This logic is too complex to reproduce
in an external monitoring system, as it would need to use the exact same
TopicFilter/GroupFilter configs as MM2 is using, and then implement the
additional built-in logic of MM2 to finally get the topics and groups
picked up by the replication.

I think this would be useful in any replication setups (finding the
effective list of filtered topics and groups), but especially useful when
using the IdentityReplicationPolicy. One gap related to the identity policy
is that we cannot find the replica topics of a specific flow, even when
using MirrorClient, or having access to the source and target Kafka
clusters, as the "traditional" way of finding replica topics is based on
topic naming and the ReplicationPolicy.
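
To illustrate the gap (a minimal sketch against the mirror-client policy
classes; the alias and topic names are made up):

    import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;
    import org.apache.kafka.connect.mirror.IdentityReplicationPolicy;
    import org.apache.kafka.connect.mirror.ReplicationPolicy;

    ReplicationPolicy defaultPolicy = new DefaultReplicationPolicy();
    // With the default policy the replica is discoverable purely by name:
    defaultPolicy.formatRemoteTopic("primary", "orders");   // -> "primary.orders"

    ReplicationPolicy identityPolicy = new IdentityReplicationPolicy();
    // With the identity policy the name carries no replication information,
    // so the replica topics of a flow cannot be found by naming convention:
    identityPolicy.formatRemoteTopic("primary", "orders");  // -> "orders"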

Thanks,
Daniel

hudeqi <16120...@bjtu.edu.cn> ezt írta (időpont: 2023. ápr. 19., Sze,
10:58):

> Thanks for your reply, Daniel.
> Regarding the group list, do you mean that if the group of the source
> cluster has not committed an offset (the group does not exist or the group
> has not committed an offset to the topic being replicated), then the
> current metric cannot be collected? Then this involves the question of
> motivation: Do we want to monitor the topic list and group list we
> configured, or the topic list and group list that are currently being
> replicated? If it is the latter, shouldn't it be detected for a group that
> has not committed an offset? I don't know if I understand correctly.
>
> best,
> hudeqi
>
>
>  -Original Message-
>  From: "Dániel Urbán" 
>  Sent: 2023-04-19 15:50:01 (Wednesday)
>  To: dev@kafka.apache.org
>  Cc:
>  Subject: Re: Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener
> 
> 


Re: Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-04-19 Thread Dániel Urbán
Hi hudeqi,

Thank you for your comments!
Related to the topic list: you are correct, the partition metrics are
created eagerly, so even if a topic has no active traffic, the metrics are
visible. I missed this fact when creating the KIP.
For the group list: the group metrics are created lazily, so for the
groups, the metrics are only created if there was a checkpoint replicated
(i.e. the group is included in the replication AND has a committed offset for
at least one of the replicated partitions).

Regardless of the eager/lazy nature of these metrics, I think that having a
centralized way of getting the list of replicated topics/groups still makes
sense, and it can simplify access to this information for external clients.
E.g. We could add a listener implementation which writes the list of
topics/groups into a compact topic, which then can be consumed by external
clients, which is simpler than scraping partition metrics and collecting
the topic tags.
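
Something along these lines is what I have in mind (just a sketch; the
listener class and method names here are made up and not necessarily the ones
from the KIP):

    import java.util.Set;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Hypothetical listener implementation: publish the latest replicated
    // topic list of a flow into a compact topic, keyed by the flow, so that
    // external clients can simply consume that topic instead of scraping
    // per-partition metrics.
    public class CompactTopicListener {
        private final Producer<String, String> producer;
        private final String statusTopic;

        public CompactTopicListener(Producer<String, String> producer, String statusTopic) {
            this.producer = producer;
            this.statusTopic = statusTopic;
        }

        // Would be called whenever the effective set of replicated topics changes.
        public void onTopicsChanged(String flow, Set<String> replicatedTopics) {
            producer.send(new ProducerRecord<>(statusTopic, flow, String.join(",", replicatedTopics)));
        }
    }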

I will add corrections to the Rejected Alternatives section with more
details about the metrics.

Thanks,
Daniel

hudeqi <16120...@bjtu.edu.cn> ezt írta (időpont: 2023. ápr. 19., Sze, 4:22):

> Hi, I have some questions about motivation.
> The problem to be solved by this kip is that it cannot accurately monitor
> the topic and group currently being replicated? From my point of view,
> using MirrorSourceMetrics.recordRate can monitor the configured topic list
> that is currently being replicated (even if the topic has no data), and
> using MirrorCheckpointMetrics.CHECKPOINT_LATENCY can monitor the currently
> replicated group list (if it is wrong, please correct me).
>
> best,
> hudeqi
>
> Dániel Urbán urb.dani...@gmail.com wrote:
> > Hello everyone,
> >
> > I would like to bump this KIP. Please consider reviewing it, as it would
> > improve the monitoring capabilities around MM2.
> > I also submitted a PR (https://github.com/apache/kafka/pull/13595) to
> > demonstrate the current state of the KIP.
> >
> > Thanks in advance,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2023. ápr. 13.,
> Cs,
> > 15:02):
> >
> > > Hi everyone,
> > >
> > > I would like to start a discussion on KIP-918: MM2 Topic And Group
> > > Listener (
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-918%3A+MM2+Topic+And+Group+Listener
> > > ).
> > > This new feature of MM2 would allow following the latest set of
> replicated
> > > topics and groups, which is currently not possible in MM2.
> Additionally,
> > > this would help IdentityReplicationPolicy users, as they could use
> this new
> > > feature to track the replicated topics (which is not available through
> the
> > > policy due to topics not being renamed during replication).
> > >
> > > Thanks in advance,
> > > Daniel
> > >
>


Re: [DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-04-18 Thread Dániel Urbán
Hello everyone,

I would like to bump this KIP. Please consider reviewing it, as it would
improve the monitoring capabilities around MM2.
I also submitted a PR (https://github.com/apache/kafka/pull/13595) to
demonstrate the current state of the KIP.

Thanks in advance,
Daniel

Dániel Urbán  ezt írta (időpont: 2023. ápr. 13., Cs,
15:02):

> Hi everyone,
>
> I would like to start a discussion on KIP-918: MM2 Topic And Group
> Listener (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-918%3A+MM2+Topic+And+Group+Listener
> ).
> This new feature of MM2 would allow following the latest set of replicated
> topics and groups, which is currently not possible in MM2. Additionally,
> this would help IdentityReplicationPolicy users, as they could use this new
> feature to track the replicated topics (which is not available through the
> policy due to topics not being renamed during replication).
>
> Thanks in advance,
> Daniel
>


[DISCUSS] KIP-918: MM2 Topic And Group Listener

2023-04-13 Thread Dániel Urbán
Hi everyone,

I would like to start a discussion on KIP-918: MM2 Topic And Group Listener
(
https://cwiki.apache.org/confluence/display/KAFKA/KIP-918%3A+MM2+Topic+And+Group+Listener
).
This new feature of MM2 would allow following the latest set of replicated
topics and groups, which is currently not possible in MM2. Additionally,
this would help IdentityReplicationPolicy users, as they could use this new
feature to track the replicated topics (which is not available through the
policy due to topics not being renamed during replication).

Thanks in advance,
Daniel


Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-04-13 Thread Dániel Urbán
Hi everyone,

I would like to bump this thread. I think this would be very useful for any
MM2 users, as the current logs with certain architectures (e.g. fan-out)
are impossible to decipher.
I already submitted a PR to demonstrate the proposed solution:
https://github.com/apache/kafka/pull/13475

Thanks for your comments in advance,
Daniel

Dániel Urbán  ezt írta (időpont: 2023. márc. 30.,
Cs, 18:24):

> Hello everyone,
>
> I would like to kick off a discussion about KIP-916:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-916%3A+MM2+distributed+mode+flow+log+context
>
> The KIP aims to enhance the diagnostic information for MM2 distributed
> mode. MM2 relies on multiple Connect worker instances nested in a single
> process. In Connect, Connector names are guaranteed to be unique in a
> single process, but in MM2, this is not true. Because of this, the
> diagnostics provided by Connect (client.ids, log context) do not ensure
> that logs are distinguishable for different flows (Connect workers) inside
> an MM2 process.
>
> Thanks for all your input in advance,
> Daniel
>


[DISCUSS] KIP-916: MM2 distributed mode flow log context

2023-03-30 Thread Dániel Urbán
Hello everyone,

I would like to kick off a discussion about KIP-916:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-916%3A+MM2+distributed+mode+flow+log+context

The KIP aims to enhance the diagnostic information for MM2 distributed
mode. MM2 relies on multiple Connect worker instances nested in a single
process. In Connect, Connector names are guaranteed to be unique in a
single process, but in MM2, this is not true. Because of this, the
diagnostics provided by Connect (client.ids, log context) do not ensure
that logs are distinguishable for different flows (Connect workers) inside
an MM2 process.

Thanks for all your input in advance,
Daniel


Re: [VOTE] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2023-01-11 Thread Dániel Urbán
As for the vote, thanks for participating; the vote passed with 3 binding
votes and 1 non-binding vote.
Moving the KIP forward.
Daniel


Dániel Urbán  ezt írta (időpont: 2023. jan. 11.,
Sze, 13:28):

> Hi Mickael,
> Yes, after this KIP is implemented, the nested Connect workers inside MM2
> will behave the same way as vanilla Connect, and EOS will be supported.
> I would like to ask Chris to confirm, as I'm not too familiar with the
> details of KIP-618, but I really don't see any issues after KIP-710 passes.
> Thanks,
> Daniel
>
> Mickael Maison  ezt írta (időpont: 2023. jan.
> 10., K, 11:53):
>
>> Hi Daniel,
>>
>> Can you confirm that, following this KIP, MM in dedicated mode will be
>> able to run with exactly once enabled?
>> (Once the PR [0] to add KIP-618 support to MM is merged)
>>
>> 0: https://github.com/apache/kafka/pull/12366
>>
>> Thanks,
>> Mickael
>>
>> On Tue, Jan 10, 2023 at 11:36 AM Viktor Somogyi-Vass
>>  wrote:
>> >
>> > Ok, then +1 (binding) :)
>> >
>> > On Mon, Jan 9, 2023 at 3:44 PM John Roesler 
>> wrote:
>> >
>> > > Yes, you are!
>> > >
>> > > Congrats again :)
>> > > -John
>> > >
>> > > On Mon, Jan 9, 2023, at 08:25, Viktor Somogyi-Vass wrote:
>> > > > Hey all,
>> > > >
>> > > > Now that I'm a committer am I allowed to change my non-binding vote
>> to
>> > > > binding to pass the KIP? :)
>> > > >
>> > > > On Thu, Nov 10, 2022 at 6:13 PM Greg Harris
>> > > > >
>> > > > wrote:
>> > > >
>> > > >> +1 (non-binding)
>> > > >>
>> > > >> Thanks for the KIP, this is an important improvement.
>> > > >>
>> > > >> Greg
>> > > >>
>> > > >> On Thu, Nov 10, 2022 at 7:21 AM John Roesler 
>> > > wrote:
>> > > >>
>> > > >> > Thanks for the KIP, Daniel!
>> > > >> >
>> > > >> > I'm no MM expert, but I've read over the KIP and discussion, and
>> it
>> > > seems
>> > > >> > reasonable to me.
>> > > >> >
>> > > >> > I'm +1 (binding).
>> > > >> >
>> > > >> > Thanks,
>> > > >> > -John
>> > > >> >
>> > > >> > On 2022/10/22 07:38:38 Urbán Dániel wrote:
>> > > >> > > Hi everyone,
>> > > >> > >
>> > > >> > > I would like to start a vote on KIP-710 which aims to support
>> > > running a
>> > > >> > > dedicated MM2 cluster in distributed mode:
>> > > >> > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
>> > > >> > >
>> > > >> > > Regards,
>> > > >> > > Daniel
>> > > >> > >
>> > > >> > >
>> > > >> > > --
> > > >> > > This e-mail has been scanned by Avast AntiVirus software.
>> > > >> > > www.avast.com
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>>
>


Re: [VOTE] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2023-01-11 Thread Dániel Urbán
Hi Mickael,
Yes, after this KIP is implemented, the nested Connect workers inside MM2
will behave the same way as vanilla Connect, and EOS will be supported.
I would like to ask Chris to confirm, as I'm not too familiar with the
details of KIP-618, but I really don't see any issues after KIP-710 passes.
Thanks,
Daniel

Mickael Maison  ezt írta (időpont: 2023. jan.
10., K, 11:53):

> Hi Daniel,
>
> Can you confirm that, following this KIP, MM in dedicated mode will be
> able to run with exactly once enabled?
> (Once the PR [0] to add KIP-618 support to MM is merged)
>
> 0: https://github.com/apache/kafka/pull/12366
>
> Thanks,
> Mickael
>
> On Tue, Jan 10, 2023 at 11:36 AM Viktor Somogyi-Vass
>  wrote:
> >
> > Ok, then +1 (binding) :)
> >
> > On Mon, Jan 9, 2023 at 3:44 PM John Roesler  wrote:
> >
> > > Yes, you are!
> > >
> > > Congrats again :)
> > > -John
> > >
> > > On Mon, Jan 9, 2023, at 08:25, Viktor Somogyi-Vass wrote:
> > > > Hey all,
> > > >
> > > > Now that I'm a committer am I allowed to change my non-binding vote
> to
> > > > binding to pass the KIP? :)
> > > >
> > > > On Thu, Nov 10, 2022 at 6:13 PM Greg Harris
>  > > >
> > > > wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Thanks for the KIP, this is an important improvement.
> > > >>
> > > >> Greg
> > > >>
> > > >> On Thu, Nov 10, 2022 at 7:21 AM John Roesler 
> > > wrote:
> > > >>
> > > >> > Thanks for the KIP, Daniel!
> > > >> >
> > > >> > I'm no MM expert, but I've read over the KIP and discussion, and
> it
> > > seems
> > > >> > reasonable to me.
> > > >> >
> > > >> > I'm +1 (binding).
> > > >> >
> > > >> > Thanks,
> > > >> > -John
> > > >> >
> > > >> > On 2022/10/22 07:38:38 Urbán Dániel wrote:
> > > >> > > Hi everyone,
> > > >> > >
> > > >> > > I would like to start a vote on KIP-710 which aims to support
> > > running a
> > > >> > > dedicated MM2 cluster in distributed mode:
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
> > > >> > >
> > > >> > > Regards,
> > > >> > > Daniel
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > This e-mail has been scanned by Avast AntiVirus software.
> > > >> > > www.avast.com
> > > >> > >
> > > >> >
> > > >>
> > >
>


Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2023-01-11 Thread Dániel Urbán
Hi Mickael,
Thanks for the input, I updated the KIP with the config name you suggested.
Daniel

Mickael Maison  ezt írta (időpont: 2022. nov. 7.,
H, 10:48):

> Hi Daniel,
>
> Thanks for the KIP.
>
> It would be nice to expose more of the REST API as some endpoints are
> really useful, for example /admin or
> /connectors//tasks-config. However as dedicated mode is
> currently unusable, I think the approach of "just fixing it" by
> exposing the internal endpoints is fine. It also does not seem to
> corner us too much if we decide to make further changes in the future.
>
> One suggestion I have is to avoid using "mm" in the configuration
> name. Could we rename mm.enable.internal.rest to
> dedicated.mode.enable.internal.rest or something like that?
>
> Thanks,
> Mickael
>
> On Tue, Sep 27, 2022 at 3:56 PM Chris Egerton 
> wrote:
> >
> > Thanks Daniel! No further comments from me, looks good.
> >
> > On Tue, Sep 27, 2022 at 4:39 AM Dániel Urbán 
> wrote:
> >
> > > Hi Chris,
> > >
> > > I understand your points, and I agree that this path is safer in terms
> of
> > > future plans, I accept it.
> > > Updated the KIP accordingly.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > Chris Egerton  ezt írta (időpont: 2022.
> szept.
> > > 21., Sze, 18:54):
> > >
> > > > Hi Daniel,
> > > >
> > > > I'm a little hesitant to add support for REST extensions and the
> admin
> > > API
> > > > to dedicated MM2 nodes because they may restrict our options in the
> > > future
> > > > if/when we add a public-facing REST API.
> > > >
> > > > Regarding REST extensions specifically, I understand their security
> value
> > > > for public-facing APIs, but for the internal API--which should only
> ever
> > > be
> > > > targeted by MM2 nodes, and never by humans, UIs, CLIs, etc.--I'm not
> sure
> > > > there's enough need there to add that API this time around. The
> internal
> > > > endpoints used by Kafka Connect should be secured by the session key
> > > > mechanism introduced in KIP-507 [1], and IIUC, with this KIP, users
> will
> > > > also be able to configure their cluster to use mTLS.
> > > >
> > > > Regarding the admin API, I consider it part of the public-facing
> REST API
> > > > for Kafka Connect, so I was assuming we wouldn't want to add it to
> this
> > > KIP
> > > > since we're intentionally slimming down the REST API to just the bare
> > > > essentials (i.e., just the internal endpoints). I can see value for
> it,
> > > but
> > > > it's similar to the status endpoints in the Kafka Connect REST API;
> we
> > > > might choose to expose this sometime down the line, but IMO it'd be
> > > better
> > > > to do that in a separate KIP so that we can iron out the details of
> > > exactly
> > > > what kind of REST API would best suit dedicated MM2 clusters, and how
> > > that
> > > > would differ from the API provided by vanilla Kafka Connect.
> > > >
> > > > Let me know what you think!
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > [1] -
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-507%3A+Securing+Internal+Connect+REST+Endpoints
> > > >
> > > > On Wed, Sep 21, 2022 at 4:59 AM Dániel Urbán 
> > > > wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > About the worker id: makes sense. I changed the wording, but kept
> it
> > > > listed
> > > > > as it is a change compared to existing MM2 code. Updated the KIP
> > > > > accordingly.
> > > > >
> > > > > About the REST server configurations: again, I agree, it should be
> > > > > generalized. But I'm not sure about those exceptions you listed,
> as all
> > > > of
> > > > > them make sense in MM2 as well. For example, security related rest
> > > > > extensions could work with MM2 as well. Admin listeners are also
> > > useful,
> > > > as
> > > > > they would allow the same admin operations for MM2 nodes. Am I
> mistaken
> > > > > here? Updated the KIP without mentioning exceptions for now.
> > > > >
> > > >

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-09-27 Thread Dániel Urbán
Hi Chris,

I understand your points, and I agree that this path is safer in terms of
future plans, so I accept it.
Updated the KIP accordingly.

Thanks,
Daniel

Chris Egerton  ezt írta (időpont: 2022. szept.
21., Sze, 18:54):

> Hi Daniel,
>
> I'm a little hesitant to add support for REST extensions and the admin API
> to dedicated MM2 nodes because they may restrict our options in the future
> if/when we add a public-facing REST API.
>
> Regarding REST extensions specifically, I understand their security value
> for public-facing APIs, but for the internal API--which should only ever be
> targeted by MM2 nodes, and never by humans, UIs, CLIs, etc.--I'm not sure
> there's enough need there to add that API this time around. The internal
> endpoints used by Kafka Connect should be secured by the session key
> mechanism introduced in KIP-507 [1], and IIUC, with this KIP, users will
> also be able to configure their cluster to use mTLS.
>
> Regarding the admin API, I consider it part of the public-facing REST API
> for Kafka Connect, so I was assuming we wouldn't want to add it to this KIP
> since we're intentionally slimming down the REST API to just the bare
> essentials (i.e., just the internal endpoints). I can see value for it, but
> it's similar to the status endpoints in the Kafka Connect REST API; we
> might choose to expose this sometime down the line, but IMO it'd be better
> to do that in a separate KIP so that we can iron out the details of exactly
> what kind of REST API would best suit dedicated MM2 clusters, and how that
> would differ from the API provided by vanilla Kafka Connect.
>
> Let me know what you think!
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-507%3A+Securing+Internal+Connect+REST+Endpoints
>
> On Wed, Sep 21, 2022 at 4:59 AM Dániel Urbán 
> wrote:
>
> > Hi Chris,
> >
> > About the worker id: makes sense. I changed the wording, but kept it
> listed
> > as it is a change compared to existing MM2 code. Updated the KIP
> > accordingly.
> >
> > About the REST server configurations: again, I agree, it should be
> > generalized. But I'm not sure about those exceptions you listed, as all
> of
> > them make sense in MM2 as well. For example, security related rest
> > extensions could work with MM2 as well. Admin listeners are also useful,
> as
> > they would allow the same admin operations for MM2 nodes. Am I mistaken
> > here? Updated the KIP without mentioning exceptions for now.
> >
> > I think yes, the lazy config resolution should be unconditional. Even if
> we
> > don't consider the distributed mode of MM2, the eager resolution does not
> > allow using sensitive configs in the mm2 properties, as they will be
> > written by value into the config topic. I didn't really consider this as
> > breaking change, but I might be wrong.
> >
> > Enable flag property name: also makes sense, updated the KIP.
> >
> > Thanks a lot!
> > Daniel
> >
> > Chris Egerton  ezt írta (időpont: 2022. szept.
> > 20., K, 20:29):
> >
> > > Hi Daniel,
> > >
> > > Looking pretty good! Regarding the worker ID generation--apologies, I
> was
> > > unclear with my question. I was wondering if we had to adopt any
> special
> > > logic at all for MM2, or if we could use the same logic that vanilla
> > Kafka
> > > Connect does in distributed mode, where the worker ID is its advertised
> > URL
> > > (e.g., "connect:8083" or "localhost:25565"). Unless I'm mistaken, this
> > > should allow MM2 nodes to be identified unambiguously. Do you think it
> > > makes sense to follow this strategy?
> > >
> > > Now that the details on the new REST interface have been fleshed out,
> I'm
> > > also wondering if we want to add support for the "
> > > rest.advertised.host.name",
> > > "rest.advertised.port", and "rest.advertised.listener" properties,
> which
> > > are vital for being able to run Kafka Connect in distributed mode from
> > > within containers. In fact, I'm wondering if we can generalize some of
> > the
> > > specification in the KIP around which REST properties are accepted by
> > > stating that all REST-related properties that are available with
> vanilla
> > > Kafka Connect will be supported for dedicated MM2 nodes when
> > > "mm.enable.rest" is set to "true", except for ones related to the
> > > public-facing REST API like "rest.extension.classes",
> "admin.listeners",
> >

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-09-21 Thread Dániel Urbán
Hi Chris,

About the worker id: makes sense. I changed the wording, but kept it listed
as it is a change compared to existing MM2 code. Updated the KIP
accordingly.

About the REST server configurations: again, I agree, it should be
generalized. But I'm not sure about those exceptions you listed, as all of
them make sense in MM2 as well. For example, security-related REST
extensions could work with MM2 as well. Admin listeners are also useful, as
they would allow the same admin operations for MM2 nodes. Am I mistaken
here? Updated the KIP without mentioning exceptions for now.

I think yes, the lazy config resolution should be unconditional. Even if we
don't consider the distributed mode of MM2, the eager resolution does not
allow using sensitive configs in the mm2 properties, as they will be
written by value into the config topic. I didn't really consider this a
breaking change, but I might be wrong.
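
For reference, this is the kind of sensitive config I have in mind in the mm2
properties (the alias, path and key below are only examples):

    config.providers=file
    config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
    us-east.ssl.keystore.password=${file:/etc/mm2/secrets.properties:keystore.password}

With eager resolution the resolved password ends up in the config topic by
value; with lazy resolution only the ${file:...} reference is stored and each
worker resolves it locally.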

Enable flag property name: also makes sense, updated the KIP.

Thanks a lot!
Daniel

Chris Egerton  ezt írta (időpont: 2022. szept.
20., K, 20:29):

> Hi Daniel,
>
> Looking pretty good! Regarding the worker ID generation--apologies, I was
> unclear with my question. I was wondering if we had to adopt any special
> logic at all for MM2, or if we could use the same logic that vanilla Kafka
> Connect does in distributed mode, where the worker ID is its advertised URL
> (e.g., "connect:8083" or "localhost:25565"). Unless I'm mistaken, this
> should allow MM2 nodes to be identified unambiguously. Do you think it
> makes sense to follow this strategy?
>
> Now that the details on the new REST interface have been fleshed out, I'm
> also wondering if we want to add support for the "
> rest.advertised.host.name",
> "rest.advertised.port", and "rest.advertised.listener" properties, which
> are vital for being able to run Kafka Connect in distributed mode from
> within containers. In fact, I'm wondering if we can generalize some of the
> specification in the KIP around which REST properties are accepted by
> stating that all REST-related properties that are available with vanilla
> Kafka Connect will be supported for dedicated MM2 nodes when
> "mm.enable.rest" is set to "true", except for ones related to the
> public-facing REST API like "rest.extension.classes", "admin.listeners",
> and "admin.listeners.https.*".
>
> One other thought--is the lazy evaluation of config provider references
> going to take place unconditionally, or only when the internal REST API is
> enabled on a worker?
>
> Finally, do you think we could change "mm.enable.rest" to
> "mm.enable.internal.rest"? That way, if we want to introduce a
> public-facing REST API later on, we can use "mm.enable.rest" as the name
> for that property.
>
> Cheers,
>
> Chris
>
> On Fri, Sep 16, 2022 at 9:28 AM Dániel Urbán 
> wrote:
>
> > Hi Chris,
> >
> > I went through the KIP and updated it based on our discussion. I think
> your
> > suggestions simplified (and shortened) the KIP significantly.
> >
> > Thanks,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2022.
> szept.
> > 16., P, 15:15):
> >
> > > Hi Chris,
> > >
> > > 1. For the REST-server-per-flow setup, it made sense to introduce some
> > > simplified configuration. With a single REST server, it doesn't make
> > sense
> > > anymore, I'm removing it from the KIP.
> > > 2. I think that changing the worker ID generation still makes sense,
> > > otherwise there is no way to differentiate between the MM2 instances.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > On Wed, Aug 31, 2022 at 8:39 PM Chris Egerton  >
> > > wrote:
> > >
> > > > Hi Daniel,
> > > >
> > > > I've taken a look at the KIP in detail. Here are my complete thoughts
> > > > (minus the aforementioned sections that may be affected by changes to
> > an
> > > > internal-only REST API):
> > > >
> > > > 1. Why introduce new mm.host.name and mm.rest.protocol properties
> > > instead
> > > > of using the properties that are already used by Kafka Connect:
> > > listeners,
> > > > rest.advertised.host.name, rest.advertised.host.port, and
> > > > rest.advertised.listener? We used to have the rest.host.name and
> > > rest.port
> > > > properties in Connect but deprecated and eventually removed them in
> > favor
> > > > of the listeners property in KIP-208 [1]; I'm hoping we can keep
> things
> > > as

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-09-16 Thread Dániel Urbán
Hi Chris,

I went through the KIP and updated it based on our discussion. I think your
suggestions simplified (and shortened) the KIP significantly.

Thanks,
Daniel

Dániel Urbán  ezt írta (időpont: 2022. szept.
16., P, 15:15):

> Hi Chris,
>
> 1. For the REST-server-per-flow setup, it made sense to introduce some
> simplified configuration. With a single REST server, it doesn't make sense
> anymore, I'm removing it from the KIP.
> 2. I think that changing the worker ID generation still makes sense,
> otherwise there is no way to differentiate between the MM2 instances.
>
> Thanks,
> Daniel
>
> On Wed, Aug 31, 2022 at 8:39 PM Chris Egerton 
> wrote:
>
> > Hi Daniel,
> >
> > I've taken a look at the KIP in detail. Here are my complete thoughts
> > (minus the aforementioned sections that may be affected by changes to an
> > internal-only REST API):
> >
> > 1. Why introduce new mm.host.name and mm.rest.protocol properties
> instead
> > of using the properties that are already used by Kafka Connect:
> listeners,
> > rest.advertised.host.name, rest.advertised.host.port, and
> > rest.advertised.listener? We used to have the rest.host.name and
> rest.port
> > properties in Connect but deprecated and eventually removed them in favor
> > of the listeners property in KIP-208 [1]; I'm hoping we can keep things
> as
> > similar as possible between MM2 and Connect in order to make it easier
> for
> > users to work with both. I'm also hoping that we can allow users to
> > configure the port that their MM2 nodes listen on instead of hardcoding
> MM2
> > to bind to port 0.
> >
> > 2. Do we still need to change the worker IDs that get used in the status
> > topic?
> >
> > Everything else looks good, or should change once the KIP is updated with
> > the internal-only REST API alternative.
> >
> > Cheers,
> >
> > Chris
> >
> > [1] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface
> >
> > On Mon, Aug 29, 2022 at 1:55 PM Chris Egerton  wrote:
> >
> > > Hi Daniel,
> > >
> > > Yeah, I think that's the way to go. Adding multiple servers for each
> > > herder seems like it'd be too much of a pain for users to configure,
> and
> > if
> > > we keep the API strictly internal for now, we shouldn't be painting
> > > ourselves into too much of a corner if/when we decide to expose a
> > > public-facing REST API for dedicated MM2 clusters.
> > >
> > > I plan to take a look at the rest of the KIP and provide a complete
> > review
> > > sometime this week; I'll hold off on commenting on anything that seems
> > like
> > > it'll be affected by switching to an internal-only REST API until those
> > > changes are published, but should be able to review everything else.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Aug 29, 2022 at 6:57 AM Dániel Urbán 
> > > wrote:
> > >
> > >> Hi Chris,
> > >>
> > >> I understand your point, sounds good to me.
> > >> So in short, we should opt for an internal-only API, and preferably a
> > >> single server solution. Is that right?
> > >>
> > >> Thanks
> > >> Daniel
> > >>
> > >> Chris Egerton  ezt írta (időpont: 2022. aug.
> > >> 26.,
> > >> P, 17:36):
> > >>
> > >> > Hi Daniel,
> > >> >
> > >> > Glad to hear from you!
> > >> >
> > >> > With regards to the stripped-down REST API alternative, I don't see
> > how
> > >> > this would prevent us from introducing the fully-fledged Connect
> REST
> > >> API,
> > >> > or even an augmented variant of it, at some point down the road. If
> we
> > >> go
> > >> > with the internal-only API now, and want to expand later, can't we
> > gate
> > >> the
> > >> > expansion behind a feature flag configuration property that by
> default
> > >> > disables the new feature?
> > >> >
> > >> > I'm also not sure that we'd ever want to expose the raw Connect REST
> > API
> > >> > for dedicated MM2 clusters. If people want that capability, they can
> > >> > already spin up a vanilla Connect cluster and run as many MM2
> > >> connectors as
> > >> > they'd like on it, and as of KIP-458 [1], it's even

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-09-16 Thread Dániel Urbán
Hi Chris,

1. For the REST-server-per-flow setup, it made sense to introduce some
simplified configuration. With a single REST server, it doesn't make sense
anymore, I'm removing it from the KIP.
2. I think that changing the worker ID generation still makes sense,
otherwise there is no way to differentiate between the MM2 instances.

Thanks,
Daniel

On Wed, Aug 31, 2022 at 8:39 PM Chris Egerton 
wrote:

> Hi Daniel,
>
> I've taken a look at the KIP in detail. Here are my complete thoughts
> (minus the aforementioned sections that may be affected by changes to an
> internal-only REST API):
>
> 1. Why introduce new mm.host.name and mm.rest.protocol properties instead
> of using the properties that are already used by Kafka Connect: listeners,
> rest.advertised.host.name, rest.advertised.host.port, and
> rest.advertised.listener? We used to have the rest.host.name and rest.port
> properties in Connect but deprecated and eventually removed them in favor
> of the listeners property in KIP-208 [1]; I'm hoping we can keep things as
> similar as possible between MM2 and Connect in order to make it easier for
> users to work with both. I'm also hoping that we can allow users to
> configure the port that their MM2 nodes listen on instead of hardcoding MM2
> to bind to port 0.
>
> 2. Do we still need to change the worker IDs that get used in the status
> topic?
>
> Everything else looks good, or should change once the KIP is updated with
> the internal-only REST API alternative.
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface
>
> On Mon, Aug 29, 2022 at 1:55 PM Chris Egerton  wrote:
>
> > Hi Daniel,
> >
> > Yeah, I think that's the way to go. Adding multiple servers for each
> > herder seems like it'd be too much of a pain for users to configure, and
> if
> > we keep the API strictly internal for now, we shouldn't be painting
> > ourselves into too much of a corner if/when we decide to expose a
> > public-facing REST API for dedicated MM2 clusters.
> >
> > I plan to take a look at the rest of the KIP and provide a complete
> review
> > sometime this week; I'll hold off on commenting on anything that seems
> like
> > it'll be affected by switching to an internal-only REST API until those
> > changes are published, but should be able to review everything else.
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Aug 29, 2022 at 6:57 AM Dániel Urbán 
> > wrote:
> >
> >> Hi Chris,
> >>
> >> I understand your point, sounds good to me.
> >> So in short, we should opt for an internal-only API, and preferably a
> >> single server solution. Is that right?
> >>
> >> Thanks
> >> Daniel
> >>
> >> Chris Egerton  ezt írta (időpont: 2022. aug.
> >> 26.,
> >> P, 17:36):
> >>
> >> > Hi Daniel,
> >> >
> >> > Glad to hear from you!
> >> >
> >> > With regards to the stripped-down REST API alternative, I don't see
> how
> >> > this would prevent us from introducing the fully-fledged Connect REST
> >> API,
> >> > or even an augmented variant of it, at some point down the road. If we
> >> go
> >> > with the internal-only API now, and want to expand later, can't we
> gate
> >> the
> >> > expansion behind a feature flag configuration property that by default
> >> > disables the new feature?
> >> >
> >> > I'm also not sure that we'd ever want to expose the raw Connect REST
> API
> >> > for dedicated MM2 clusters. If people want that capability, they can
> >> > already spin up a vanilla Connect cluster and run as many MM2
> >> connectors as
> >> > they'd like on it, and as of KIP-458 [1], it's even possible to use a
> >> > single Connect cluster to replicate between any two Kafka clusters
> >> instead
> >> > of only targeting the Kafka cluster that the vanilla Connect cluster
> >> > operates on top of. I do agree that it'd be great to be able to
> >> dynamically
> >> > adjust things like topic filters without having to restart a dedicated
> >> MM2
> >> > node; I'm just not sure that the vanilla Connect REST API is the
> >> > appropriate way to do that, especially since the exact mechanisms that
> >> make
> >> > a single Connect cluster viable for replicating across any two Kafka
> >> > clusters could be abused and cause a dedicated MM2 cluster to start
> >>

Re: Transactions, delivery timeout and changing transactional producer behavior

2022-09-09 Thread Dániel Urbán
Hi all,

I would like to bump this and bring some attention to the issue.
This is a nasty bug in the transactional producer; it would be nice if I could
get some feedback on the PR: https://github.com/apache/kafka/pull/12392
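
To recap the context, the usual transactional send pattern on the client side
looks roughly like this (a simplified sketch, not the code from the PR); the
question raised in this thread is whether a client-side delivery timeout
should leave the producer in the abortable branch, or force the fatal
"close and recreate" branch:

    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.errors.AuthorizationException;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;
    import org.apache.kafka.common.errors.ProducerFencedException;

    class TransactionalSendSketch {
        static void sendInTransaction(Producer<String, String> producer,
                                      ProducerRecord<String, String> record) {
            try {
                producer.beginTransaction();
                producer.send(record);
                producer.commitTransaction();
            } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                // Fatal: the application has to close this instance and build
                // a new producer (plus initTransactions()) to continue.
                producer.close();
            } catch (KafkaException e) {
                // Abortable: the same producer instance can abort and start a
                // new transaction.
                producer.abortTransaction();
            }
        }
    }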

Thanks in advance,
Daniel

Viktor Somogyi-Vass  ezt írta
(időpont: 2022. júl. 25., H, 15:28):

> Hi Luke & Artem,
>
> We prepared the fix, would you please help in getting a committer-reviewer
> to get this issue resolved?
>
> Thanks,
> Viktor
>
> On Fri, Jul 8, 2022 at 12:57 PM Dániel Urbán 
> wrote:
>
> > Submitted a PR with the fix: https://github.com/apache/kafka/pull/12392
> > In the PR I tried keeping the producer in a usable state after the forced
> > bump. I understand that it might not be the cleanest solution, but the only
> > other option I know of is to transition into a fatal state, meaning that
> > the producer has to be recreated after a delivery timeout. I think that is
> > still fine compared to the out-of-order messages.
> >
> > Looking forward to your reviews,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2022. júl. 7.,
> Cs,
> > 12:04):
> >
> > > Thanks for the feedback, I created
> > > https://issues.apache.org/jira/browse/KAFKA-14053 and started working
> on
> > > a PR.
> > >
> > > Luke, for the workaround, we used the transaction admin tool released
> in
> > > 3.0 to "abort" these hanging batches manually.
> > > Naturally, the cluster health should be stabilized. This issue popped
> up
> > > most frequently around times when some partitions went into a few
> minute
> > > window of unavailability. The infinite retries on the producer side
> > caused
> > > a situation where the last retry was still in-flight, but the delivery
> > > timeout was triggered on the client side. We reduced the retries and
> > > increased the delivery timeout to avoid such situations.
> > > Still, the issue can occur in other scenarios, for example a client
> > > queueing up many batches in the producer buffer, and causing those
> > batches
> > > to spend most of the delivery timeout window in the client memory.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > Luke Chen  ezt írta (időpont: 2022. júl. 7., Cs,
> > 5:13):
> > >
> > >> Hi Daniel,
> > >>
> > >> Thanks for reporting the issue, and the investigation.
> > >> I'm curious, so, what's your workaround for this issue?
> > >>
> > >> I agree with Artem, it makes sense. Please file a bug in JIRA.
> > >> And looking forward to your PR! :)
> > >>
> > >> Thank you.
> > >> Luke
> > >>
> > >> On Thu, Jul 7, 2022 at 3:07 AM Artem Livshits
> > >>  wrote:
> > >>
> > >> > Hi Daniel,
> > >> >
> > >> > What you say makes sense.  Could you file a bug and put this info
> > there
> > >> so
> > >> > that it's easier to track?
> > >> >
> > >> > -Artem
> > >> >
> > >> > On Wed, Jul 6, 2022 at 8:34 AM Dániel Urbán 
> > >> wrote:
> > >> >
> > >> > > Hello everyone,
> > >> > >
> > >> > > I've been investigating some transaction related issues in a very
> > >> > > problematic cluster. Besides finding some interesting issues, I
> had
> > >> some
> > >> > > ideas about how transactional producer behavior could be improved.
> > >> > >
> > >> > > My suggestion in short is: when the transactional producer
> > encounters
> > >> an
> > >> > > error which doesn't necessarily mean that the in-flight request
> was
> > >> > > processed (for example a client side timeout), the producer should
> > not
> > >> > send
> > >> > > an EndTxnRequest on abort, but instead it should bump the producer
> > >> epoch.
> > >> > >
> > >> > > The long description about the issue I found, and how I came to
> the
> > >> > > suggestion:
> > >> > >
> > >> > > First, the description of the issue. When I say that the cluster
> is
> > >> "very
> > >> > > problematic", I mean all kinds of different issues, be it infra
> > (disks
> > >> > and
> > >> > > network) or throughput (high volume producers withou

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-08-29 Thread Dániel Urbán
Hi Chris,

I understand your point, sounds good to me.
So in short, we should opt for an internal-only API, and preferably a
single server solution. Is that right?

Thanks
Daniel

Chris Egerton  ezt írta (időpont: 2022. aug. 26.,
P, 17:36):

> Hi Daniel,
>
> Glad to hear from you!
>
> With regards to the stripped-down REST API alternative, I don't see how
> this would prevent us from introducing the fully-fledged Connect REST API,
> or even an augmented variant of it, at some point down the road. If we go
> with the internal-only API now, and want to expand later, can't we gate the
> expansion behind a feature flag configuration property that by default
> disables the new feature?
>
> I'm also not sure that we'd ever want to expose the raw Connect REST API
> for dedicated MM2 clusters. If people want that capability, they can
> already spin up a vanilla Connect cluster and run as many MM2 connectors as
> they'd like on it, and as of KIP-458 [1], it's even possible to use a
> single Connect cluster to replicate between any two Kafka clusters instead
> of only targeting the Kafka cluster that the vanilla Connect cluster
> operates on top of. I do agree that it'd be great to be able to dynamically
> adjust things like topic filters without having to restart a dedicated MM2
> node; I'm just not sure that the vanilla Connect REST API is the
> appropriate way to do that, especially since the exact mechanisms that make
> a single Connect cluster viable for replicating across any two Kafka
> clusters could be abused and cause a dedicated MM2 cluster to start writing
> to a completely different Kafka cluster that's not even defined in its
> config file.
>
> Finally, as far as security goes--since this is essentially a bug fix, I'm
> inclined to make it as easy as possible for users to adopt it. MTLS is a
> great start for securing a REST API, but it's not sufficient on its own
> since anyone who could issue an authenticated REST request against the MM2
> cluster would still be able to make any changes they want (with the
> exception of accessing internal endpoints, which were secured with
> KIP-507). If we were to bring up the fully-fledged Connect REST API,
> cluster administrators would also likely have to add some kind of
> authorization layer to prevent people from using the REST API to mess with
> the configurations of the connectors that MM2 brought up. One way of doing
> that is to add a REST extension to your Connect cluster, but implementing
> and configuring one in order to be able to run a multi-node MM2 cluster
> without hitting this bug seems like too much work to be worth it.
>
> I think if we had a better picture of what a REST API for dedicated MM2
> clusters would/should look like, then it would be easier to go along with
> this, and we could even just add the feature flag in this KIP right now to
> address any security concerns. My instinct would be to address this in a
> follow-up KIP in order to reduce scope creep, though, and keep this KIP
> focused on addressing the bug with multi-node dedicated MM2 clusters. What
> do you think?
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy
>
> On Thu, Aug 25, 2022 at 3:55 AM Dániel Urbán 
> wrote:
>
> > Hi Chris,
> >
> > Thanks for bringing this up again :)
> >
> > 1. I think that is reasonable, though I find the current state of MM2 to
> be
> > confusing. The current issue with the distributed mode is not documented
> > properly, but maybe the logging will help a bit.
> >
> > 2. Going for an internal-only Connect REST version would lock MM2 out of
> a
> > path where the REST API can be used to dynamically reconfigure
> > replications. For now, I agree, it would be easy to corrupt the state of
> > MM2 if someone wanted to use the properties and the REST at the same
> time,
> > but in the future, we might have a chance to introduce a different config
> > mechanism, where only the cluster connections have to be specified in the
> > properties file, and everything else can be configured through REST
> > (enabling replications, changing topic filters, etc.). Because of this,
> I'm
> > leaning towards a full Connect REST API. To avoid issues with conflicts
> > between the props file and the REST, we could document security best
> > practices (e.g. turn on basic auth or mTLS on the Connect REST to avoid
> > unwanted interactions).
> >
> > 3. That is a good point, and I agree, a big plus for motivation.
> >
> > I have a working version of this in which all flows spin up a dedicated
> > Connect REST, but I can give other solutions a try, too.

Re: Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2022-08-25 Thread Dániel Urbán
Hi Chris,

Thanks for bringing this up again :)

1. I think that is reasonable, though I find the current state of MM2 to be
confusing. The current issue with the distributed mode is not documented
properly, but maybe the logging will help a bit.

2. Going for an internal-only Connect REST version would lock MM2 out of a
path where the REST API can be used to dynamically reconfigure
replications. For now, I agree, it would be easy to corrupt the state of
MM2 if someone wanted to use the properties and the REST at the same time,
but in the future, we might have a chance to introduce a different config
mechanism, where only the cluster connections have to be specified in the
properties file, and everything else can be configured through REST
(enabling replications, changing topic filters, etc.). Because of this, I'm
leaning towards a full Connect REST API. To avoid issues with conflicts
between the props file and the REST, we could document security best
practices (e.g. turn on basic auth or mTLS on the Connect REST to avoid
unwanted interactions).

3. That is a good point, and I agree, a big plus for motivation.

I have a working version of this in which all flows spin up a dedicated
Connect REST, but I can give other solutions a try, too.

Thanks,
Daniel

On Wed, Aug 24, 2022 at 17:46, Chris Egerton  wrote:

> Hi Daniel,
>
> I'd like to resurface this KIP in case you're still interested in pursuing
> it. I know it's been a while since you published it, and it hasn't received
> much attention, but I'm hoping we can give it a try now and finally put
> this long-standing bug to rest. To that end, I have some thoughts about the
> proposal. This isn't a complete review, but I wanted to give enough to get
> the ball rolling:
>
> 1. Some environments with firewalls or strict security policies may not be
> able to bring up a REST server for each MM2 node. If we decide that we'd
> like to use the Connect REST API (or even just parts of it) to address this
> bug with MM2, it does make sense to eventually make the availability of the
> REST API a hard requirement for running MM2, but it might be a bit too
> abrupt to do that all in a single release. What do you think about making
> the REST API optional for now, but noting that it will become required in a
> later release (probably 4.0.0 or, if that's not enough time, 5.0.0)? We
> could choose not to bring the REST server for any node whose configuration
> doesn't explicitly opt into one, and maybe log a warning message on startup
> if none is configured. In effect, we'd be marking the current mode (no REST
> server) as deprecated.
>
> 2. I'm not sure that we should count out the "Creating an internal-only
> derivation of the Connect REST API" rejected alternative. Right now, the
> single source of truth for the configuration of a MM2 cluster (assuming
> it's being run in dedicated mode, and not as a connector in a vanilla
> Connect cluster) is the configuration file used for the process. By
> bringing up the REST API, we'd expose endpoints to modify connector
> configurations, which would not only add complexity to the operation of a
> MM2 cluster, but even qualify as an attack vector for malicious entities.
> Thanks to KIP-507 we have some amount of security around the internal-only
> endpoints used by the Connect framework, but for any public endpoints, the
> Connect REST API doesn't come with any security out of the box.
>
> 3. Small point, but with support for exactly-once source connectors coming
> out in 3.3.0, it's also worth noting that that's another feature that won't
> work properly with multi-node MM2 clusters without adding a REST server for
> each node (or some substitute that accomplishes the same goal). I don't
> think this will affect the direction of the design discussion too much, but
> it does help strengthen the motivation.
>
> Cheers,
>
> Chris
>
> On 2021/02/18 15:57:36 Dániel Urbán wrote:
> > Hello everyone,
> >
> > * Sorry, I meant KIP-710.
> >
> > Right now the MirrorMaker cluster is somewhat unreliable, and not
> > supporting running in a cluster properly. I'd say that fixing this would
> be
> > a nice addition.
> > Does anyone have some input on this?
> >
> > Thanks in advance
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2021. jan. 26., K,
> > 15:56):
> >
> > > Hello everyone,
> > >
> > > I would like to start a discussion on KIP-709, which addresses some
> > > missing features in MM2 dedicated mode.
> > >
> > >
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
> > >
> > > Currently, the dedicated mode of MM2 does not f

Re: Transactions, delivery timeout and changing transactional producer behavior

2022-07-08 Thread Dániel Urbán
Submitted a PR with the fix: https://github.com/apache/kafka/pull/12392
In the PR I tried keeping the producer in a usable state after the forced
bump. I understand that it might not be the cleanest solution, but the only
other option I know of is to transition into a fatal state, meaning that
the producer would have to be recreated after a delivery timeout. I think
even that would still be acceptable compared to allowing out-of-order messages.
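For context, this is roughly what the fatal-state alternative would look like
from the application's point of view (just a sketch; createTransactionalProducer()
stands in for the application's own producer factory):

    try {
        producer.abortTransaction();
    } catch (KafkaException fatal) {
        // with a fatal transition the instance is unusable; a new producer (and a
        // fresh epoch via initTransactions) is the only way to continue
        producer.close();
        producer = createTransactionalProducer(); // hypothetical application factory
        producer.initTransactions();
    }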

Looking forward to your reviews,
Daniel

On Thu, Jul 7, 2022 at 12:04, Dániel Urbán  wrote:

> Thanks for the feedback, I created
> https://issues.apache.org/jira/browse/KAFKA-14053 and started working on
> a PR.
>
> Luke, for the workaround, we used the transaction admin tool released in
> 3.0 to "abort" these hanging batches manually.
> Naturally, the cluster health should be stabilized. This issue popped up
> most frequently around times when some partitions went into a few minute
> window of unavailability. The infinite retries on the producer side caused
> a situation where the last retry was still in-flight, but the delivery
> timeout was triggered on the client side. We reduced the retries and
> increased the delivery timeout to avoid such situations.
> Still, the issue can occur in other scenarios, for example a client
> queueing up many batches in the producer buffer, and causing those batches
> to spend most of the delivery timeout window in the client memory.
>
> Thanks,
> Daniel
>
> Luke Chen  ezt írta (időpont: 2022. júl. 7., Cs, 5:13):
>
>> Hi Daniel,
>>
>> Thanks for reporting the issue, and the investigation.
>> I'm curious, so, what's your workaround for this issue?
>>
>> I agree with Artem, it makes sense. Please file a bug in JIRA.
>> And looking forward to your PR! :)
>>
>> Thank you.
>> Luke
>>
>> On Thu, Jul 7, 2022 at 3:07 AM Artem Livshits
>>  wrote:
>>
>> > Hi Daniel,
>> >
>> > What you say makes sense.  Could you file a bug and put this info there
>> so
>> > that it's easier to track?
>> >
>> > -Artem
>> >
>> > On Wed, Jul 6, 2022 at 8:34 AM Dániel Urbán 
>> wrote:
>> >
>> > > Hello everyone,
>> > >
>> > > I've been investigating some transaction related issues in a very
>> > > problematic cluster. Besides finding some interesting issues, I had
>> some
>> > > ideas about how transactional producer behavior could be improved.
>> > >
>> > > My suggestion in short is: when the transactional producer encounters
>> an
>> > > error which doesn't necessarily mean that the in-flight request was
>> > > processed (for example a client side timeout), the producer should not
>> > send
>> > > an EndTxnRequest on abort, but instead it should bump the producer
>> epoch.
>> > >
>> > > The long description about the issue I found, and how I came to the
>> > > suggestion:
>> > >
>> > > First, the description of the issue. When I say that the cluster is
>> "very
>> > > problematic", I mean all kinds of different issues, be it infra (disks
>> > and
>> > > network) or throughput (high volume producers without fine tuning).
>> > > In this cluster, Kafka transactions are widely used by many producers.
>> > And
>> > > in this cluster, partitions get "stuck" frequently (few times every
>> > week).
>> > >
>> > > The exact meaning of a partition being "stuck" is this:
>> > >
>> > > On the client side:
>> > > 1. A transactional producer sends X batches to a partition in a single
>> > > transaction
>> > > 2. Out of the X batches, the last few get sent, but are timed out
>> thanks
>> > to
>> > > the delivery timeout config
>> > > 3. producer.flush() is unblocked due to all batches being "finished"
>> > > 4. Based on the errors reported in the producer.send() callback,
>> > > producer.abortTransaction() is called
>> > > 5. Then producer.close() is also invoked with a 5s timeout (this
>> > > application does not reuse the producer instances optimally)
>> > > 6. The transactional.id of the producer is never reused (it was
>> random
>> > > generated)
>> > >
>> > > On the partition leader side (what appears in the log segment of the
>> > > partition):
>> > > 1. The batches sent by the producer are all appended to the log
>> > > 2. But the ABORT marker of the transaction wa

Re: Transactions, delivery timeout and changing transactional producer behavior

2022-07-07 Thread Dániel Urbán
Thanks for the feedback, I created
https://issues.apache.org/jira/browse/KAFKA-14053 and started working on a
PR.

Luke, for the workaround, we used the transaction admin tool released in
3.0 to "abort" these hanging batches manually.
Naturally, the cluster health should be stabilized. This issue popped up
most frequently around times when some partitions went into a few minute
window of unavailability. The infinite retries on the producer side caused
a situation where the last retry was still in-flight, but the delivery
timeout was triggered on the client side. We reduced the retries and
increased the delivery timeout to avoid such situations.
Still, the issue can occur in other scenarios, for example a client
queueing up many batches in the producer buffer, and causing those batches
to spend most of the delivery timeout window in the client memory.
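For reference, the same manual abort can also be scripted against the Admin API
that was added together with the tool (a rough sketch from memory, exception
handling omitted - please double-check the exact AbortTransactionSpec signature
before relying on it):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AbortTransactionSpec;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.ProducerState;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker:9092"); // example address
    try (Admin admin = Admin.create(props)) {
        TopicPartition tp = new TopicPartition("stuck-topic", 0); // example partition
        // find the producer state that holds the open (hanging) transaction
        ProducerState state = admin.describeProducers(Collections.singleton(tp))
            .partitionResult(tp).get().activeProducers().get(0);
        // write an abort marker for that producer id/epoch on the partition
        admin.abortTransaction(new AbortTransactionSpec(tp, state.producerId(),
            (short) state.producerEpoch(), state.coordinatorEpoch().orElse(0))).all().get();
    }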

Thanks,
Daniel

On Thu, Jul 7, 2022 at 5:13, Luke Chen  wrote:

> Hi Daniel,
>
> Thanks for reporting the issue, and the investigation.
> I'm curious, so, what's your workaround for this issue?
>
> I agree with Artem, it makes sense. Please file a bug in JIRA.
> And looking forward to your PR! :)
>
> Thank you.
> Luke
>
> On Thu, Jul 7, 2022 at 3:07 AM Artem Livshits
>  wrote:
>
> > Hi Daniel,
> >
> > What you say makes sense.  Could you file a bug and put this info there
> so
> > that it's easier to track?
> >
> > -Artem
> >
> > On Wed, Jul 6, 2022 at 8:34 AM Dániel Urbán 
> wrote:
> >
> > > Hello everyone,
> > >
> > > I've been investigating some transaction related issues in a very
> > > problematic cluster. Besides finding some interesting issues, I had
> some
> > > ideas about how transactional producer behavior could be improved.
> > >
> > > My suggestion in short is: when the transactional producer encounters
> an
> > > error which doesn't necessarily mean that the in-flight request was
> > > processed (for example a client side timeout), the producer should not
> > send
> > > an EndTxnRequest on abort, but instead it should bump the producer
> epoch.
> > >
> > > The long description about the issue I found, and how I came to the
> > > suggestion:
> > >
> > > First, the description of the issue. When I say that the cluster is
> "very
> > > problematic", I mean all kinds of different issues, be it infra (disks
> > and
> > > network) or throughput (high volume producers without fine tuning).
> > > In this cluster, Kafka transactions are widely used by many producers.
> > And
> > > in this cluster, partitions get "stuck" frequently (few times every
> > week).
> > >
> > > The exact meaning of a partition being "stuck" is this:
> > >
> > > On the client side:
> > > 1. A transactional producer sends X batches to a partition in a single
> > > transaction
> > > 2. Out of the X batches, the last few get sent, but are timed out
> thanks
> > to
> > > the delivery timeout config
> > > 3. producer.flush() is unblocked due to all batches being "finished"
> > > 4. Based on the errors reported in the producer.send() callback,
> > > producer.abortTransaction() is called
> > > 5. Then producer.close() is also invoked with a 5s timeout (this
> > > application does not reuse the producer instances optimally)
> > > 6. The transactional.id of the producer is never reused (it was random
> > > generated)
> > >
> > > On the partition leader side (what appears in the log segment of the
> > > partition):
> > > 1. The batches sent by the producer are all appended to the log
> > > 2. But the ABORT marker of the transaction was appended before the
> last 1
> > > or 2 batches of the transaction
> > >
> > > On the transaction coordinator side (what appears in the transaction
> > state
> > > partition):
> > > The transactional.id is present with the Empty state.
> > >
> > > These happenings result in the following:
> > > 1. The partition leader handles the first batch after the ABORT marker
> as
> > > the first message of a new transaction of the same producer id + epoch.
> > > (LSO is blocked at this point)
> > > 2. The transaction coordinator is not aware of any in-progress
> > transaction
> > > of the producer, thus never aborting the transaction, not even after
> the
> > > transaction.timeout.ms passes.
> > >
> > > This is happening with Kafka 2.5 running in the cluster, producer

Transactions, delivery timeout and changing transactional producer behavior

2022-07-06 Thread Dániel Urbán
Hello everyone,

I've been investigating some transaction related issues in a very
problematic cluster. Besides finding some interesting issues, I had some
ideas about how transactional producer behavior could be improved.

My suggestion in short is: when the transactional producer encounters an
error which doesn't necessarily mean that the in-flight request was
processed (for example a client side timeout), the producer should not send
an EndTxnRequest on abort, but instead it should bump the producer epoch.

The long description about the issue I found, and how I came to the
suggestion:

First, the description of the issue. When I say that the cluster is "very
problematic", I mean it has all kinds of issues, be it infra (disks and
network) or throughput (high-volume producers without fine tuning).
In this cluster, Kafka transactions are widely used by many producers. And
in this cluster, partitions get "stuck" frequently (a few times every week).

The exact meaning of a partition being "stuck" is this:

On the client side:
1. A transactional producer sends X batches to a partition in a single
transaction
2. Out of the X batches, the last few get sent, but are timed out thanks to
the delivery timeout config
3. producer.flush() is unblocked due to all batches being "finished"
4. Based on the errors reported in the producer.send() callback,
producer.abortTransaction() is called
5. Then producer.close() is also invoked with a 5s timeout (this
application does not reuse the producer instances optimally)
6. The transactional.id of the producer is never reused (it was randomly
generated)

On the partition leader side (what appears in the log segment of the
partition):
1. The batches sent by the producer are all appended to the log
2. But the ABORT marker of the transaction was appended before the last 1
or 2 batches of the transaction

On the transaction coordinator side (what appears in the transaction state
partition):
The transactional.id is present with the Empty state.

These events result in the following:
1. The partition leader handles the first batch after the ABORT marker as
the first message of a new transaction of the same producer id + epoch.
(LSO is blocked at this point)
2. The transaction coordinator is not aware of any in-progress transaction
of the producer, thus never aborting the transaction, not even after the
transaction.timeout.ms passes.

This is happening with Kafka 2.5 running in the cluster; producer versions
range between 2.0 and 2.6.
I scanned through a lot of tickets, and I believe that this issue is not
specific to these versions, and could happen with the newest versions as well.
If I'm mistaken, some pointers would be appreciated.

Assuming that the issue could occur with any version, I believe this issue
boils down to one oversight on the client side:
When a request fails without a definitive response (e.g. a delivery
timeout), the client cannot assume that the request is "finished" and
simply abort the transaction. If that request is still in flight while the
EndTxnRequest, and then the WriteTxnMarkerRequest, get sent and processed
before it, the contract is violated by the client.
This could be avoided by providing more information to the partition
leader. Right now, a new transactional batch signals the start of a new
transaction, and there is no way for the partition leader to decide whether
the batch is an out-of-order message.
In a naive and wasteful protocol, we could have a unique transaction id
added to each batch and marker, meaning that the leader would be capable of
refusing batches which arrive after the control marker of the transaction.
But instead of changing the log format and the protocol, we can achieve the
same by bumping the producer epoch.

Bumping the epoch has a similar effect to "changing the transaction id" -
the in-progress transaction will be aborted with a bumped producer epoch,
telling the partition leader about the producer epoch change. From this
point on, any batches sent with the old epoch will be refused by the leader
due to the fencing mechanism. It doesn't really matter how many batches
will get appended to the log, and how many will be refused - this is an
aborted transaction - but the out-of-order message cannot occur, and cannot
block the LSO infinitely.

My suggestion is that the TransactionManager inside the producer should
keep track of what type of errors were encountered by the batches of the
transaction, and categorize them along the lines of "definitely completed"
and "might not be completed". When the transaction goes into an abortable
state and there is at least one batch in the "might not be completed"
category, the EndTxnRequest should be skipped and an epoch bump should be
sent instead.
As for what type of error counts as "might not be completed", I can only
think of client-side timeouts.
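A rough sketch of the idea (the names below are made up for illustration; this
is not the actual TransactionManager code):

    import org.apache.kafka.common.errors.TimeoutException;

    // per-transaction flag inside the (hypothetical) transaction manager
    private boolean sawIndeterminateBatchOutcome = false;

    void onBatchCompletion(RuntimeException error) {
        // a client-side delivery timeout means we never saw a definitive broker response
        if (error instanceof TimeoutException) {
            sawIndeterminateBatchOutcome = true;
        }
    }

    void beginAbort() {
        if (sawIndeterminateBatchOutcome) {
            // skip the EndTxnRequest and bump the epoch instead (InitProducerId with the
            // current producer id and epoch), so late batches of the old epoch get fenced
            bumpEpochAndAbort(); // hypothetical helper
        } else {
            sendEndTxnRequest(/* commit = */ false); // hypothetical helper
        }
    }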

I believe this is a relatively small change (only affects the client lib),
but it helps in avoiding some corrupt states in Kafka transactions.

Looking forward to 

Re: [DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2021-02-18 Thread Dániel Urbán
Hello everyone,

* Sorry, I meant KIP-710.

Right now MirrorMaker is somewhat unreliable and does not properly support
running as a cluster. I'd say that fixing this would be a nice addition.
Does anyone have some input on this?

Thanks in advance
Daniel

On Tue, Jan 26, 2021 at 15:56, Dániel Urbán  wrote:

> Hello everyone,
>
> I would like to start a discussion on KIP-709, which addresses some
> missing features in MM2 dedicated mode.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters
>
> Currently, the dedicated mode of MM2 does not fully support running in a
> cluster. The core issue is that the Connect REST Server is not included in
> the dedicated mode, which makes follower->leader communication impossible.
> In some cases, this results in the cluster not being able to react to
> dynamic configuration changes (e.g. dynamic topic filter changes).
> Another smaller detail is that MM2 dedicated mode eagerly resolves config
> provider references in the Connector configurations, which is undesirable
> and a breaking change compared to vanilla Connect. This can cause an issue
> for example when there is an environment variable reference, which contains
> some host-specific information, like a file path. The leader resolves the
> reference eagerly, and the resolved value is propagated to other MM2 nodes
> instead of the reference being resolved locally, separately on each node.
>
> The KIP addresses these by adding the Connect REST Server to the MM2
> dedicated mode for each replication flow, and postponing the config
> provider reference resolution.
>
> Please discuss, I know this is a major change, but also an important
> feature for MM2 users.
>
> Daniel
>


[DISCUSS] KIP-710: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2021-01-26 Thread Dániel Urbán
Hello everyone,

I would like to start a discussion on KIP-709, which addresses some missing
features in MM2 dedicated mode.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters

Currently, the dedicated mode of MM2 does not fully support running in a
cluster. The core issue is that the Connect REST Server is not included in
the dedicated mode, which makes follower->leader communication impossible.
In some cases, this results in the cluster not being able to react to
dynamic configuration changes (e.g. dynamic topic filter changes).
Another smaller detail is that MM2 dedicated mode eagerly resolves config
provider references in the Connector configurations, which is undesirable
and a breaking change compared to vanilla Connect. This can cause an issue
for example when there is an environment variable reference, which contains
some host-specific information, like a file path. The leader resolves the
reference eagerly, and the resolved value is propagated to other MM2 nodes
instead of the reference being resolved locally, separately on each node.
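To make the config provider part more concrete, here is a small illustration
using the FileConfigProvider shipped with Kafka (the surrounding property names
are just examples; the same applies to a custom environment variable provider):

    config.providers=file
    config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
    # host-specific value - should be resolved locally on each node, not once by the leader:
    primary.ssl.keystore.location=${file:/etc/mm2/local.properties:keystore.path}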

The KIP addresses these by adding the Connect REST Server to the MM2
dedicated mode for each replication flow, and postponing the config
provider reference resolution.

Please discuss, I know this is a major change, but also an important
feature for MM2 users.

Daniel


Re: [DISCUSS] KIP-709: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2021-01-26 Thread Dániel Urbán
Hi Tom,
Sure, will increment. Sorry for the confusion, I just started using the
next number, but did not check the KIP list.
Thanks,
Daniel

On Tue, Jan 26, 2021 at 10:58, Thomas Scott  wrote:

> Hi Daniel,
>
>   It seems we have duplicate KIP-709s. Can we move this one to KIP-710?
>
> Thanks
>
>   Tom
>
>
> On Tue, Jan 26, 2021 at 8:35 AM Dániel Urbán 
> wrote:
>
> > Hello everyone,
> >
> > I would like to start a discussion on KIP-709, which addresses some
> missing
> > features in MM2 dedicated mode.
> >
> > Currently, the dedicated mode of MM2 does not fully support running in a
> > cluster. The core issue is that the Connect REST Server is not included
> in
> > the dedicated mode, which makes follower->leader communication
> impossible.
> > In some cases, this results in the cluster not being able to react to
> > dynamic configuration changes (e.g. dynamic topic filter changes).
> > Another smaller detail is that MM2 dedicated mode eagerly resolves config
> > provider references in the Connector configurations, which is undesirable
> > and a breaking change compared to vanilla Connect. This can cause an
> issue
> > for example when there is an environment variable reference, which
> contains
> > some host-specific information, like a file path. The leader resolves the
> > reference eagerly, and the resolved value is propagated to other MM2
> nodes
> > instead of the reference being resolved locally, separately on each node.
> >
> > The KIP addresses these by adding the Connect REST Server to the MM2
> > dedicated mode for each replication flow, and postponing the config
> > provider reference resolution.
> >
> > Please discuss, I know this is a major change, but also an important
> > feature for MM2 users.
> >
> > Daniel
> >
>


Re: [DISCUSS] KIP-709: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2021-01-26 Thread Dániel Urbán
And I guess providing the link wouldn't hurt either:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-709%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters

On Tue, Jan 26, 2021 at 9:35 AM Dániel Urbán  wrote:

> Hello everyone,
>
> I would like to start a discussion on KIP-709, which addresses some missing
> features in MM2 dedicated mode.
>
> Currently, the dedicated mode of MM2 does not fully support running in a
> cluster. The core issue is that the Connect REST Server is not included in
> the dedicated mode, which makes follower->leader communication impossible.
> In some cases, this results in the cluster not being able to react to
> dynamic configuration changes (e.g. dynamic topic filter changes).
> Another smaller detail is that MM2 dedicated mode eagerly resolves config
> provider references in the Connector configurations, which is undesirable
> and a breaking change compared to vanilla Connect. This can cause an issue
> for example when there is an environment variable reference, which contains
> some host-specific information, like a file path. The leader resolves the
> reference eagerly, and the resolved value is propagated to other MM2 nodes
> instead of the reference being resolved locally, separately on each node.
>
> The KIP addresses these by adding the Connect REST Server to the MM2
> dedicated mode for each replication flow, and postponing the config
> provider reference resolution.
>
> Please discuss, I know this is a major change, but also an important
> feature for MM2 users.
>
> Daniel
>


[DISCUSS] KIP-709: Full support for distributed mode in dedicated MirrorMaker 2.0 clusters

2021-01-26 Thread Dániel Urbán
Hello everyone,

I would like to start a discussion on KIP-709, which addresses some missing
features in MM2 dedicated mode.

Currently, the dedicated mode of MM2 does not fully support running in a
cluster. The core issue is that the Connect REST Server is not included in
the dedicated mode, which makes follower->leader communication impossible.
In some cases, this results in the cluster not being able to react to
dynamic configuration changes (e.g. dynamic topic filter changes).
Another smaller detail is that MM2 dedicated mode eagerly resolves config
provider references in the Connector configurations, which is undesirable
and a breaking change compared to vanilla Connect. This can cause an issue
for example when there is an environment variable reference, which contains
some host-specific information, like a file path. The leader resolves the
reference eagerly, and the resolved value is propagated to other MM2 nodes
instead of the reference being resolved locally, separately on each node.

The KIP addresses these by adding the Connect REST Server to the MM2
dedicated mode for each replication flow, and postponing the config
provider reference resolution.

Please discuss, I know this is a major change, but also an important
feature for MM2 users.

Daniel


confluence access

2020-10-13 Thread Dániel Urbán
Hi,

I'd like to access the wiki and contribute to KIPs, please add me. My
username is urbandan.

Thanks in advance,
Daniel


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-10-08 Thread Dániel Urbán
Thank you for the votes!

The KIP passes with 3 binding votes (Manikumar, Mickael, Ismael) and 3
non-binding votes (Viktor, Kamal, David).

Daniel

On Thu, Oct 8, 2020 at 16:27, Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding).
>
> On Mon, Jul 27, 2020 at 1:09 AM Dániel Urbán 
> wrote:
>
> > Hello everyone,
> >
> > I'd like to start a vote on KIP-635. The KIP enhances the GetOffsetShell
> > tool by enabling querying multiple topic-partitions, adding new filtering
> > options, and adding a config override option.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> >
> > The original discussion thread was named "[DISCUSS] KIP-308:
> > GetOffsetShell: new KafkaConsumer API, support for multiple topics,
> > minimize the number of requests to server". The id had to be changed as
> > there was a collision, and the KIP also had to be renamed, as some of its
> > motivations were outdated.
> >
> > Thanks,
> > Daniel
> >
>


Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

2020-10-01 Thread Dániel Urbán
Hi Viktor,

I think the current state of the proposal is flexible enough to support
use-cases where the response data is of interest to the auditor.
This part ensures that: "... doing the auditing before sending the response
back ...". Additionally, event classes could be extended with additional
data if needed.

Overall, the KIP looks good, thanks!

Daniel

On Wed, Sep 30, 2020 at 17:24, Viktor Somogyi-Vass  wrote:

> Hi Daniel,
>
> I think in this sense we can use the precedence set with the
> KAfkaAdminClient. It has *Result and *Options classes which in this
> interpretation are similar in versioning and usage as they transform and
> convey the responses of the protocol in a minimalistic API.
> I've modified the KIP a bit and created some examples for these event
> classes. For now as the implementation I think we can treat this similarly
> to KIP-4 (AdminClient) which didn't push implementation for everything but
> rather pushed implementing everything to subsequent KIPs as the
> requirements become important. In this first KIP we can create the more
> important ones (listed in the "Default Implementation") section if that is
> fine.
>
> Regarding the response passing: to be honest I feel like that it's not that
> strictly related to auditing but I think it's a good idea and could fit
> into this API. I think that we should design this current API with this in
> mind. Did you have any specific ideas about the implementation?
>
> Viktor
>
> On Tue, Sep 22, 2020 at 9:05 AM Dániel Urbán 
> wrote:
>
> > An example I had in mind was the ProduceResponse - the auditor might need
> > access to the new end offset of the partitions.
> > The event-based approach sounds good - new events and fields can be added
> > on-demand. Do we need the same versioning strategy we use with the
> > requests/responses?
> >
> > Daniel
> >
> > Viktor Somogyi-Vass  ezt írta (időpont: 2020.
> > szept. 21., H, 14:08):
> >
> > > Hi Daniel,
> > >
> > > > If the auditor needs access to the details of the action, one could
> > argue
> > > that even the response should be passed down to the auditor.
> > > At this point I don't think we need to include responses into the
> > interface
> > > but if you have a use-case we can consider doing that.
> > >
> > > > Is it feasible to convert the Java requests and responses to public
> > API?
> > > Well I think that in this case we would need to actually transform a
> lot
> > of
> > > classes and that might be a bit too invasive. Although since the
> protocol
> > > itself *is* a public API it might make sense to have some kind of Java
> > > representation as a public API as well.
> > >
> > > > If not, do we have another option to access this info in the auditor?
> > > I think one option would be to do what the original KIP-567 was
> > > implemented. Basically we could have an AuditEvent interface that would
> > > contain request specific data. Its obvious drawback is that it has to
> be
> > > implemented for most of the 40 something protocols but on the upside
> > these
> > > classes shouldn't be complicated. I can try to do a PoC with this to
> see
> > > how it looks like and whether it solves the problem. To be honest I
> think
> > > it would be better than publishing the request classes as an API
> because
> > > here we're restricting access to only what is necessary.
> > >
> > > Thanks,
> > > Viktor
> > >
> > >
> > >
> > > On Fri, Sep 18, 2020 at 8:37 AM Dániel Urbán 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > If the auditor needs access to the details of the action, one could
> > argue
> > > > that even the response should be passed down to the auditor.
> > > > Is it feasible to convert the Java requests and responses to public
> > API?
> > > > If not, do we have another option to access this info in the auditor?
> > > > I know that the auditor could just send proper requests through the
> API
> > > to
> > > > the brokers, but that seems like an awful lot of overhead, and could
> > > > introduce timing issues as well.
> > > >
> > > > Daniel
> > > >
> > > >
> > > > Viktor Somogyi-Vass  ezt írta (időpont:
> 2020.
> > > > szept. 16., Sze, 17:17):
> > > >
> > > > > One more after-thought on your sec

Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

2020-09-22 Thread Dániel Urbán
An example I had in mind was the ProduceResponse - the auditor might need
access to the new end offset of the partitions.
The event-based approach sounds good - new events and fields can be added
on-demand. Do we need the same versioning strategy we use with the
requests/responses?
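Just to make the ProduceResponse example concrete, something along these lines
is what I had in mind (all names are made up, not a proposal for the final API):

    import java.util.Map;
    import org.apache.kafka.common.TopicPartition;

    // purely illustrative event type carrying response-side data;
    // AuditEvent is the marker interface discussed above
    public final class ProduceAuditEvent implements AuditEvent {
        private final Map<TopicPartition, Long> endOffsets; // new end offset per appended partition

        public ProduceAuditEvent(Map<TopicPartition, Long> endOffsets) {
            this.endOffsets = endOffsets;
        }

        public Map<TopicPartition, Long> endOffsets() {
            return endOffsets;
        }
    }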

Daniel

On Mon, Sep 21, 2020 at 14:08, Viktor Somogyi-Vass  wrote:

> Hi Daniel,
>
> > If the auditor needs access to the details of the action, one could argue
> that even the response should be passed down to the auditor.
> At this point I don't think we need to include responses into the interface
> but if you have a use-case we can consider doing that.
>
> > Is it feasible to convert the Java requests and responses to public API?
> Well I think that in this case we would need to actually transform a lot of
> classes and that might be a bit too invasive. Although since the protocol
> itself *is* a public API it might make sense to have some kind of Java
> representation as a public API as well.
>
> > If not, do we have another option to access this info in the auditor?
> I think one option would be to do what the original KIP-567 was
> implemented. Basically we could have an AuditEvent interface that would
> contain request specific data. Its obvious drawback is that it has to be
> implemented for most of the 40 something protocols but on the upside these
> classes shouldn't be complicated. I can try to do a PoC with this to see
> how it looks like and whether it solves the problem. To be honest I think
> it would be better than publishing the request classes as an API because
> here we're restricting access to only what is necessary.
>
> Thanks,
> Viktor
>
>
>
> On Fri, Sep 18, 2020 at 8:37 AM Dániel Urbán 
> wrote:
>
> > Hi,
> >
> > Thanks for the KIP.
> >
> > If the auditor needs access to the details of the action, one could argue
> > that even the response should be passed down to the auditor.
> > Is it feasible to convert the Java requests and responses to public API?
> > If not, do we have another option to access this info in the auditor?
> > I know that the auditor could just send proper requests through the API
> to
> > the brokers, but that seems like an awful lot of overhead, and could
> > introduce timing issues as well.
> >
> > Daniel
> >
> >
> > Viktor Somogyi-Vass  ezt írta (időpont: 2020.
> > szept. 16., Sze, 17:17):
> >
> > > One more after-thought on your second point (AbstractRequest): the
> > reason I
> > > introduced it in the first place was that this way implementers can
> > access
> > > request data. A use case can be if they want to audit a change in
> > > configuration or client quotas but not just acknowledge the fact that
> > such
> > > an event happened but also capture the change itself by peeking into
> the
> > > request. Sometimes it can be useful especially when people want to
> trace
> > > back the order of events and what happened when and not just
> acknowledge
> > > that there was an event of a certain kind. I also recognize that this
> > might
> > > be a very loose interpretation of auditing as it's not strictly related
> > to
> > > authorization but rather a way of tracing the admin actions within the
> > > cluster. It even could be a different API therefore but because of the
> > > variety of the Kafka APIs it's very hard to give a method that fits
> all,
> > so
> > > it's easier to pass down the AbstractRequest and the implementation can
> > do
> > > the extraction of valuable info. So that's why I added this in the
> first
> > > place and I'm interested in your thoughts.
> > >
> > > On Wed, Sep 16, 2020 at 4:41 PM Viktor Somogyi-Vass <
> > > viktorsomo...@gmail.com>
> > > wrote:
> > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for reviewing the KIP.
> > > >
> > > > 1.) I just wanted to follow the conventions used with the Authorizer
> as
> > > it
> > > > is built in a similar fashion, although it's true that in KafkaServer
> > we
> > > > call the configure() method and the start() in the next line. This
> > would
> > > be
> > > > the same in Auditor and even simpler as there aren't any parameters
> to
> > > > start(), so I can remove it. If it turns out there is a need for it,
> we
> > > can
> > > > add it later.
> > > >
> > > > 2.) Yes, this is a very good point, I will remove it, however in this
> > > case
> > >

Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

2020-09-18 Thread Dániel Urbán
Hi,

Thanks for the KIP.

If the auditor needs access to the details of the action, one could argue
that even the response should be passed down to the auditor.
Is it feasible to convert the Java requests and responses to public API?
If not, do we have another option to access this info in the auditor?
I know that the auditor could just send proper requests through the API to
the brokers, but that seems like an awful lot of overhead, and could
introduce timing issues as well.

Daniel


On Wed, Sep 16, 2020 at 17:17, Viktor Somogyi-Vass  wrote:

> One more after-thought on your second point (AbstractRequest): the reason I
> introduced it in the first place was that this way implementers can access
> request data. A use case can be if they want to audit a change in
> configuration or client quotas but not just acknowledge the fact that such
> an event happened but also capture the change itself by peeking into the
> request. Sometimes it can be useful especially when people want to trace
> back the order of events and what happened when and not just acknowledge
> that there was an event of a certain kind. I also recognize that this might
> be a very loose interpretation of auditing as it's not strictly related to
> authorization but rather a way of tracing the admin actions within the
> cluster. It even could be a different API therefore but because of the
> variety of the Kafka APIs it's very hard to give a method that fits all, so
> it's easier to pass down the AbstractRequest and the implementation can do
> the extraction of valuable info. So that's why I added this in the first
> place and I'm interested in your thoughts.
>
> On Wed, Sep 16, 2020 at 4:41 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> wrote:
>
> > Hi Mickael,
> >
> > Thanks for reviewing the KIP.
> >
> > 1.) I just wanted to follow the conventions used with the Authorizer as
> it
> > is built in a similar fashion, although it's true that in KafkaServer we
> > call the configure() method and the start() in the next line. This would
> be
> > the same in Auditor and even simpler as there aren't any parameters to
> > start(), so I can remove it. If it turns out there is a need for it, we
> can
> > add it later.
> >
> > 2.) Yes, this is a very good point, I will remove it, however in this
> case
> > I don't think we need to add the ApiKey as it is already available in
> > AuthorizableRequestContext.requestType(). One less parameter :).
> >
> > 3.) I'll add it. It will simply log important changes in the cluster like
> > topic events (create, update, delete, partition or replication factor
> > change), ACL events, config changes, reassignment, altering log dirs,
> > offset delete, group delete with the authorization info like who
> initiated
> > the call, was it authorized, were there any errors. Let me know if you
> > think there are other APIs I should include.
> >
> > 4.) The builder is there mostly for easier usability but actually
> thinking
> > of it it doesn't help much so I removed it. The AuditInfo is also a
> helper
> > class so I don't see any value in transforming it into an interface but
> if
> > I simplify it (by removing the builder) it will be cleaner. Would that
> work?
> >
> > I'll update the KIP to reflect my answers.
> >
> > Viktor
> >
> >
> > On Mon, Sep 14, 2020 at 6:02 PM Mickael Maison  >
> > wrote:
> >
> >> Hi Viktor,
> >>
> >> Thanks for restarting the discussion on this KIP. Being able to easily
> >> audit usage of a Kafka cluster is a very valuable feature.
> >>
> >> Regarding the API, I have a few of questions:
> >> 1) You introduced a start() method. I don't think any other interfaces
> >> have such a method. Users can do any setup they want in configure()
> >>
> >> 2) The first argument of audit is an AbstractRequest. Unfortunately
> >> this type is not part of the public API. But actually I'm not sure
> >> having the full request is really needed here. Maybe just passing the
> >> Apikey would be enough as we already have all the resources from the
> >> auditInfos field.
> >>
> >> 3) The KIP mentions a "LoggingAuditor" default implementation. What is
> >> it doing? Can you add more details about it?
> >>
> >> 4) Can fields of AuditInfo be null? I can see there's a constructor
> >> without an Errors and that sets the error field to None. However, with
> >> the builder pattern, if error is not set it's null.
> >>
> >> 5) Should AuditInfo be an interface?
> >>
> >> On Mon, Sep 14, 2020 at 3:26 PM Viktor Somogyi-Vass
> >>  wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > Changed the interface a little bit to accommodate methods better where
> >> > authorization happens for multiple operations so the implementer of
> the
> >> > audit interface will receive all authorizations together.
> >> > I'll wait a few more days to allow people to react or give feedback
> but
> >> if
> >> > there are no objections until then, I'll start a vote.
> >> >
> >> > Viktor
> >> >
> >> > On Tue, Sep 8, 2020 at 9:49 AM 

Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-31 Thread Dániel Urbán
Hello everyone,

I'd like to ask you to consider voting for this KIP; it only needs 2 more
binding votes.

This KIP focuses on the GetOffsetShell tool. The tool is useful for
monitoring and investigations as well.
Unfortunately, the tool is missing one key feature, plus a few
quality-of-life improvements:
- Because of the lack of configuration options, it cannot be used with secure
clusters - which makes it unviable in many use-cases.
- It only supports a single topic - supporting multiple topics would be a
usability improvement, also in line with other tools that use pattern-based
matching.
- It still uses the deprecated "--broker-list" argument name.

Overall, I believe that this is a non-intrusive change - minor improvements
without breaking changes. But for some users, this would be a great
improvement in using Kafka.
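To illustrate (using the option names as proposed in the KIP; the values are
just examples):

    # today: plaintext clusters only, one topic at a time
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker:9092 \
        --topic my-topic --time -1

    # with the KIP: consumer config override for secure clusters, multiple topic-partitions
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell --bootstrap-server broker:9093 \
        --command-config client-ssl.properties --topic-partitions 'my-topic-.*:0-3' --time -1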

Thank you in advance,
Daniel


On Thu, Aug 27, 2020 at 17:52, Dániel Urbán  wrote:

> Hi all,
>
> Please vote if you'd like to see this implemented. This one fixes a
> long-time debt, would be nice to see it pass.
>
> Thank you,
> Daniel
>
> Dániel Urbán  ezt írta (időpont: 2020. aug. 18.,
> K, 14:06):
>
>> Hello everyone,
>>
>> Please, if you are interested in this KIP and PR, don't forget to vote.
>>
>> Thank you,
>> Daniel
>>
>> Dániel Urbán  ezt írta (időpont: 2020. aug. 13.,
>> Cs, 14:00):
>>
>>> Hi David,
>>>
>>> Thank you for the suggestion. KIP-635 was referencing the --broker-list
>>> issue, but based on your suggestion, I pinged the PR
>>> https://github.com/apache/kafka/pull/8123.
>>> Since I got no response, I updated KIP-635 to deprecate --broker-list.
>>> Will update the PR related to KIP-635 to reflect that change.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> David Jacot  ezt írta (időpont: 2020. aug. 10., H,
>>> 20:48):
>>>
>>>> Hi Daniel,
>>>>
>>>> I was not aware of that PR. At minimum, I would add `--bootstrap-server`
>>>> to the list in the KIP for completeness. Regarding the implementation,
>>>> I would leave a comment in that PR asking if they plan to continue it.
>>>> If
>>>> not,
>>>> we could do it as part of your PR directly.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On Mon, Aug 10, 2020 at 10:49 AM Dániel Urbán 
>>>> wrote:
>>>>
>>>> > Hi everyone,
>>>> >
>>>> > Just a reminder, please vote if you are interested in this KIP being
>>>> > implemented.
>>>> >
>>>> > Thanks,
>>>> > Daniel
>>>> >
>>>> > Dániel Urbán  ezt írta (időpont: 2020. júl.
>>>> 31., P,
>>>> > 9:01):
>>>> >
>>>> > > Hi David,
>>>> > >
>>>> > > There is another PR linked on KAFKA-8507, which is still open:
>>>> > > https://github.com/apache/kafka/pull/8123
>>>> > > Wasn't sure if it will go in, and wanted to avoid conflicts. Do you
>>>> think
>>>> > > I should do the switch to '--bootstrap-server' anyway?
>>>> > >
>>>> > > Thanks,
>>>> > > Daniel
>>>> > >
>>>> > > David Jacot  ezt írta (időpont: 2020. júl.
>>>> 30., Cs,
>>>> > > 17:52):
>>>> > >
>>>> > >> Hi Daniel,
>>>> > >>
>>>> > >> Thanks for the KIP.
>>>> > >>
>>>> > >> It seems that we have forgotten to include this tool in KIP-499.
>>>> > >> KAFKA-8507
>>>> > >> is resolved
>>>> > >> by this tool still uses the deprecated "--broker-list". I suggest
>>>> to
>>>> > >> include "--bootstrap-server"
>>>> > >> in your public interfaces as well and fix this omission during the
>>>> > >> implementation.
>>>> > >>
>>>> > >> +1 (non-binding)
>>>> > >>
>>>> > >> Thanks,
>>>> > >> David
>>>> > >>
>>>> > >> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
>>>> > >> kamal.chandraprak...@gmail.com> wrote:
>>>> > >>
>>>> > >> > +1 (non-binding), thanks for the KIP!
>>>> > >> >
>>>> > >> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar <
>>>> manikumar.re...@gmail

Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-27 Thread Dániel Urbán
Hi all,

Please vote if you'd like to see this implemented. This one fixes a
long-standing debt; it would be nice to see it pass.

Thank you,
Daniel

On Tue, Aug 18, 2020 at 14:06, Dániel Urbán  wrote:

> Hello everyone,
>
> Please, if you are interested in this KIP and PR, don't forget to vote.
>
> Thank you,
> Daniel
>
> Dániel Urbán  ezt írta (időpont: 2020. aug. 13.,
> Cs, 14:00):
>
>> Hi David,
>>
>> Thank you for the suggestion. KIP-635 was referencing the --broker-list
>> issue, but based on your suggestion, I pinged the PR
>> https://github.com/apache/kafka/pull/8123.
>> Since I got no response, I updated KIP-635 to deprecate --broker-list.
>> Will update the PR related to KIP-635 to reflect that change.
>>
>> Thanks,
>> Daniel
>>
>> David Jacot  ezt írta (időpont: 2020. aug. 10., H,
>> 20:48):
>>
>>> Hi Daniel,
>>>
>>> I was not aware of that PR. At minimum, I would add `--bootstrap-server`
>>> to the list in the KIP for completeness. Regarding the implementation,
>>> I would leave a comment in that PR asking if they plan to continue it. If
>>> not,
>>> we could do it as part of your PR directly.
>>>
>>> Cheers,
>>> David
>>>
>>> On Mon, Aug 10, 2020 at 10:49 AM Dániel Urbán 
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > Just a reminder, please vote if you are interested in this KIP being
>>> > implemented.
>>> >
>>> > Thanks,
>>> > Daniel
>>> >
>>> > Dániel Urbán  ezt írta (időpont: 2020. júl.
>>> 31., P,
>>> > 9:01):
>>> >
>>> > > Hi David,
>>> > >
>>> > > There is another PR linked on KAFKA-8507, which is still open:
>>> > > https://github.com/apache/kafka/pull/8123
>>> > > Wasn't sure if it will go in, and wanted to avoid conflicts. Do you
>>> think
>>> > > I should do the switch to '--bootstrap-server' anyway?
>>> > >
>>> > > Thanks,
>>> > > Daniel
>>> > >
>>> > > David Jacot  ezt írta (időpont: 2020. júl.
>>> 30., Cs,
>>> > > 17:52):
>>> > >
>>> > >> Hi Daniel,
>>> > >>
>>> > >> Thanks for the KIP.
>>> > >>
>>> > >> It seems that we have forgotten to include this tool in KIP-499.
>>> > >> KAFKA-8507
>>> > >> is resolved
>>> > >> by this tool still uses the deprecated "--broker-list". I suggest to
>>> > >> include "--bootstrap-server"
>>> > >> in your public interfaces as well and fix this omission during the
>>> > >> implementation.
>>> > >>
>>> > >> +1 (non-binding)
>>> > >>
>>> > >> Thanks,
>>> > >> David
>>> > >>
>>> > >> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
>>> > >> kamal.chandraprak...@gmail.com> wrote:
>>> > >>
>>> > >> > +1 (non-binding), thanks for the KIP!
>>> > >> >
>>> > >> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar <
>>> manikumar.re...@gmail.com>
>>> > >> > wrote:
>>> > >> >
>>> > >> > > +1 (binding)
>>> > >> > >
>>> > >> > > Thanks for the KIP!
>>> > >> > >
>>> > >> > >
>>> > >> > >
>>> > >> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán <
>>> urb.dani...@gmail.com
>>> > >
>>> > >> > > wrote:
>>> > >> > >
>>> > >> > > > Hi everyone,
>>> > >> > > >
>>> > >> > > > If you are interested in this KIP, please do not forget to
>>> vote.
>>> > >> > > >
>>> > >> > > > Thanks,
>>> > >> > > > Daniel
>>> > >> > > >
>>> > >> > > > Viktor Somogyi-Vass  ezt írta
>>> (időpont:
>>> > >> 2020.
>>> > >> > > > júl.
>>> > >> > > > 28., K, 16:06):
>>> > >> > > >
>>> > >> > > > > +1 from me (non-binding), thanks for the KIP.
>>> > >> > > > >
>>> > >> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
>>> > >> urb.dani...@gmail.com
>>> > >> > >
>>> > >> > > > > wrote:
>>> > >> > > > >
>>> > >> > > > > > Hello everyone,
>>> > >> > > > > >
>>> > >> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
>>> > >> > > > GetOffsetShell
>>> > >> > > > > > tool by enabling querying multiple topic-partitions,
>>> adding
>>> > new
>>> > >> > > > filtering
>>> > >> > > > > > options, and adding a config override option.
>>> > >> > > > > >
>>> > >> > > > > >
>>> > >> > > > >
>>> > >> > > >
>>> > >> > >
>>> > >> >
>>> > >>
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
>>> > >> > > > > >
>>> > >> > > > > > The original discussion thread was named "[DISCUSS]
>>> KIP-308:
>>> > >> > > > > > GetOffsetShell: new KafkaConsumer API, support for
>>> multiple
>>> > >> topics,
>>> > >> > > > > > minimize the number of requests to server". The id had to
>>> be
>>> > >> > changed
>>> > >> > > as
>>> > >> > > > > > there was a collision, and the KIP also had to be
>>> renamed, as
>>> > >> some
>>> > >> > of
>>> > >> > > > its
>>> > >> > > > > > motivations were outdated.
>>> > >> > > > > >
>>> > >> > > > > > Thanks,
>>> > >> > > > > > Daniel
>>> > >> > > > > >
>>> > >> > > > >
>>> > >> > > >
>>> > >> > >
>>> > >> >
>>> > >>
>>> > >
>>> >
>>>
>>


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-18 Thread Dániel Urbán
Hello everyone,

Please, if you are interested in this KIP and PR, don't forget to vote.

Thank you,
Daniel

On Thu, Aug 13, 2020 at 14:00, Dániel Urbán  wrote:

> Hi David,
>
> Thank you for the suggestion. KIP-635 was referencing the --broker-list
> issue, but based on your suggestion, I pinged the PR
> https://github.com/apache/kafka/pull/8123.
> Since I got no response, I updated KIP-635 to deprecate --broker-list.
> Will update the PR related to KIP-635 to reflect that change.
>
> Thanks,
> Daniel
>
> David Jacot  ezt írta (időpont: 2020. aug. 10., H,
> 20:48):
>
>> Hi Daniel,
>>
>> I was not aware of that PR. At minimum, I would add `--bootstrap-server`
>> to the list in the KIP for completeness. Regarding the implementation,
>> I would leave a comment in that PR asking if they plan to continue it. If
>> not,
>> we could do it as part of your PR directly.
>>
>> Cheers,
>> David
>>
>> On Mon, Aug 10, 2020 at 10:49 AM Dániel Urbán 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > Just a reminder, please vote if you are interested in this KIP being
>> > implemented.
>> >
>> > Thanks,
>> > Daniel
>> >
>> > Dániel Urbán  ezt írta (időpont: 2020. júl.
>> 31., P,
>> > 9:01):
>> >
>> > > Hi David,
>> > >
>> > > There is another PR linked on KAFKA-8507, which is still open:
>> > > https://github.com/apache/kafka/pull/8123
>> > > Wasn't sure if it will go in, and wanted to avoid conflicts. Do you
>> think
>> > > I should do the switch to '--bootstrap-server' anyway?
>> > >
>> > > Thanks,
>> > > Daniel
>> > >
>> > > David Jacot  ezt írta (időpont: 2020. júl. 30.,
>> Cs,
>> > > 17:52):
>> > >
>> > >> Hi Daniel,
>> > >>
>> > >> Thanks for the KIP.
>> > >>
>> > >> It seems that we have forgotten to include this tool in KIP-499.
>> > >> KAFKA-8507
>> > >> is resolved
>> > >> by this tool still uses the deprecated "--broker-list". I suggest to
>> > >> include "--bootstrap-server"
>> > >> in your public interfaces as well and fix this omission during the
>> > >> implementation.
>> > >>
>> > >> +1 (non-binding)
>> > >>
>> > >> Thanks,
>> > >> David
>> > >>
>> > >> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
>> > >> kamal.chandraprak...@gmail.com> wrote:
>> > >>
>> > >> > +1 (non-binding), thanks for the KIP!
>> > >> >
>> > >> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar <
>> manikumar.re...@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > +1 (binding)
>> > >> > >
>> > >> > > Thanks for the KIP!
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán <
>> urb.dani...@gmail.com
>> > >
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Hi everyone,
>> > >> > > >
>> > >> > > > If you are interested in this KIP, please do not forget to
>> vote.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Daniel
>> > >> > > >
>> > >> > > > Viktor Somogyi-Vass  ezt írta
>> (időpont:
>> > >> 2020.
>> > >> > > > júl.
>> > >> > > > 28., K, 16:06):
>> > >> > > >
>> > >> > > > > +1 from me (non-binding), thanks for the KIP.
>> > >> > > > >
>> > >> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
>> > >> urb.dani...@gmail.com
>> > >> > >
>> > >> > > > > wrote:
>> > >> > > > >
>> > >> > > > > > Hello everyone,
>> > >> > > > > >
>> > >> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
>> > >> > > > GetOffsetShell
>> > >> > > > > > tool by enabling querying multiple topic-partitions, adding
>> > new
>> > >> > > > filtering
>> > >> > > > > > options, and adding a config override option.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
>> > >> > > > > >
>> > >> > > > > > The original discussion thread was named "[DISCUSS]
>> KIP-308:
>> > >> > > > > > GetOffsetShell: new KafkaConsumer API, support for multiple
>> > >> topics,
>> > >> > > > > > minimize the number of requests to server". The id had to
>> be
>> > >> > changed
>> > >> > > as
>> > >> > > > > > there was a collision, and the KIP also had to be renamed,
>> as
>> > >> some
>> > >> > of
>> > >> > > > its
>> > >> > > > > > motivations were outdated.
>> > >> > > > > >
>> > >> > > > > > Thanks,
>> > >> > > > > > Daniel
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-13 Thread Dániel Urbán
Hi David,

Thank you for the suggestion. KIP-635 was referencing the --broker-list
issue, but based on your suggestion, I pinged the PR
https://github.com/apache/kafka/pull/8123.
Since I got no response, I updated KIP-635 to deprecate --broker-list. Will
update the PR related to KIP-635 to reflect that change.

Thanks,
Daniel

On Mon, Aug 10, 2020 at 20:48, David Jacot  wrote:

> Hi Daniel,
>
> I was not aware of that PR. At minimum, I would add `--bootstrap-server`
> to the list in the KIP for completeness. Regarding the implementation,
> I would leave a comment in that PR asking if they plan to continue it. If
> not,
> we could do it as part of your PR directly.
>
> Cheers,
> David
>
> On Mon, Aug 10, 2020 at 10:49 AM Dániel Urbán 
> wrote:
>
> > Hi everyone,
> >
> > Just a reminder, please vote if you are interested in this KIP being
> > implemented.
> >
> > Thanks,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2020. júl. 31.,
> P,
> > 9:01):
> >
> > > Hi David,
> > >
> > > There is another PR linked on KAFKA-8507, which is still open:
> > > https://github.com/apache/kafka/pull/8123
> > > Wasn't sure if it will go in, and wanted to avoid conflicts. Do you
> think
> > > I should do the switch to '--bootstrap-server' anyway?
> > >
> > > Thanks,
> > > Daniel
> > >
> > > David Jacot  ezt írta (időpont: 2020. júl. 30.,
> Cs,
> > > 17:52):
> > >
> > >> Hi Daniel,
> > >>
> > >> Thanks for the KIP.
> > >>
> > >> It seems that we have forgotten to include this tool in KIP-499.
> > >> KAFKA-8507
> > >> is resolved
> > >> by this tool still uses the deprecated "--broker-list". I suggest to
> > >> include "--bootstrap-server"
> > >> in your public interfaces as well and fix this omission during the
> > >> implementation.
> > >>
> > >> +1 (non-binding)
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
> > >> kamal.chandraprak...@gmail.com> wrote:
> > >>
> > >> > +1 (non-binding), thanks for the KIP!
> > >> >
> > >> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar <
> manikumar.re...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > +1 (binding)
> > >> > >
> > >> > > Thanks for the KIP!
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán <
> urb.dani...@gmail.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Hi everyone,
> > >> > > >
> > >> > > > If you are interested in this KIP, please do not forget to vote.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Daniel
> > >> > > >
> > >> > > > Viktor Somogyi-Vass  ezt írta
> (időpont:
> > >> 2020.
> > >> > > > júl.
> > >> > > > 28., K, 16:06):
> > >> > > >
> > >> > > > > +1 from me (non-binding), thanks for the KIP.
> > >> > > > >
> > >> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
> > >> urb.dani...@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hello everyone,
> > >> > > > > >
> > >> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
> > >> > > > GetOffsetShell
> > >> > > > > > tool by enabling querying multiple topic-partitions, adding
> > new
> > >> > > > filtering
> > >> > > > > > options, and adding a config override option.
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> > >> > > > > >
> > >> > > > > > The original discussion thread was named "[DISCUSS] KIP-308:
> > >> > > > > > GetOffsetShell: new KafkaConsumer API, support for multiple
> > >> topics,
> > >> > > > > > minimize the number of requests to server". The id had to be
> > >> > changed
> > >> > > as
> > >> > > > > > there was a collision, and the KIP also had to be renamed,
> as
> > >> some
> > >> > of
> > >> > > > its
> > >> > > > > > motivations were outdated.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Daniel
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-10 Thread Dániel Urbán
Hi everyone,

Just a reminder, please vote if you are interested in this KIP being
implemented.

Thanks,
Daniel

On Fri, Jul 31, 2020 at 9:01 Dániel Urbán  wrote:

> Hi David,
>
> There is another PR linked on KAFKA-8507, which is still open:
> https://github.com/apache/kafka/pull/8123
> Wasn't sure if it will go in, and wanted to avoid conflicts. Do you think
> I should do the switch to '--bootstrap-server' anyway?
>
> Thanks,
> Daniel
>
> David Jacot  ezt írta (időpont: 2020. júl. 30., Cs,
> 17:52):
>
>> Hi Daniel,
>>
>> Thanks for the KIP.
>>
>> It seems that we have forgotten to include this tool in KIP-499.
>> KAFKA-8507
>> is resolved,
>> but this tool still uses the deprecated "--broker-list". I suggest to
>> include "--bootstrap-server"
>> in your public interfaces as well and fix this omission during the
>> implementation.
>>
>> +1 (non-binding)
>>
>> Thanks,
>> David
>>
>> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>> > +1 (non-binding), thanks for the KIP!
>> >
>> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar 
>> > wrote:
>> >
>> > > +1 (binding)
>> > >
>> > > Thanks for the KIP!
>> > >
>> > >
>> > >
>> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán 
>> > > wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > If you are interested in this KIP, please do not forget to vote.
>> > > >
>> > > > Thanks,
>> > > > Daniel
>> > > >
>> > > > Viktor Somogyi-Vass  ezt írta (időpont:
>> 2020.
>> > > > júl.
>> > > > 28., K, 16:06):
>> > > >
>> > > > > +1 from me (non-binding), thanks for the KIP.
>> > > > >
>> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
>> urb.dani...@gmail.com
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hello everyone,
>> > > > > >
>> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
>> > > > GetOffsetShell
>> > > > > > tool by enabling querying multiple topic-partitions, adding new
>> > > > filtering
>> > > > > > options, and adding a config override option.
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
>> > > > > >
>> > > > > > The original discussion thread was named "[DISCUSS] KIP-308:
>> > > > > > GetOffsetShell: new KafkaConsumer API, support for multiple
>> topics,
>> > > > > > minimize the number of requests to server". The id had to be
>> > changed
>> > > as
>> > > > > > there was a collision, and the KIP also had to be renamed, as
>> some
>> > of
>> > > > its
>> > > > > > motivations were outdated.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Daniel
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-07-31 Thread Dániel Urbán
Hi David,

There is another PR linked on KAFKA-8507, which is still open:
https://github.com/apache/kafka/pull/8123
I wasn't sure whether it would go in, and I wanted to avoid conflicts. Do you
think I should make the switch to '--bootstrap-server' anyway?

Thanks,
Daniel

On Thu, Jul 30, 2020 at 17:52 David Jacot  wrote:

> Hi Daniel,
>
> Thanks for the KIP.
>
> It seems that we have forgotten to include this tool in KIP-499. KAFKA-8507
> is resolved,
> but this tool still uses the deprecated "--broker-list". I suggest to
> include "--bootstrap-server"
> in your public interfaces as well and fix this omission during the
> implementation.
>
> +1 (non-binding)
>
> Thanks,
> David
>
> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > +1 (non-binding), thanks for the KIP!
> >
> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks for the KIP!
> > >
> > >
> > >
> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > If you are interested in this KIP, please do not forget to vote.
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > Viktor Somogyi-Vass  ezt írta (időpont:
> 2020.
> > > > júl.
> > > > 28., K, 16:06):
> > > >
> > > > > +1 from me (non-binding), thanks for the KIP.
> > > > >
> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
> urb.dani...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
> > > > GetOffsetShell
> > > > > > tool by enabling querying multiple topic-partitions, adding new
> > > > filtering
> > > > > > options, and adding a config override option.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> > > > > >
> > > > > > The original discussion thread was named "[DISCUSS] KIP-308:
> > > > > > GetOffsetShell: new KafkaConsumer API, support for multiple
> topics,
> > > > > > minimize the number of requests to server". The id had to be
> > changed
> > > as
> > > > > > there was a collision, and the KIP also had to be renamed, as
> some
> > of
> > > > its
> > > > > > motivations were outdated.
> > > > > >
> > > > > > Thanks,
> > > > > > Daniel
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-07-30 Thread Dániel Urbán
Hi everyone,

If you are interested in this KIP, please do not forget to vote.

Thanks,
Daniel

On Tue, Jul 28, 2020 at 16:06 Viktor Somogyi-Vass  wrote:

> +1 from me (non-binding), thanks for the KIP.
>
> On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán 
> wrote:
>
> > Hello everyone,
> >
> > I'd like to start a vote on KIP-635. The KIP enhances the GetOffsetShell
> > tool by enabling querying multiple topic-partitions, adding new filtering
> > options, and adding a config override option.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> >
> > The original discussion thread was named "[DISCUSS] KIP-308:
> > GetOffsetShell: new KafkaConsumer API, support for multiple topics,
> > minimize the number of requests to server". The id had to be changed as
> > there was a collision, and the KIP also had to be renamed, as some of its
> > motivations were outdated.
> >
> > Thanks,
> > Daniel
> >
>


[VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-07-27 Thread Dániel Urbán
Hello everyone,

I'd like to start a vote on KIP-635. The KIP enhances the GetOffsetShell
tool by enabling querying multiple topic-partitions, adding new filtering
options, and adding a config override option.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
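
As a rough illustration only - this is not code from the KIP or the PR, and the
broker address and topic names below are placeholders - querying the latest
offsets of several topic-partitions with the Consumer API that the tool is
built on looks roughly like this:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class MultiPartitionOffsetsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                // Several partitions, possibly from different topics, in one query.
                List<TopicPartition> partitions = Arrays.asList(
                        new TopicPartition("topic-a", 0),
                        new TopicPartition("topic-a", 1),
                        new TopicPartition("topic-b", 0));
                // A single call resolves the latest offset of every requested partition.
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
                endOffsets.forEach((tp, offset) -> System.out.println(tp + ":" + offset));
            }
        }
    }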

The original discussion thread was named "[DISCUSS] KIP-308:
GetOffsetShell: new KafkaConsumer API, support for multiple topics,
minimize the number of requests to server". The id had to be changed as
there was a collision, and the KIP also had to be renamed, as some of its
motivations were outdated.

Thanks,
Daniel


Re: Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-07-21 Thread Dániel Urbán
Hi,

I've updated the PR based on the discussion and the comments on the PR.
If there are no more issues, I'll start a vote in a few days.

Thanks,
Daniel

On Wed, Jul 1, 2020 at 3:26 wang120445...@sina.com  wrote:

> maybe it is just like RBAC’s "show tables;"
>
>
>
> wang120445...@sina.com
>
> From: Hu Xi
> Sent: 2020-06-30 23:04
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support
> for multiple topics, minimize the number of requests to server
> That's a great KIP for GetOffsetShell tool. I have a question about the
> multiple-topic lookup situation.
>
> In a secured environment, what does the tool output if it has DESCRIBE
> privileges for some topics but hasn't for others?
>
> 
> From: Dániel Urbán 
> Sent: June 30, 2020 22:15
> To: dev@kafka.apache.org 
> Subject: Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support
> for multiple topics, minimize the number of requests to server
>
> Hi Manikumar,
> Thanks, went ahead and assigned a new ID, it is KIP-635 now:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> Daniel
>
> Manikumar  ezt írta (időpont: 2020. jún. 30.,
> K,
> 16:03):
>
> > Hi,
> >
> > Yes, we can assign new id to this KIP.
> >
> > Thanks.
> >
> > On Tue, Jun 30, 2020 at 6:59 PM Dániel Urbán 
> > wrote:
> >
> > > Hi,
> > >
> > > To help with the discussion, I also have a PR for this KIP now.
> > reflecting
> > > the current state of the KIP:
> https://github.com/apache/kafka/pull/8957.
> > > I would like to ask a committer to start the test job on it.
> > >
> > > One thing I realised though is that there is a KIP id collision, there
> is
> > > another KIP with the same id:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85474993
> > > What is the protocol in this case? Should I acquire a new id for the
> > > GetOffsetShell KIP, and update it?
> > >
> > > Thanks in advance,
> > > Daniel
> > >
> > > Dániel Urbán  ezt írta (időpont: 2020.
> jún.
> > > 30., K, 9:23):
> > >
> > > > Hi Manikumar,
> > > >
> > > > Thanks for the comments.
> > > > 1. Will change this - thought that "command-config" is used for admin
> > > > clients.
> > > > 2. It's not necessary, just felt like a nice quality-of-life feature
> -
> > > will
> > > > remove it.
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > On Tue, Jun 30, 2020 at 4:16 AM Manikumar  >
> > > > wrote:
> > > >
> > > > > Hi Daniel,
> > > > >
> > > > > Thanks for working on this KIP. Proposed changes look good to me,
> > > > >
> > > > > minor comments:
> > > > > 1. We use "command-config" option name in most of the cmdline tools
> > to
> > > > pass
> > > > > config
> > > > > properties file. We can use the same name here.
> > > > >
> > > > > 2. Not sure if we need a separate option to pass a consumer
> > property.
> > > > > fewer options are better.
> > > > >
> > > > > Thanks,
> > > > > Manikumar
> > > > >
> > > > > On Wed, Jun 24, 2020 at 8:53 PM Dániel Urbán <
> urb.dani...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I see that this KIP turned somewhat inactive - I'd like to pick
> it
> > up
> > > > and
> > > > > > work on it if it is okay.
> > > > > > Part of the work is done, as switching to the Consumer API is
> > already
> > > > in
> > > > > > trunk, but some functionality is still missing.
> > > > > >
> > > > > > I've seen the current PR and the discussion so far, only have a
> few
> > > > > things
> > > > > > to add:
> > > > > > - I like the idea of the topic-partition argument, it would be
> > useful
> > > > to
> > > > > > filter down to specific partitions.
> > > > > > - Instead of a topic list arg, a pattern would be more powerful,
> > and
> > > > also
> > > > > > fit better with the other tools (e.g. how the kafka-topics tool
> > > works).
> > > > > >
> > > > > > Regards,
> > > > > > Daniel
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-06-30 Thread Dániel Urbán
That's a good question. In the PR I submitted, the output would be a list of
partitions only for those topics on which the user has the DESCRIBE privilege.
The tool utilizes Consumer.listTopics, so unauthorized topics are not
present in the response at all. The current version in trunk simply reports
that the topic does not exist if the user has no DESCRIBE privilege for it.

It would be hard (impossible?) to support detailed information about
unauthorized topics when using a pattern as a filter. It could be
manageable if the tool only supported a list of topics.

Maybe the only improvement needed is to explicitly document that the tool
only scans the authorized topics?
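
To illustrate (a hedged sketch, not the tool's actual code; the bootstrap
address is a placeholder): the behaviour comes straight from
Consumer.listTopics() - unauthorized topics are simply absent from the returned
map, so the tool has nothing to report about them.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class AuthorizedTopicsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                // Topics the principal cannot DESCRIBE are missing from this map;
                // there is no error or marker for them.
                Map<String, List<PartitionInfo>> visible = consumer.listTopics();
                visible.forEach((topic, partitions) ->
                        System.out.println(topic + " has " + partitions.size() + " partitions"));
            }
        }
    }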

On Tue, Jun 30, 2020 at 17:04 Hu Xi  wrote:

> That's a great KIP for GetOffsetShell tool. I have a question about the
> multiple-topic lookup situation.
>
> In a secured environment, what does the tool output if it has DESCRIBE
> privileges for some topics but hasn't for others?
>
> ________
> From: Dániel Urbán 
> Sent: June 30, 2020 22:15
> To: dev@kafka.apache.org 
> Subject: Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support
> for multiple topics, minimize the number of requests to server
>
> Hi Manikumar,
> Thanks, went ahead and assigned a new ID, it is KIP-635 now:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
> Daniel
>
> Manikumar  ezt írta (időpont: 2020. jún. 30.,
> K,
> 16:03):
>
> > Hi,
> >
> > Yes, we can assign new id to this KIP.
> >
> > Thanks.
> >
> > On Tue, Jun 30, 2020 at 6:59 PM Dániel Urbán 
> > wrote:
> >
> > > Hi,
> > >
> > > To help with the discussion, I also have a PR for this KIP now.
> > reflecting
> > > the current state of the KIP:
> https://github.com/apache/kafka/pull/8957.
> > > I would like to ask a committer to start the test job on it.
> > >
> > > One thing I realised though is that there is a KIP id collision, there
> is
> > > another KIP with the same id:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85474993
> > > What is the protocol in this case? Should I acquire a new id for the
> > > GetOffsetShell KIP, and update it?
> > >
> > > Thanks in advance,
> > > Daniel
> > >
> > > Dániel Urbán  ezt írta (időpont: 2020.
> jún.
> > > 30., K, 9:23):
> > >
> > > > Hi Manikumar,
> > > >
> > > > Thanks for the comments.
> > > > 1. Will change this - thought that "command-config" is used for admin
> > > > clients.
> > > > 2. It's not necessary, just felt like a nice quality-of-life feature
> -
> > > will
> > > > remove it.
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > On Tue, Jun 30, 2020 at 4:16 AM Manikumar  >
> > > > wrote:
> > > >
> > > > > Hi Daniel,
> > > > >
> > > > > Thanks for working on this KIP. Proposed changes look good to me,
> > > > >
> > > > > minor comments:
> > > > > 1. We use "command-config" option name in most of the cmdline tools
> > to
> > > > pass
> > > > > config
> > > > > properties file. We can use the same name here.
> > > > >
> > > > > 2. Not sure if we need a separate option to pass a consumer
> > property.
> > > > > fewer options are better.
> > > > >
> > > > > Thanks,
> > > > > Manikumar
> > > > >
> > > > > On Wed, Jun 24, 2020 at 8:53 PM Dániel Urbán <
> urb.dani...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I see that this KIP turned somewhat inactive - I'd like to pick
> it
> > up
> > > > and
> > > > > > work on it if it is okay.
> > > > > > Part of the work is done, as switching to the Consumer API is
> > already
> > > > in
> > > > > > trunk, but some functionality is still missing.
> > > > > >
> > > > > > I've seen the current PR and the discussion so far, only have a
> few
> > > > > things
> > > > > > to add:
> > > > > > - I like the idea of the topic-partition argument, it would be
> > useful
> > > > to
> > > > > > filter down to specific partitions.
> > > > > > - Instead of a topic list arg, a pattern would be more powerful,
> > and
> > > > also
> > > > > > fit better with the other tools (e.g. how the kafka-topics tool
> > > works).
> > > > > >
> > > > > > Regards,
> > > > > > Daniel
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-06-30 Thread Dániel Urbán
Hi Manikumar,
Thanks, went ahead and assigned a new ID, it is KIP-635 now:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-635%3A+GetOffsetShell%3A+support+for+multiple+topics+and+consumer+configuration+override
Daniel

On Tue, Jun 30, 2020 at 16:03 Manikumar  wrote:

> Hi,
>
> Yes, we can assign new id to this KIP.
>
> Thanks.
>
> On Tue, Jun 30, 2020 at 6:59 PM Dániel Urbán 
> wrote:
>
> > Hi,
> >
> > To help with the discussion, I also have a PR for this KIP now.
> reflecting
> > the current state of the KIP: https://github.com/apache/kafka/pull/8957.
> > I would like to ask a committer to start the test job on it.
> >
> > One thing I realised though is that there is a KIP id collision, there is
> > another KIP with the same id:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85474993
> > What is the protocol in this case? Should I acquire a new id for the
> > GetOffsetShell KIP, and update it?
> >
> > Thanks in advance,
> > Daniel
> >
> > Dániel Urbán  ezt írta (időpont: 2020. jún.
> > 30., K, 9:23):
> >
> > > Hi Manikumar,
> > >
> > > Thanks for the comments.
> > > 1. Will change this - thought that "command-config" is used for admin
> > > clients.
> > > 2. It's not necessary, just felt like a nice quality-of-life feature -
> > will
> > > remove it.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > On Tue, Jun 30, 2020 at 4:16 AM Manikumar 
> > > wrote:
> > >
> > > > Hi Daniel,
> > > >
> > > > Thanks for working on this KIP. Proposed changes look good to me,
> > > >
> > > > minor comments:
> > > > 1. We use "command-config" option name in most of the cmdline tools
> to
> > > pass
> > > > config
> > > > properties file. We can use the same name here.
> > > >
> > > > 2. Not sure if we need a separate option to pass a consumer
> property.
> > > > fewer options are better.
> > > >
> > > > Thanks,
> > > > Manikumar
> > > >
> > > > On Wed, Jun 24, 2020 at 8:53 PM Dániel Urbán 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I see that this KIP turned somewhat inactive - I'd like to pick it
> up
> > > and
> > > > > work on it if it is okay.
> > > > > Part of the work is done, as switching to the Consumer API is
> already
> > > in
> > > > > trunk, but some functionality is still missing.
> > > > >
> > > > > I've seen the current PR and the discussion so far, only have a few
> > > > things
> > > > > to add:
> > > > > - I like the idea of the topic-partition argument, it would be
> useful
> > > to
> > > > > filter down to specific partitions.
> > > > > - Instead of a topic list arg, a pattern would be more powerful,
> and
> > > also
> > > > > fit better with the other tools (e.g. how the kafka-topics tool
> > works).
> > > > >
> > > > > Regards,
> > > > > Daniel
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-06-30 Thread Dániel Urbán
Hi,

To help with the discussion, I also have a PR for this KIP now, reflecting
the current state of the KIP: https://github.com/apache/kafka/pull/8957.
I would like to ask a committer to start the test job on it.

One thing I realised, though, is that there is a KIP id collision; there is
another KIP with the same id:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85474993
What is the protocol in this case? Should I acquire a new id for the
GetOffsetShell KIP, and update it?

Thanks in advance,
Daniel

On Tue, Jun 30, 2020 at 9:23 Dániel Urbán  wrote:

> Hi Manikumar,
>
> Thanks for the comments.
> 1. Will change this - thought that "command-config" is used for admin
> clients.
> 2. It's not necessary, just felt like a nice quality-of-life feature - will
> remove it.
>
> Thanks,
> Daniel
>
> On Tue, Jun 30, 2020 at 4:16 AM Manikumar 
> wrote:
>
> > Hi Daniel,
> >
> > Thanks for working on this KIP. Proposed changes look good to me,
> >
> > minor comments:
> > 1. We use "command-config" option name in most of the cmdline tools to
> pass
> > config
> > properties file. We can use the same name here.
> >
> > 2. Not sure if we need a separate option to pass a consumer property.
> > fewer options are better.
> >
> > Thanks,
> > Manikumar
> >
> > On Wed, Jun 24, 2020 at 8:53 PM Dániel Urbán 
> > wrote:
> >
> > > Hi,
> > >
> > > I see that this KIP turned somewhat inactive - I'd like to pick it up
> and
> > > work on it if it is okay.
> > > Part of the work is done, as switching to the Consumer API is already
> in
> > > trunk, but some functionality is still missing.
> > >
> > > I've seen the current PR and the discussion so far, only have a few
> > things
> > > to add:
> > > - I like the idea of the topic-partition argument, it would be useful
> to
> > > filter down to specific partitions.
> > > - Instead of a topic list arg, a pattern would be more powerful, and
> also
> > > fit better with the other tools (e.g. how the kafka-topics tool works).
> > >
> > > Regards,
> > > Daniel
> > >
> >
>


Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-06-30 Thread Dániel Urbán
Hi Manikumar,

Thanks for the comments.
1. I will change this - I thought that "command-config" was used for admin
clients. (A rough sketch of how such a config file could be applied to the
consumer is included below.)
2. It's not necessary; it just felt like a nice quality-of-life feature - I
will remove it.
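
A minimal sketch of the override mechanism, assuming a hypothetical properties
file path passed as the first argument (this is not the actual GetOffsetShell
code): the tool's defaults are set first, and the user-supplied file then
overrides them before the consumer is created.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class CommandConfigSketch {
        public static void main(String[] args) throws IOException {
            Properties config = new Properties();
            // Defaults chosen by the tool itself.
            config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // User-supplied overrides (e.g. security settings) loaded from the
            // properties file given on the command line.
            try (FileInputStream in = new FileInputStream(args[0])) {
                config.load(in);
            }
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    config, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                System.out.println("Visible topics: " + consumer.listTopics().keySet());
            }
        }
    }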

Thanks,
Daniel

On Tue, Jun 30, 2020 at 4:16 AM Manikumar  wrote:

> Hi Daniel,
>
> Thanks for working on this KIP. Proposed changes look good to me,
>
> minor comments:
> 1. We use "command-config" option name in most of the cmdline tools to pass
> config
> properties file. We can use the same name here.
>
> 2. Not sure if we need a separate option to pass a consumer property.
> fewer options are better.
>
> Thanks,
> Manikumar
>
> On Wed, Jun 24, 2020 at 8:53 PM Dániel Urbán 
> wrote:
>
> > Hi,
> >
> > I see that this KIP turned somewhat inactive - I'd like to pick it up and
> > work on it if it is okay.
> > Part of the work is done, as switching to the Consumer API is already in
> > trunk, but some functionality is still missing.
> >
> > I've seen the current PR and the discussion so far, only have a few
> things
> > to add:
> > - I like the idea of the topic-partition argument, it would be useful to
> > filter down to specific partitions.
> > - Instead of a topic list arg, a pattern would be more powerful, and also
> > fit better with the other tools (e.g. how the kafka-topics tool works).
> >
> > Regards,
> > Daniel
> >
>


Re: [DISCUSS] KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

2020-06-24 Thread Dániel Urbán
Hi,

I see that this KIP has become somewhat inactive - I'd like to pick it up and
work on it if it is okay.
Part of the work is done, as switching to the Consumer API is already in
trunk, but some functionality is still missing.

I've seen the current PR and the discussion so far, and I only have a few
things to add:
- I like the idea of the topic-partition argument; it would be useful to
filter down to specific partitions.
- Instead of a topic list arg, a pattern would be more powerful, and would also
fit better with the other tools (e.g. how the kafka-topics tool works); a rough
sketch of the idea follows this list.
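
A rough sketch of the pattern idea, assuming a made-up regex and broker address
(not a final design, just how the filtering could look on top of
Consumer.listTopics()):

    import java.util.List;
    import java.util.Properties;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class TopicPatternSketch {
        public static void main(String[] args) {
            Pattern topicPattern = Pattern.compile("orders\\..*");
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                // Expand every topic matching the pattern into its topic-partitions.
                List<TopicPartition> matched = consumer.listTopics().entrySet().stream()
                        .filter(e -> topicPattern.matcher(e.getKey()).matches())
                        .flatMap(e -> e.getValue().stream()
                                .map(p -> new TopicPartition(p.topic(), p.partition())))
                        .collect(Collectors.toList());
                System.out.println("Matched topic-partitions: " + matched);
            }
        }
    }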

Regards,
Daniel


contributor request

2020-06-24 Thread Dániel Urbán
Hi,
I'd like to work on some KIPs. Can you please add me to the contributors?
My username is durban on both JIRA and Confluence.
Thanks in advance,
Daniel