Re: [DISCUSS] KIP-1046: Expose producer.id.expiration.check.interval.ms as dynamic broker configuration

2024-05-16 Thread Jorge Esteban Quilcate Otoya
Thanks Justine. I have updated the KIP with the configuration details.

On Thu, 16 May 2024 at 21:14, Justine Olshan 
wrote:

> Hey Jorge,
>
> Thanks for the KIP. I didn't realize until I read the details that this
> configuration is currently not public at all. I think it is still OK that
> we are exposing the value, though. Can we include some information about
> the current default, the documentation, etc. that is already defined, as
> this will now become part of the public documentation?
>
> Thanks,
> Justine
>
> On Thu, May 16, 2024 at 10:23 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi dev team,
> >
> > I'd like to start a discussion thread for KIP-1046:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1046%3A+Expose+producer.id.expiration.check.interval.ms+as+dynamic+broker+configuration
> >
> > This KIP aims to align how tuning configurations for Producer ID
> expiration
> > checks are exposed.
> >
> > Looking forward to your feedback.
> >
> > Cheers,
> > Jorge.
> >
>


[DISCUSS] KIP-1046: Expose producer.id.expiration.check.interval.ms as dynamic broker configuration

2024-05-16 Thread Jorge Esteban Quilcate Otoya
Hi dev team,

I'd like to start a discussion thread for KIP-1046:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1046%3A+Expose+producer.id.expiration.check.interval.ms+as+dynamic+broker+configuration

This KIP aims to align how tuning configurations for Producer ID expiration
checks are exposed.

Looking forward to your feedback.

Cheers,
Jorge.


[jira] [Resolved] (KAFKA-16685) RLMTask warning logs do not include parent exception trace

2024-05-08 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-16685.
--
Resolution: Fixed

Merged https://github.com/apache/kafka/pull/15880

> RLMTask warning logs do not include parent exception trace
> --
>
> Key: KAFKA-16685
> URL: https://issues.apache.org/jira/browse/KAFKA-16685
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>        Reporter: Jorge Esteban Quilcate Otoya
>    Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> When RLMTask warning exceptions are logged, only the exception message is
> included; the stack trace is lost.
> See 
> [https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]
> This makes it difficult to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16691) Support for nested structures: TimestampConverter

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16691:


 Summary: Support for nested structures: TimestampConverter
 Key: KAFKA-16691
 URL: https://issues.apache.org/jira/browse/KAFKA-16691
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya








[jira] [Created] (KAFKA-16690) Support for nested structures: HeaderFrom

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16690:


 Summary: Support for nested structures: HeaderFrom
 Key: KAFKA-16690
 URL: https://issues.apache.org/jira/browse/KAFKA-16690
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya








[jira] [Resolved] (KAFKA-14226) Introduce support for nested structures

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-14226.
--
Resolution: Fixed

Merged: https://github.com/apache/kafka/pull/15379

> Introduce support for nested structures
> ---
>
> Key: KAFKA-14226
> URL: https://issues.apache.org/jira/browse/KAFKA-14226
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: Jorge Esteban Quilcate Otoya
>        Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Abstraction for FieldPath and initial SMTs:
>  * ExtractField
>  * HeaderFrom
>  * TimestampConverter





[jira] [Created] (KAFKA-16685) RSM Task warn logs do not include parent exception trace

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16685:


 Summary: RSM Task warn logs do not include parent exception trace
 Key: KAFKA-16685
 URL: https://issues.apache.org/jira/browse/KAFKA-16685
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


When RSMTask exceptions are logged, only the exception message is included;
the stack trace is lost.

See 
[https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]

This makes it difficult to troubleshoot issues.
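As a minimal, self-contained illustration of the problem (not the actual RemoteLogManager code; the class and message names here are made up): logging only `e.getMessage()` drops the cause chain, while rendering the throwable itself preserves it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class WarnLogDemo {
    // Renders the message plus the full stack trace, instead of only e.getMessage().
    static String withStackTrace(String msg, Throwable e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return msg + ": " + sw;
    }

    public static void main(String[] args) {
        Exception cause = new IllegalStateException("remote copy failed");
        Exception e = new RuntimeException("RSMTask error", cause);

        // Message-only logging loses the parent cause entirely:
        String messageOnly = "Error in task: " + e.getMessage();
        System.out.println(messageOnly.contains("remote copy failed")); // false

        // Including the throwable keeps the "Caused by" chain:
        String full = withStackTrace("Error in task", e);
        System.out.println(full.contains("remote copy failed")); // true
    }
}
```

With SLF4J-style loggers the equivalent fix is passing the exception as the last argument (`log.warn(msg, e)`) instead of interpolating `e.getMessage()`.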





Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris

2024-04-14 Thread Jorge Esteban Quilcate Otoya
Congrats, Greg!!

On Sun 14. Apr 2024 at 15.05, Josep Prat 
wrote:

> Congrats Greg!!!
>
>
> Best,
>
> Josep Prat
> Open Source Engineering Director, Aiven | josep.p...@aiven.io |
> +491715557497 | aiven.io
> Aiven Deutschland GmbH
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>
> On Sun, Apr 14, 2024, 12:30 Divij Vaidya  wrote:
>
> > Congratulations Greg!
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Sun, Apr 14, 2024 at 6:39 AM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Congratulations, Greg!
> > >
> > > On Sun, Apr 14, 2024 at 8:57 AM Yash Mayya 
> wrote:
> > >
> > > > Congrats Greg!
> > > >
> > > > On Sun, 14 Apr, 2024, 05:56 Randall Hauch,  wrote:
> > > >
> > > > > Congratulations, Greg!
> > > > >
> > > > > On Sat, Apr 13, 2024 at 6:36 PM Luke Chen 
> wrote:
> > > > >
> > > > > > Congrats, Greg!
> > > > > >
> > > > > > On Sun, Apr 14, 2024 at 7:05 AM Viktor Somogyi-Vass
> > > > > >  wrote:
> > > > > >
> > > > > > > Congrats Greg! :)
> > > > > > >
> > > > > > > On Sun, Apr 14, 2024, 00:35 Bill Bejeck 
> > wrote:
> > > > > > >
> > > > > > > > Congrats Greg!
> > > > > > > >
> > > > > > > > -Bill
> > > > > > > >
> > > > > > > > On Sat, Apr 13, 2024 at 4:25 PM Boudjelda Mohamed Said <
> > > > > > > bmsc...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations Greg
> > > > > > > > >
> > > > > > > > > On Sat 13 Apr 2024 at 20:42, Chris Egerton <
> > > ceger...@apache.org>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > Greg Harris has been a Kafka committer since July 2023.
> He
> > > has
> > > > > > > remained
> > > > > > > > > > very active and instructive in the community since
> > becoming a
> > > > > > > > committer.
> > > > > > > > > > It's my pleasure to announce that Greg is now a member of
> > > Kafka
> > > > > > PMC.
> > > > > > > > > >
> > > > > > > > > > Congratulations, Greg!
> > > > > > > > > >
> > > > > > > > > > Chris, on behalf of the Apache Kafka PMC
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New committer: Christo Lolov

2024-03-26 Thread Jorge Esteban Quilcate Otoya
Congrats Christo!!

On Tue 26. Mar 2024 at 14.33, Apoorv Mittal 
wrote:

> Congrats Christo!
>
> Regards,
> Apoorv Mittal
> +44 7721681581
>
>
> On Tue, Mar 26, 2024 at 12:05 PM Luke Chen  wrote:
>
> > Hi, Everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer:
> > Christo Lolov.
> >
> > Christo has been a Kafka contributor since 2021. He has made over 50
> > commits. He authored KIP-902, KIP-963, and KIP-1005, as well as many
> tiered
> > storage related tasks. He also co-drives the migration from EasyMock to
> > Mockito and from JUnit 4 to JUnit 5.
> >
> > Congratulations, Christo!
> >
> > Thanks,
> > Luke (on behalf of the Apache Kafka PMC)
> >
>


Re: [DISCUSS] Different retention semantics for active segment rotation

2024-03-21 Thread Jorge Esteban Quilcate Otoya
Sure! good to know that is tracked.

Thanks, Luke!

On Thu, 21 Mar 2024 at 07:52, Luke Chen  wrote:

> Hi Jorge,
>
> You should check the JIRA:
> https://issues.apache.org/jira/browse/KAFKA-16385
> where we had some discussion.
> Welcome to provide your thoughts there.
>
> Thanks.
> Luke
>
> On Thu, Mar 21, 2024 at 3:33 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi dev community,
> >
> > I'd like to share some findings on how rotation of the active segment
> > differs depending on whether topic retention is time- or size-based.
> >
> > I was (wrongly) under the assumption that active segments are only
> rotated
> > when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global
> > log configs (log.roll.ms) force it  -- regardless of the retention
> > configuration.
> > This seems to be different depending on how retention is defined:
> >
> > - If a topic has time-based retention[1], the condition to rotate the
> > active segment is based on the latest timestamp. If the difference from
> > the current time is larger than the retention time, the segment
> > (including the active one) should be deleted. The active segment is
> > rotated, and in the next round it is deleted.
> >
> > - If a topic has size-based retention[2], though, the condition depends
> > not only on the size of the segment itself but first on the total log
> > size, forcing at least one (active) segment to always remain. First the
> > difference between the total log size and the retention is calculated:
> > say there is a single 5MB segment and retention is 1MB, so the
> > difference is 4MB. The deletion condition then checks whether the
> > difference minus the current segment's size is greater than zero, and
> > deletes if so. Since a lone segment's size is always larger than that
> > difference, there will always be at least one segment. In this case the
> > only way the active segment is rotated is when a new message arrives.
> >
> > Added steps to reproduce[3].
> >
> > Maybe I'm missing something obvious, but this seems inconsistent to me.
> > Either both retention configs should rotate active segments, or neither
> > should, and the active segment should be governed only by the
> > segment.bytes|ms configs or the log.roll config.
> >
> > I believe it's a useful feature to "force" active segment rotation
> > without changing segment or global log-rotation configs, given that
> > features like Compaction and Tiered Storage can benefit from this; but I
> > would like to clarify this behavior and make it consistent for both
> > retention options, and/or call it out explicitly in the documentation.
> >
> > Looking forward to your feedback!
> >
> > Jorge.
> >
> > [1]:
> >
> >
> https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566
> > [2]:
> >
> >
> https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583
> >
> > [3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2
> >
>


[DISCUSS] Different retention semantics for active segment rotation

2024-03-21 Thread Jorge Esteban Quilcate Otoya
Hi dev community,

I'd like to share some findings on how rotation of the active segment differs
depending on whether topic retention is time- or size-based.

I was (wrongly) under the assumption that active segments are only rotated
when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global
log configs (log.roll.ms) force it  -- regardless of the retention
configuration.
This seems to be different depending on how retention is defined:

- If a topic has time-based retention[1], the condition to rotate the active
segment is based on the latest timestamp. If the difference from the current
time is larger than the retention time, the segment (including the active
one) should be deleted. The active segment is rotated, and in the next round
it is deleted.

- If a topic has size-based retention[2], though, the condition depends not
only on the size of the segment itself but first on the total log size,
forcing at least one (active) segment to always remain. First the difference
between the total log size and the retention is calculated: say there is a
single 5MB segment and retention is 1MB, so the difference is 4MB. The
deletion condition then checks whether the difference minus the current
segment's size is greater than zero, and deletes if so. Since a lone
segment's size is always larger than that difference, there will always be
at least one segment. In this case the only way the active segment is
rotated is when a new message arrives.

Added steps to reproduce[3].

Maybe I'm missing something obvious, but this seems inconsistent to me.
Either both retention configs should rotate active segments, or neither
should, and the active segment should be governed only by the
segment.bytes|ms configs or the log.roll config.

I believe it's a useful feature to "force" active segment rotation without
changing segment or global log-rotation configs, given that features like
Compaction and Tiered Storage can benefit from this; but I would like to
clarify this behavior and make it consistent for both retention options,
and/or call it out explicitly in the documentation.

Looking forward to your feedback!

Jorge.

[1]:
https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566
[2]:
https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583

[3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2
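The size-based condition described above can be sketched as follows. This is a simplified model under my reading of the linked UnifiedLog code, not the actual implementation; class and method names are illustrative:

```java
import java.util.List;

public class SizeRetentionDemo {
    // Simplified model of the size-based deletion pass: diff starts as
    // (total log size - retention.bytes); a segment is deleted only while
    // (diff - segment size) stays non-negative, so a lone (active) segment
    // can never be deleted -- its size always exceeds diff.
    static int deletableSegments(List<Long> segmentSizes, long retentionBytes) {
        long diff = segmentSizes.stream().mapToLong(Long::longValue).sum() - retentionBytes;
        int deletable = 0;
        for (long size : segmentSizes) {
            if (diff - size >= 0) {
                diff -= size;
                deletable++;
            } else {
                break;
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        // Single 5MB segment, 1MB retention: diff = 4MB, 4MB - 5MB < 0 -> keep.
        System.out.println(deletableSegments(List.of(5L * 1024 * 1024), 1024 * 1024)); // 0

        // Two 5MB segments, 1MB retention: diff = 9MB, only the first is deletable.
        System.out.println(deletableSegments(
            List.of(5L * 1024 * 1024, 5L * 1024 * 1024), 1024 * 1024)); // 1
    }
}
```

Under time-based retention, by contrast, the predicate looks only at each segment's timestamp, so nothing stops the (rotated) active segment from being deleted too, which is the inconsistency described above.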


Re: [VOTE] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Jorge Esteban Quilcate Otoya
Thanks Abhijeet! Looking forward to this one.
+1 (non-binding).

On Thu, 14 Mar 2024 at 06:08, Luke Chen  wrote:

> Thanks for the KIP!
> +1 from me.
>
> Luke
>
> On Sun, Mar 10, 2024 at 8:44 AM Satish Duggana 
> wrote:
>
> > Thanks Abhijeet for the KIP, +1 from me.
> >
> >
> > On Sat, 9 Mar 2024 at 1:51 AM, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > +1 (non-binding), Thanks for the KIP, Abhijeet!
> > >
> > > --
> > > Kamal
> > >
> > > On Fri, Mar 8, 2024 at 11:02 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the KIP. +1
> > > >
> > > > Jun
> > > >
> > > > On Fri, Mar 8, 2024 at 3:44 AM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start the vote for KIP-956 - Tiered Storage Quotas
> > > > >
> > > > > The KIP is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > > >
> > > > > Regards.
> > > > > Abhijeet.
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Jorge Esteban Quilcate Otoya
Sorry I missed that comment on the thread. Proposal looks great, thanks,
Abhijeet!

On Sat, 16 Mar 2024 at 13:19, Abhijeet Kumar 
wrote:

> Hi Jorge,
>
> The config names were chosen to keep them consistent with the other existing
> quota configs, such as
> *replica.alter.log.dirs.io.max.bytes.per.second* as pointed out by Jun in
> the thread.
>
> Also, we can revisit the names of the components during implementation,
> since those are not exposed to the user.
>
> Please let me know if you have any further concerns.
>
> Regards,
> Abhijeet.
>
>
>
> On Mon, Mar 11, 2024 at 6:11 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP! Looks good to me. I just have a minor comment on
> > naming:
> >
> > Would it work to align the config names to existing quota names?
> > e.g. `remote.log.manager.copy.byte.rate.quota` (or similar) instead of
> > `remote.log.manager.copy.max.bytes.per.second`?
> >
> > Same for new components, could we use the same verbs as in the configs:
> > - RLMCopyQuotaManager
> > - RLMFetchQuotaManager
> >
> >
> > On Fri, 8 Mar 2024 at 13:43, Abhijeet Kumar 
> > wrote:
> >
> > > Thank you all for your comments. As all the comments in the thread are
> > > addressed, I am starting a Vote thread for the KIP. Please have a look.
> > >
> > > Regards.
> > >
> > > On Thu, Mar 7, 2024 at 12:34 PM Luke Chen  wrote:
> > >
> > > > Hi Abhijeet,
> > > >
> > > > Thanks for the update and the explanation.
> > > > I had another look, and it LGTM now!
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > > On Tue, Mar 5, 2024 at 2:50 AM Jun Rao 
> > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the reply. Sounds good to me.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Sat, Mar 2, 2024 at 7:40 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for pointing it out. It makes sense to me. We can have the
> > > > > following
> > > > > > metrics instead. What do you think?
> > > > > >
> > > > > >- remote-(fetch|copy)-throttle-time-avg (The average time in
> ms
> > > > remote
> > > > > >fetches/copies was throttled by a broker)
> > > > > >- remote-(fetch|copy)-throttle-time-max (The maximum time in
> ms
> > > > > remote
> > > > > >fetches/copies was throttled by a broker)
> > > > > >
> > > > > > These are similar to fetch-throttle-time-avg and
> > > > fetch-throttle-time-max
> > > > > > metrics we have for Kafka consumers?
> > > > > > The Avg and Max are computed over the (sliding) window as defined
> > by
> > > > the
> > > > > > configuration metrics.sample.window.ms on the server.
> > > > > >
> > > > > > (Also, I will update the config and metric names to be
> consistent)
> > > > > >
> > > > > > Regards.
> > > > > >
> > > > > > On Thu, Feb 29, 2024 at 2:51 AM Jun Rao  >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, Abhijeet,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > The issue with recording the throttle time as a gauge is that
> > it's
> > > > > > > transient. If the metric is not read immediately, the recorded
> > > value
> > > > > > could
> > > > > > > be reset to 0. The admin won't realize that throttling has
> > > happened.
> > > > > > >
> > > > > > > For client quotas, the throttle time is tracked as the average
> > > > > > > throttle-time per user/client-id. This makes the metric less
> > > > transient.
> > > > > > >
> > > > > > > Also, the configs use read/write whereas the metrics use
> > > fetch/copy.
> > > > > > Could
> > > > > > >

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-11 Thread Jorge Esteban Quilcate Otoya
Hi Abhijeet,

Thanks for the KIP! Looks good to me. I just have a minor comment on
naming:

Would it work to align the config names to existing quota names?
e.g. `remote.log.manager.copy.byte.rate.quota` (or similar) instead of
`remote.log.manager.copy.max.bytes.per.second`?

Same for new components, could we use the same verbs as in the configs:
- RLMCopyQuotaManager
- RLMFetchQuotaManager


On Fri, 8 Mar 2024 at 13:43, Abhijeet Kumar 
wrote:

> Thank you all for your comments. As all the comments in the thread are
> addressed, I am starting a Vote thread for the KIP. Please have a look.
>
> Regards.
>
> On Thu, Mar 7, 2024 at 12:34 PM Luke Chen  wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the update and the explanation.
> > I had another look, and it LGTM now!
> >
> > Thanks.
> > Luke
> >
> > On Tue, Mar 5, 2024 at 2:50 AM Jun Rao  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the reply. Sounds good to me.
> > >
> > > Jun
> > >
> > >
> > > On Sat, Mar 2, 2024 at 7:40 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for pointing it out. It makes sense to me. We can have the
> > > following
> > > > metrics instead. What do you think?
> > > >
> > > >- remote-(fetch|copy)-throttle-time-avg (The average time in ms
> > remote
> > > >fetches/copies was throttled by a broker)
> > > >- remote-(fetch|copy)-throttle-time-max (The maximum time in ms
> > > remote
> > > >fetches/copies was throttled by a broker)
> > > >
> > > > These are similar to fetch-throttle-time-avg and
> > fetch-throttle-time-max
> > > > metrics we have for Kafka consumers?
> > > > The Avg and Max are computed over the (sliding) window as defined by
> > the
> > > > configuration metrics.sample.window.ms on the server.
> > > >
> > > > (Also, I will update the config and metric names to be consistent)
> > > >
> > > > Regards.
> > > >
> > > > On Thu, Feb 29, 2024 at 2:51 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > The issue with recording the throttle time as a gauge is that it's
> > > > > transient. If the metric is not read immediately, the recorded
> value
> > > > could
> > > > > be reset to 0. The admin won't realize that throttling has
> happened.
> > > > >
> > > > > For client quotas, the throttle time is tracked as the average
> > > > > throttle-time per user/client-id. This makes the metric less
> > transient.
> > > > >
> > > > > Also, the configs use read/write whereas the metrics use
> fetch/copy.
> > > > Could
> > > > > we make them consistent?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Feb 28, 2024 at 6:49 AM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Clarified the meaning of the two metrics. Also updated the KIP.
> > > > > >
> > > > > > kafka.log.remote:type=RemoteLogManager,
> > name=RemoteFetchThrottleTime
> > > ->
> > > > > The
> > > > > > duration of time required at a given moment to bring the observed
> > > fetch
> > > > > > rate within the allowed limit, by preventing further reads.
> > > > > > kafka.log.remote:type=RemoteLogManager,
> name=RemoteCopyThrottleTime
> > > ->
> > > > > The
> > > > > > duration of time required at a given moment to bring the observed
> > > > remote
> > > > > > copy rate within the allowed limit, by preventing further copies.
> > > > > >
> > > > > > Regards.
> > > > > >
> > > > > > On Wed, Feb 28, 2024 at 12:28 AM Jun Rao
>  > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, Abhijeet,
> > > > > > >
> > > > > > > Thanks for the explanation. Makes sense to me now.
> > > > > > >
> > > > > > > Just a minor comment. Could you document the exact meaning of
> the
> > > > > > following
> > > > > > > two metrics? For example, is the time accumulated? If so, is it
> > > from
> > > > > the
> > > > > > > start of the broker or over some window?
> > > > > > >
> > > > > > > kafka.log.remote:type=RemoteLogManager,
> > > name=RemoteFetchThrottleTime
> > > > > > > kafka.log.remote:type=RemoteLogManager,
> > name=RemoteCopyThrottleTime
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Feb 27, 2024 at 1:39 AM Abhijeet Kumar <
> > > > > > abhijeet.cse@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > The existing quota system for consumers is designed to
> throttle
> > > the
> > > > > > > > consumer if it exceeds the allowed fetch rate.
> > > > > > > > The additional quota we want to add works on the broker
> level.
> > If
> > > > the
> > > > > > > > broker-level remote read quota is being
> > > > > > > > exceeded, we prevent additional reads from the remote storage
> > but
> > > > do
> > > > > > not
> > > > > > > > prevent local reads for the consumer.
> > > > > > > > If the consumer has specified other partitions to read, which
> > can
> > > > be
> > > > > > > served



[jira] [Created] (KAFKA-16264) Expose `producer.id.expiration.check.interval.ms` as dynamic broker configuration

2024-02-16 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16264:


 Summary: Expose `producer.id.expiration.check.interval.ms` as 
dynamic broker configuration
 Key: KAFKA-16264
 URL: https://issues.apache.org/jira/browse/KAFKA-16264
 Project: Kafka
  Issue Type: Improvement
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Dealing with a scenario where too many producer ids lead to issues (e.g. high 
CPU utilization, see KAFKA-16229) puts operators in need of flushing producer 
ids more promptly than usual.

Currently, only the expiration timeout `producer.id.expiration.ms` is exposed 
as a dynamic config. This is helpful (e.g. by reducing the timeout, fewer 
producer ids would eventually be kept in memory), but not enough if the check 
interval is not sufficiently short to flush producer ids before they become an 
issue. Only by tuning both can the issue be worked around.
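For context, here is a sketch of how an operator would tune both settings with the standard kafka-configs tool once the check interval becomes dynamic. This is a CLI/config fragment; the bootstrap address and values are illustrative:

```shell
# Shorten the expiration timeout (already a dynamic config today):
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config producer.id.expiration.ms=300000

# Once this change lands, the check interval could be shortened the same way:
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config producer.id.expiration.check.interval.ms=60000
```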

 





[jira] [Created] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage

2024-02-06 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16229:


 Summary: Slow expiration of Producer IDs leading to high CPU usage
 Key: KAFKA-16229
 URL: https://issues.apache.org/jira/browse/KAFKA-16229
 Project: Kafka
  Issue Type: Bug
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Expiration of producer ids is implemented with a slow removal of map keys:

```
producers.keySet().removeAll(keys);
```

This unnecessarily goes through all producer ids to remove the expired keys,
leading to quadratic time in the worst case, when most/all keys need to be
removed:

```
Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3    44957983.550 ±    9389011.290  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  5683374164.167 ± 1446242131.466  ns/op
```

A simple fix is to use map#remove(key) instead, leading to linear growth:

```
Benchmark                                        (numProducerIds)  Mode  Cnt        Score        Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±    651.389  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±  21875.644  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             1  avgt    3   643887.031 ±  600475.302  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            10  avgt    3  7741689.539 ± 3218317.079  ns/op
```
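A minimal sketch of the per-key removal fix described above (illustrative, not the actual Kafka patch): the slow path arises because `keySet().removeAll(list)` can fall back to scanning every map key with an O(n) `List.contains` check per entry, while removing each expired key directly stays linear.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpireDemo {
    // Linear alternative: remove each expired key directly instead of
    // producers.keySet().removeAll(expired), which (via AbstractSet.removeAll)
    // may iterate the whole key set and call expired.contains(...) per key.
    static <K, V> void expire(Map<K, V> producers, List<K> expired) {
        for (K key : expired) {
            producers.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<Long, String> producers = new HashMap<>();
        for (long id = 0; id < 10; id++) {
            producers.put(id, "state-" + id);
        }

        expire(producers, List.of(0L, 1L, 2L));
        System.out.println(producers.size()); // 7
    }
}
```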





Re: [DISCUSS] KIP-1018: Introduce max remote fetch timeout config

2024-01-31 Thread Jorge Esteban Quilcate Otoya
Hi Kamal,

Thanks for this KIP! It should help solve one of the main issues with tiered
storage at the moment: having to deal with individual consumer configurations
to avoid flooding logs with interrupted exceptions.

One of the topics discussed in [1][2] was the semantics of
`fetch.max.wait.ms` and how it's affected by remote storage. Should we
consider, within this KIP, updating the `fetch.max.wait.ms` docs to clarify
that it only applies to local storage?

Otherwise, LGTM -- looking forward to see this KIP adopted.

[1] https://issues.apache.org/jira/browse/KAFKA-15776
[2] https://github.com/apache/kafka/pull/14778#issuecomment-1820588080

On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi all,
>
> I have opened a KIP-1018
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
> >
> to introduce dynamic max-remote-fetch-timeout broker config to give more
> control to the operator.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests
>
> Let me know if you have any feedback or suggestions.
>
> --
> Kamal
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Jorge Esteban Quilcate Otoya
Congratulations Divij!!

On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:

> Congratulations!
>
> On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
>
> > congrats divij!
> >
> > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > 
> > wrote:
> >
> > > Congratulations Divij!
> > >
> > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> wrote:
> > >
> > > > Congratulations Divij!
> > > >
> > > > Regards,
> > > > Gaurav
> > > >
> > > > > On 27-Dec-2023, at 17:44, Mickael Maison  >
> > > > wrote:
> > > > >
> > > > > Congratulations Divij!
> > > > >
> > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > > > wrote:
> > > > >>
> > > > >> Congrats Divij! Absolutely well deserved !
> > > > >>
> > > > >> Thanks!
> > > > >> Sagar.
> > > > >>
> > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen 
> > wrote:
> > > > >>>
> > > > >>> Hi, Everyone,
> > > > >>>
> > > > >>> Divij has been a Kafka committer since June, 2023. He has
> remained
> > > very
> > > > >>> active and instructive in the community since becoming a
> committer.
> > > > It's my
> > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > >>>
> > > > >>> Congratulations Divij!
> > > > >>>
> > > > >>> Luke
> > > > >>> on behalf of Apache Kafka PMC
> > > > >>>
> > > >
> > >
> >
>


Re: [VOTE] KIP-963: Additional metrics in Tiered Storage

2023-11-23 Thread Jorge Esteban Quilcate Otoya
A bit late, but happy this KIP is being adopted.

Thanks, Christo!

On Thu 23. Nov 2023 at 15.15, Christo Lolov  wrote:

> Hello all,
>
> With 3 +1 binding and 1 +1 non-binding votes, KIP-963 is adopted 🎉!
> I will get down to implementing it.
>
> Best,
> Christo
>
> On Tue, 21 Nov 2023 at 07:22, Luke Chen  wrote:
>
> > +1 (binding) from me.
> > Thanks for the KIP.
> >
> > Luke
> >
> > On Tue, Nov 21, 2023 at 11:53 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > wrote:
> >
> > > +1 (binding)
> > > Thanks for the KIP and the discussion.
> > >
> > > Discussion mail thread for the KIP:
> > > https://lists.apache.org/thread/40vsyc240hyody37mf2f0pn90shkzb45
> > >
> > >
> > >
> > > On Tue, 21 Nov 2023 at 05:21, Kamal Chandraprakash
> > >  wrote:
> > > >
> > > > +1 (non-binding). Thanks for the KIP!
> > > >
> > > > On Tue, Nov 21, 2023, 03:04 Divij Vaidya 
> > > wrote:
> > > >
> > > > > + 1 (binding)
> > > > >
> > > > > This KIP will greatly improve Tiered Storage troubleshooting. Thank
> > you
> > > > > Christo.
> > > > >
> > > > > On Mon 20. Nov 2023 at 17:21, Christo Lolov <
> christolo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello all!
> > > > > >
> > > > > > Now that the discussion for KIP-963 has winded down, I would like
> > to
> > > open
> > > > > > it for a vote targeting 3.7.0 as the release. You can find the
> > > current
> > > > > > version of the KIP at
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage
> > > > > >
> > > > > > Best,
> > > > > > Christo
> > > > > >
> > > > >
> > >
> >
>


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-20 Thread Jorge Esteban Quilcate Otoya
tric-collection systems allow you to explore the
> > whole
> > > > namespace. For example, I really dislike that while log loading
> happens
> > > > Kafka emits log lines of "X/Y logs loaded" rather than just show me
> the
> > > > progress via a metric. If you are strongly against this, however, I
> am
> > > > happy to scope down on this as well.
> > > >
> > > > 104. Ideally we have only one metadata record in remote storage for every
> > > segment
> > > > of the correct lineage. Due to leadership changes, however, this is
> not
> > > > always the case. I envisioned that exposing such a metric will
> showcase
> > > if
> > > > there are problems with too many metadata records not part of the
> > correct
> > > > lineage of a log.
> > > >
> > > > *re: Luke*
> > > >
> > > > 1. I am a bit conflicted on this one. As discussed earlier with
> Jorge,
> > in
> > > > my head such metrics are better left to plugin implementations. If
> you
> > > and
> > > > Kamal feel strongly about this being included I will add it to the
> KIP.
> > > >
> > > > 2. After running tiered storage in production for a while I ran into
> > > > problems where a partition-level metric would have allowed me to zone
> > in
> > > on
> > > > the problem sooner. I tried balancing this with not exposing
> everything
> > > on
> > > > a partition level so not to explode the cardinality too much (point
> 101
> > > > from Satish). I haven't ran into a situation where knowing the
> > > > RemoteLogSizeComputationTime on a partition level helped me, but this
> > > > doesn't mean there isn't one.
> > > >
> > > > 3. I was thinking that the metric can be emitted while reading of
> those
> > > > records is happening i.e. if it takes a long time then it will just
> > > > gradually increase as we read. What do you think?
> > > >
> > > > *re: Jorge*
> > > >
> > > > 3.5. Sure, I will aim to add my thoughts to the KIP
> > > >
> > > > 4. Let me check and come back to you on this one. I have a vague
> memory
> > > > this wasn't as easy to calculate, but if it is, I will include
> > > > RemoteDeleteBytesPerSec as well.
> > > >
> > > > 7. Yes, this is I believe a better explanation than the one I have in
> > the
> > > > KIP, so I will aim to update it with your one. Thank you!
> > LocalDeleteLag
> > > > makes sense to me as well, I will aim to include it.
> > > >
> > > > *re: Kamal*
> > > >
> > > > 1. I can add this to the KIP, but similar to what I have mentioned
> > > earlier,
> > > > I believe these are better left to plugin implementations, no?
> > > >
> > > > 2. Yeah, this makes sense.
> > > >
> > > > Best,
> > > > Christo
> > > >
> > > > On Fri, 10 Nov 2023 at 09:33, Satish Duggana <
> satish.dugg...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Thanks Christo for the KIP and the interesting discussion.
> > > > >
> > > > > 101. Adding metrics at partition level may increase the cardinality
> > of
> > > > > these metrics. We should be cautious of that and see whether they
> are
> > > > > really needed. RLM related operations do not generally affect based
> > on
> > > > > partition(s) but it is mostly because of the remote storage or
> broker
> > > > > level issues.
> > > > >
> > > > > 102. I am not sure whether the records metric is very useful when
> we
> > > > > have other bytes and segments related metrics available. If needed,
> > > > > records level information can be derived once we have
> segments/bytes
> > > > > metrics.
> > > > >
> > > > > 103. Regarding RemoteLogSizeComputationTime, we can add logs for
> > > > > debugging purposes to print the required duration while computing
> > size
> > > > > instead of generating a metric. If there is any degradation in
> remote
> > > > > log size computation, it will have an effect on RLM task leading to
> > > > > remote log copy and delete lags.
> > > > >
> > > > > RLMM and RSM implementations can always add more metrics.

Re: [VOTE] KIP-977: Partition-Level Throughput Metrics

2023-11-15 Thread Jorge Esteban Quilcate Otoya
Qichao, thanks again for leading this proposal!

+1 (non-binding)

Cheers,
Jorge.

On Wed, 15 Nov 2023 at 19:17, Divij Vaidya  wrote:

> +1 (binding)
>
> I was involved in the discussion thread for this KIP and support it in its
> current form.
>
> --
> Divij Vaidya
>
>
>
> On Wed, Nov 15, 2023 at 10:55 AM Qichao Chu 
> wrote:
>
> > Hi all,
> >
> > I'd like to call a vote for KIP-977: Partition-Level Throughput Metrics.
> >
> > Please take a look here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics
> >
> > Best,
> > Qichao Chu
> >
>


Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once

2023-11-13 Thread Jorge Esteban Quilcate Otoya
Divij, thanks for your prompt feedback!

1. Agree, caching at the plugin level was my initial idea as well; though,
keeping two caches for the same data both at the broker and at the plugin
seems wasteful. (added this as a rejected alternative in the meantime)

2. Not necessarily. The API allows requesting a set of indexes. In the
case of the `RemoteIndexCache`, as it's currently implemented, it would be
using the [offset, time, transaction] index types.

However, I see your point that there may be scenarios where only 1 of the 3
indexes is used:
- Time index used mostly once when fetching sequentially by seeking offset
by time.
- Offset and Transaction indexes are probably the only ones that make sense
to cache, as they are used on every fetch.
Arguably, Transaction indexes are not as common, reducing the benefits of
the proposed approach:
from initially expecting to fetch 3 indexes at once, to potentially
fetching only 2 (offset, txn), but most probably fetching 1 (offset).

If there's value perceived from fetching Offset and Transaction together,
we can keep discussing this KIP. In the meantime, I will look into the
approach to lazily fetch indexes while waiting for additional feedback.
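
To make the trade-off concrete, here is a minimal, purely illustrative sketch of a
cache that, on a miss, fetches a *set* of index types in one remote call instead of
one call per index. All names here (IndexCache, remote_fetch, prefetch_types) are
hypothetical and not the KIP's proposed API.

```python
# Sketch: batch several index-type fetches into a single remote call.
# The remote_fetch callable stands in for whatever the RSM plugin does.

class IndexCache:
    def __init__(self, remote_fetch):
        # remote_fetch(segment, index_types) -> {index_type: index_bytes}
        self._remote_fetch = remote_fetch
        self._cache = {}  # (segment, index_type) -> index bytes

    def get(self, segment, index_type, prefetch_types=frozenset()):
        key = (segment, index_type)
        if key not in self._cache:
            # One remote round-trip covers the requested index plus any
            # indexes we expect to need shortly (e.g. txn for the same segment).
            wanted = {index_type} | set(prefetch_types)
            fetched = self._remote_fetch(segment, wanted)
            for idx_type, data in fetched.items():
                self._cache[(segment, idx_type)] = data
        return self._cache[key]
```

With `prefetch_types={"txn"}`, a later transaction-index lookup for the same segment
is served from the cache, saving a remote call -- but, as discussed above, prefetching
an index that is never read (e.g. the time index) is wasted transfer.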

Cheers,
Jorge.

On Mon, 13 Nov 2023 at 16:51, Divij Vaidya  wrote:

> Hi Jorge
>
> 1. I don't think we need a new API here because alternatives solutions
> exist even with the current API. As an example, when the first index is
> fetched, the RSM plugin can choose to download all indexes and cache it
> locally. On the next call to fetch an index from the remote tier, we will
> hit the cache and retrieve the index from there.
>
> 2. The KIP assumes that all indexes are required at all times. However,
> indexes such as transaction indexes are only required for read_committed
> fetches and time index is only required when a fetch call wants to search
> offset by timestamp. As a future step in Tiered Storage, I would actually
> prefer to move towards a direction where we are lazily fetching indexes
> on-demand instead of fetching them together as proposed in the KIP.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I would like to start the discussion on a KIP for Tiered Storage. It's
> > about improving cross-segment latencies by reducing calls to fetch
> indexes
> > individually.
> > Have a look:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once
> >
> > Cheers,
> > Jorge
> >
>


[DISCUSS] KIP-1003: Signal next segment when remote fetching

2023-11-10 Thread Jorge Esteban Quilcate Otoya
Hi there,

I would like to start the discussion on a KIP for Tiered Storage. It's
about improving cross-segment latencies by enabling Remote Storage Manager
implementation to pre-fetch across segments.
Have a look:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1003%3A+Signal+next+segment+when+remote+fetching

Cheers,
Jorge


[DISCUSS] KIP-1002: Fetch remote segment indexes at once

2023-11-10 Thread Jorge Esteban Quilcate Otoya
Hello everyone,

I would like to start the discussion on a KIP for Tiered Storage. It's
about improving cross-segment latencies by reducing calls to fetch indexes
individually.
Have a look:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once

Cheers,
Jorge


[jira] [Created] (KAFKA-15806) Signal next segment when remote fetching

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15806:


 Summary: Signal next segment when remote fetching 
 Key: KAFKA-15806
 URL: https://issues.apache.org/jira/browse/KAFKA-15806
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Improve remote fetching performance when fetching across segments by signaling 
the next segment, allowing Remote Storage Manager implementations to optimize 
their pre-fetching.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15805) Fetch Remote Indexes at once

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15805:


 Summary: Fetch Remote Indexes at once
 Key: KAFKA-15805
 URL: https://issues.apache.org/jira/browse/KAFKA-15805
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Reduce Tiered Storage latency when fetching indexes by allowing to fetch many 
indexes at once.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-09 Thread Jorge Esteban Quilcate Otoya
Hi Qichao,

Thanks for updating the KIP, all updates look good to me.

Looking forward to seeing this KIP move forward!

Cheers,
Jorge.



On Wed, 8 Nov 2023 at 08:55, Qichao Chu  wrote:

> Hi Divij,
>
> Thank you for the feedback. I updated the KIP to make it a little bit more
> generic: filters will stay in an array instead of different top-level
> objects. This way, new kinds of filters (e.g. language filters) can be added
> in the future. The logical relationship of the filters is also added to the KIP.
>
> Hi Jorge,
>
> Thank you for the review and great comments. Here is the reply for each of
> the suggestions:
>
> 1) The words describing the property are now updated to include more
> details of the keys in the JSON. It also explicitly mentions the JSON
> nature of the config now.
> 2) The JSON entries should be non-conflicting, so the order is not relevant. If
> there's conflict, the conflict resolution rules are stated in the KIP. To
> make it more clear, ordering and duplication rules are updated in the
> Restrictions section of the *level* property.
> 3) Yeah we did take a look at the RecordingLevel config and it does not
> work for this case. The RecordingLevel config does not offer the capability
> of filtering and it has a drawback of needing to be added to all the future
> sensors. To reduce the duplication, I propose we merge the RecordingLevel
> to this more generic config in the future. Please take a look into the
> *Using
> the Existing RecordingLevel Config* section under *Rejected Alternatives*
> for more details.
> 4) This suggestion makes a lot of sense. My idea is to create a
> table/form/doc in the documentation for the verbosity levels of all metric
> series. If it's too verbose to be in the docs, I will update the KIP to
> include this info. I will create a JIRA for this effort once the KIP is
> approved.
> 5) Sure we can expand to all other series, added to the KIP.
> 6) Added a new section(*Working with the Configuration via CLI)* with the
> user experience details
> 7) Links are updated.
>
> Please take another look and let me know if you have any more concerns.
>
> Best,
> Qichao Chu
> Software Engineer | Data - Kafka
> [image: Uber] <https://uber.com/>
>
>
> On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Qichao,
> >
> > Thanks for the KIP! This will be a valuable contribution and improve the
> > tooling for troubleshooting.
> >
> > I have a couple of comments:
> >
> > 1. It's unclear from the `metrics.verbosity` description what the
> supported
> > values are. The description mentions "If the value is high ... In the
> > low settings" but I think it's referring to the `level` property
> > specifically instead of the whole value that is now JSON. Could you
> clarify
> > this?
> >
> > 2. Could we state in which order the JSON entries are going to be
> > evaluated? I guess the last entry wins if it overlaps previous values,
> but
> > better to make this explicit.
> >
> > 3. Kafka metrics library has a `RecordingLevel` configuration -- have we
> > considered aligning these concepts and maybe reuse it instead of
> > `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.
> >
> > 4. Not sure if within the scope of the KIP, but would be helpful to
> > document the metrics with the verbosity level attached to the metrics.
> > Maybe creating a JIRA ticket to track this would be enough if we can't
> > cover it as part of the KIP.
> >
> > 5. Could we consider the following client-related metrics as well:
> >   - BytesRejectedPerSec
> >   - TotalProduceRequestsPerSec
> >   - TotalFetchRequestsPerSec
> >   - FailedProduceRequestsPerSec
> >   - FailedFetchRequestsPerSec
> >   - FetchMessageConversionsPerSec
> >   - ProduceMessageConversionsPerSec
> > Would be great to have these from day 1 instead of requiring a following
> > KIP to extend this. Could be implemented in separate PRs if needed.
> >
> > 6. To make it clearer how the user experience would be, could we provide
> an
> > example of:
> > - how the broker configuration will be provided by default, and
> > - how the CLI tooling would be used to change the configuration?
> > - Maybe a couple of scenarios: adding a new metric config, a second one
> > with overlapping values, and
> > - describing the expected metrics to be mapped
> >
> > A couple of nits:
> > - The first link "MessagesInPerSec metrics" is pointing to
> > https://kafka.apache.org/documentation/#uses_metrics -- is this the
> > correct
> > reference? It doesn't seem too relevant.

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Jorge Esteban Quilcate Otoya
Hi Christo,

I'd like to add another suggestion:

7. Adding on TS lag formulas, my understanding is that per partition:
- RemoteCopyLag: difference between: latest local segment candidate for
upload - latest remote segment
  - Represents how Remote Log Manager task is handling backlog of segments.
  - Ideally, this lag is zero -- grows when upload is slower than the
increase on candidate segments to upload

- RemoteDeleteLag: difference between: latest remote candidate segment to
keep based on retention - oldest remote segment
  - Represents how many segments Remote Log Manager task is missing to
delete at a given point in time
  - Ideally, this lag is zero -- grows when retention condition changes but
RLM task is not able to schedule deletion yet.

Is my understanding of these lags correct?

I'd like to also consider an additional lag:
- LocalDeleteLag: difference between: latest local candidate segment to
keep based on local retention - oldest local segment
  - Represents how many segments are still available locally when they are
candidate for deletion. This usually happens when log cleaner has not been
scheduled yet. It's important because it represents how much data is stored
locally when it could be removed, and how much data that could be served
from the remote tier is still being served from the local tier.
  - Ideally, this lag returns to zero when log cleaner runs; but could be
growing if there are issues uploading data (other lag) or removing data
internally.
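
To make these definitions testable, a minimal sketch of how the three lags could be
computed per partition. The segment model (lists of `(base_offset, size_bytes)` pairs)
and the boundary offsets are simplified assumptions, not Kafka's actual implementation:

```python
# Sketch of the per-partition lag calculations described above.

def remote_copy_lag_bytes(local_segments, last_uploaded_base_offset):
    """Bytes in local segments that are upload candidates but whose base
    offset is beyond the last segment already uploaded to remote storage."""
    return sum(size for base, size in local_segments
               if base > last_uploaded_base_offset)

def remote_delete_lag_segments(remote_segments, retention_cutoff_offset):
    """Remote segments entirely below the retention cut-off that the
    remote log manager has not deleted yet."""
    return sum(1 for base, _ in remote_segments
               if base < retention_cutoff_offset)

def local_delete_lag_segments(local_segments, local_retention_cutoff_offset):
    """Local segments already past local retention (assumed safely uploaded)
    that the log cleaner has not removed yet."""
    return sum(1 for base, _ in local_segments
               if base < local_retention_cutoff_offset)
```

In all three cases, zero is the healthy steady state; a growing value points at the
corresponding stage (upload, remote delete, local cleanup) falling behind.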

Thanks,
Jorge.

On Thu, 9 Nov 2023 at 10:51, Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the KIP!
>
> Some comments:
> 1. I agree with Kamal that a metric to cover the time taken to read data
> from remote storage is helpful.
>
> 2. I can see some metrics are only on topic level, but some are
> on partition level.
> Could you explain why some of them are only on topic level?
> Like RemoteLogSizeComputationTime, it's different from partition to
> partition, will it be better to be exposed as partition metric?
>
> 3. `RemoteLogSizeBytes` metric hanging.
> To compute the RemoteLogSizeBytes, we need to wait until all records in the
> metadata topic are loaded.
> What will happen if it takes a long time to load the data from the metadata topic?
> Should we instead return -1 or something to indicate it's still loading?
>
> Thanks.
> Luke
>
> On Fri, Nov 3, 2023 at 1:53 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Christo,
> >
> > Thanks for expanding the scope of the KIP!  We should also cover the time
> > taken to
> > read data from remote storage. This will give our users a fair idea about
> > the P99, P95,
> > and P50 Fetch latency to read data from remote storage.
> >
> > The Fetch API request metrics currently provides a breakdown of the time
> > spent on each item:
> >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L517
> > Should we also provide `RemoteStorageTimeMs` item (only for FETCH API) so
> > that users can
> > understand the overall and per-step time taken?
> >
> > Regarding the Remote deletion metrics, should we also emit a metric to
> > expose the oldest segment time?
> > Users can configure the topic retention either by size (or) time. If time
> > is configured, then emitting
> > the oldest segment time allows the user to configure an alert on top of
> it
> > and act accordingly.
> >
> > On Wed, Nov 1, 2023 at 7:07 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Thanks, Christo!
> > >
> > > 1. Agree. Having a further look into how many latency metrics are
> > included
> > > on the broker side there are only a few of them (e.g. request
> lifecycle)
> > —
> > > but seems mostly delegated to clients, or plugin in this case, to
> measure
> > > this.
> > >
> > > 3.2. Personally, I find the record-based lag less useful as records
> can't
> > > be relied on as a stable unit of measure. So, if we can keep bytes- and
> > > segment-based lag, LGTM.
> > > 3.4.  Agree, these metrics should be on the broker side. Though if
> plugin
> > > decides to take deletion as a background process, then it should have
> > its
> > > own metrics. That's why I was thinking the calculation should be fairly
> > > similar to the CopyLag: "these segments are available for deletion but
> > > haven't been deleted yet"
> > > 3.5. For lag metrics: could we add an explanation on how each lag will
> be
> > > calculated, e.g. using which values, from which components, under which
> > > circum

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-07 Thread Jorge Esteban Quilcate Otoya
Hi Qichao,

Thanks for the KIP! This will be a valuable contribution and improve the
tooling for troubleshooting.

I have a couple of comments:

1. It's unclear from the `metrics.verbosity` description what the supported
values are. The description mentions "If the value is high ... In the
low settings" but I think it's referring to the `level` property
specifically instead of the whole value that is now JSON. Could you clarify
this?

2. Could we state in which order the JSON entries are going to be
evaluated? I guess the last entry wins if it overlaps previous values, but
better to make this explicit.

3. Kafka metrics library has a `RecordingLevel` configuration -- have we
considered aligning these concepts and maybe reuse it instead of
`verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.

4. Not sure if within the scope of the KIP, but would be helpful to
document the metrics with the verbosity level attached to the metrics.
Maybe creating a JIRA ticket to track this would be enough if we can't
cover it as part of the KIP.

5. Could we consider the following client-related metrics as well:
  - BytesRejectedPerSec
  - TotalProduceRequestsPerSec
  - TotalFetchRequestsPerSec
  - FailedProduceRequestsPerSec
  - FailedFetchRequestsPerSec
  - FetchMessageConversionsPerSec
  - ProduceMessageConversionsPerSec
Would be great to have these from day 1 instead of requiring a following
KIP to extend this. Could be implemented in separate PRs if needed.

6. To make it clearer how the user experience would be, could we provide an
example of:
- how the broker configuration will be provided by default, and
- how the CLI tooling would be used to change the configuration?
- Maybe a couple of scenarios: adding a new metric config, a second one
with overlapping values, and
- describing the expected metrics to be mapped
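
As one concrete (hypothetical) scenario, a sketch of how a JSON `metrics.verbosity`
value with overlapping entries might be evaluated. The field names ("level", "name",
"filters") and the last-entry-wins rule here are assumptions for illustration only;
the KIP defines the authoritative schema and conflict-resolution rules:

```python
import json
import re

# A config with a catch-all low-verbosity entry and a more specific
# high-verbosity override for one metric on matching topics.
config = json.loads("""
[
  {"level": "low",  "name": ".*"},
  {"level": "high", "name": "MessagesInPerSec",
   "filters": {"topic": "orders-.*"}}
]
""")

def verbosity(metric_name, topic, entries):
    level = "low"  # assumed default when nothing matches
    for entry in entries:  # later entries win on overlap
        if not re.fullmatch(entry["name"], metric_name):
            continue
        topic_pattern = entry.get("filters", {}).get("topic", ".*")
        if re.fullmatch(topic_pattern, topic):
            level = entry["level"]
    return level
```

Under this sketch, `MessagesInPerSec` on an `orders-eu` topic would be emitted at
high verbosity, while every other metric/topic combination stays at low.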

A couple of nits:
- The first link "MessagesInPerSec metrics" is pointing to
https://kafka.apache.org/documentation/#uses_metrics -- is this the correct
reference? It doesn't seem too relevant.
- Also, the link to ReplicaManager points to a line that has changed
already; better to have a permalink to a specific commit: e.g.
https://github.com/apache/kafka/blob/edc7e10a745c350ad1efa9e4866370dc8ea0e034/core/src/main/scala/kafka/server/ReplicaManager.scala#L1218

Cheers,
Jorge.

On Tue, 7 Nov 2023 at 17:06, Qichao Chu  wrote:

> Hi Divij,
>
> It would be very nice if you could take a look at the recent changes, thank
> you!
> If there's no more required changes, shall we move to vote stage?
>
> Best,
> Qichao Chu
> Software Engineer | Data - Kafka
> [image: Uber] 
>
>
> On Thu, Nov 2, 2023 at 12:06 AM Qichao Chu  wrote:
>
> > Hi Divij,
> >
> > Thank you for the very quick response and the nice suggestions. I have
> > updated the KIP with the following thoughts.
> >
> > 1. I checked the Java documentation and it seems the regex engine in
> utils
> > <
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html>
> is
> > not 100% compatible with PCRE, though it is very close. I stated
> > the Java implementation as the requirement since we are most likely to
> > target a JVM language.
> > 2. Agreed with the filter limitation. For now, let's keep it topic only.
> > With that in mind, I feel we do have cases where a user wants to list
> many
> > topics. Although regex is also possible, an array will make things
> faster.
> > This makes me add two options for the topic filter.
> > 3. It seems not many configs are using JSON, this was the intention for
> me
> > to use a compound string. However since JSON is used widely in the
> project,
> > and given the benefits you mentioned earlier, I tried to make the config
> a
> > JSON array. The change is to make it compatible with multi-level
> settings.
> >
> > Let me know what you think. Many thanks!
> >
> > Best,
> > Qichao Chu
> > Software Engineer | Data - Kafka
> > [image: Uber] 
> >
> >
> > On Wed, Nov 1, 2023 at 9:43 PM Divij Vaidya 
> > wrote:
> >
> >> Thank you for making the changes Qichao.
> >>
> >> We are now entering in the territory of defining a declarative schema
> for
> >> filters. In the new input format, the type is string but we are
> imposing a
> >> schema for the string and we should clearly call out the schema. You can
> >> perhaps choose to adopt a schema such as below:
> >>
> >> metricLevel = High | Low (default: Low)
> >> metricNameRegEx = regEx (default: .*)
> >> nameOfDimension = string
> >> dimensionRegEx = regEx
> >> dimensionFilter = [=] (default: [])
> >>
> >> Final Value schema = "level"=$metricLevel, "name"=$metricNameRegEx,
> >> $dimensionFilter
> >>
> >> Further we need to answer questions such as :
> >> 1. which regEx format do we support (it should probably be
> Perl-compatible
> >> regular expressions (PCRE) because Java's regEx is compatible with it)
> >> 2. should we restrict the dimensionFilter to at max length 1 and value
> >> "topic" only for 

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-01 Thread Jorge Esteban Quilcate Otoya
Thanks, Christo!

1. Agree. Having a further look into how many latency metrics are included
on the broker side there are only a few of them (e.g. request lifecycle) —
but seems mostly delegated to clients, or plugin in this case, to measure
this.

3.2. Personally, I find the record-based lag less useful as records can't
be relied on as a stable unit of measure. So, if we can keep bytes- and
segment-based lag, LGTM.
3.4.  Agree, these metrics should be on the broker side. Though if plugin
decides to take deletion as a background process, then it should have its
own metrics. That's why I was thinking the calculation should be fairly
similar to the CopyLag: "these segments are available for deletion but
haven't been deleted yet"
3.5. For lag metrics: could we add an explanation on how each lag will be
calculated, e.g. using which values, from which components, under which
circumstances do we expect these values to increase/decrease, etc. This
will clarify 3.4. and make it easier to agree and eventually test.

4. Sorry I wasn't clear. I meant similar to `RemoteCopyBytesPerSec` and
`RemoteFetchBytesPerSec`, we could consider including
`RemoteDeleteBytesPerSec`.

5. and 6. Thanks for the explanation! It surely benefits to have these as
part of the set of metrics.

Cheers,
Jorge.

On Mon, 30 Oct 2023 at 16:07, Christo Lolov  wrote:

> Heya Jorge,
>
> Thank you for the insightful comments!
>
> 1. I see a value in such latency metrics but in my opinion the correct
> location for such metrics is in the plugins providing the underlying
> functionality. What are your thoughts on the matter?
>
> 2. Okay, I will look for and adjust the formatting today/tomorrow!
>
> 3.1 Done.
> 3.2 Sure, I will add this to the KIP later today, the suggestion makes
> sense to me. However, my question is, would you still find value in
> emitting metrics for all three i.e. RemoteCopyLagRecords,
> RemoteCopyLagBytes and RemoteCopyLagSegments or would you only keep
> RemoteCopyLagBytes and RemoteCopyLagSegments?
> 3.3. Yes, RemoteDeleteLagRecords was supposed to be an equivalent of
> RemoteCopyLagRecords. Once I have your opinion on 3.2 I will make the
> respective changes.
> 3.4. I envision these metrics to be added to Kafka rather than the plugins.
> Today Kafka sends deletes to remote storage but does not know whether those
> segments have been deleted immediately when the request has been sent or
> have been given to a background process to carry out the actual reclamation
> of space. The purpose of this metric is to give an estimate in time which
> says "hey, we have called this many segments or bytes to be deleted".
>
> 4. I believe this goes down the same line of thinking as what you mentioned
> in 3.3 - have I misunderstood something?
>
> 5. I have on a number of occasions found I do not have a metric to quickly
> point me to what part of tiered storage functionality is experiencing an
> issue, in some scenarios a follower failing to build an auxiliary state. An
> increase in the number of BuildRemoteLogAuxState requests per second can
> uncover problems for specific topics warranting a further investigation,
> which I tend to find difficult to judge purely based on parsing log
> statements. An increase in the number of errors can quickly zone in on
> followers failing as part of tiered storage and point me to look in the
> logs specifically for that component.
>
> 6. I find it useful start my investigations with respect to tiering
> problems by checking the rough size distribution of topics in remote. From
> then on I try to correlate whether a historically high-volume topic started
> experiencing a decrease in volume due to a decrease in produce traffic to
> that topic or due to an increase in lag on local storage due to the broker
> slowing down for whatever reason. Besides correlation I would use such a
> metric to also confirm whether my rate calculations are correct i.e. if
> topic A receives X MB/s and rolls a segment every Y seconds with an upload
> rate of Z MB/s do I see that much data actually being written in remote
> storage. Do these two scenarios demonstrate the usefulness I would have
> from such a metric and do the benefits make sense to you?
>
> 7. I agree. I have changed TotalRemoteLogSizeComputationTime,
> TotalRemoteLogSizeBytes, and TotalRemoteLogMetadataCount to
> RemoteLogSizeComputationTime, RemoteLogSizeBytes and RemoteLogMetadataCount
> respectively.
>
> On Fri, 27 Oct 2023 at 15:24, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Christo,
> >
> > Thanks for proposing this KIP; these metrics will certainly be useful to
> operate
> > Kafka Tiered Storage as it becomes production-ready.
> >
> > 1. Given that the scope of the KIPs has grown to cover more metrics

Re: [ANNOUNCE] New Kafka PMC Member: Satish Duggana

2023-10-27 Thread Jorge Esteban Quilcate Otoya
Congratulations Satish!!

On Fri, 27 Oct 2023 at 18:38, Mickael Maison 
wrote:

> Congratulations Satish!
>
> On Fri, Oct 27, 2023 at 5:18 PM Lucas Brutschy
>  wrote:
> >
> > Congrats!
> >
> > On Fri, Oct 27, 2023 at 5:06 PM Manikumar 
> wrote:
> > >
> > > Congrats!
> > >
> > > On Fri, Oct 27, 2023 at 8:35 PM Jun Rao 
> wrote:
> > >
> > > > Hi, Everyone,
> > > >
> > > > Satish Duggana has been a Kafka committer since 2022. He has been
> very
> > > > instrumental to the community since becoming a committer. It's my
> pleasure
> > > > to announce that Satish is now a member of Kafka PMC.
> > > >
> > > > Congratulations Satish!
> > > >
> > > > Jun
> > > > on behalf of Apache Kafka PMC
> > > >
>


Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-27 Thread Jorge Esteban Quilcate Otoya
Hi Christo,

Thanks for proposing this KIP; these metrics will certainly be useful for
operating Kafka Tiered Storage as it becomes production-ready.

1. Given that the scope of the KIPs has grown to cover more metrics, what
do you think about introducing latency metrics for RSM operations?
Copy and delete time metrics are quite obvious/simple on what they
represent; but fetch latency metrics would be helpful, as remote fetching
affects clients directly. E.g. having a "time to first byte" metric could help to
know how much time is introduced by the remote tier to start serving
results to the consumer, or measuring how long it takes to return a
response to consumers.
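
As a rough illustration of the "time to first byte" idea, a minimal sketch of wrapping
a remote read so the latency is recorded when the first chunk arrives. The chunked
stream interface and the `record_latency_ms` callback are hypothetical, not an actual
Remote Storage Manager API:

```python
import time

def fetch_with_ttfb(open_remote_stream, record_latency_ms):
    """Yield chunks from a remote stream, recording time-to-first-byte (ms)."""
    start = time.monotonic()
    stream = open_remote_stream()   # assumed to return an iterator of byte chunks
    first_chunk = next(stream)      # the "first byte" lands here
    record_latency_ms((time.monotonic() - start) * 1000.0)
    yield first_chunk
    yield from stream

# Demo with a fake remote stream: the latency callback fires exactly once,
# when the first chunk is received.
def _fake_stream():
    yield b"chunk-1"
    yield b"chunk-2"

latencies = []
chunks = list(fetch_with_ttfb(_fake_stream, latencies.append))
```

A histogram over such samples would give the P50/P95/P99 breakdown mentioned
elsewhere in this thread.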

2. Couple of requests/nits on the metrics table, could you:
- highlight the names (or have them on a separate column, as you prefer) to
make it easier to read? If you choose to have another column, maybe sort
them as "Name, Description, MBean" and adjust the width.
- group the related metrics in separate groups, e.g. Lag, Remote Delete,
Remote Log Aux State, Remote Log Size; so we can elaborate on why each set
of metrics is needed. Maybe adding some examples of usage and how
actionable they are as the ones shared in previous emails would be useful
to keep as part of the KIP.

3. On Lag metrics:
3.1 I would suggest the following renames:
- TotalRemoteRecordsLag -> RemoteCopyLagRecords
- TotalRemoteBytesLag -> RemoteCopyLagBytes
- DeleteRemoteLag -> RemoteDeleteLagRecords
3.2. I agree with Kamal that having a lag based on the number of segments
would be useful to include. Segments could give a faster proxy to
understand whether the lag is meaningful or not. e.g. if the number of
records and bytes are high, but the segment lag is only small (e.g. 1), it
may be ok; but if the number of segments is high, then it can be more
relevant to operators.
3.3. Could we consider having the same metrics for Delete Lag as there are
for Copy Lag? i.e. including RemoteDeleteLagBytes, and (potentially)
RemoteDeleteLag for segments.
3.4. The description of delete lag is unclear to me: I thought it was about
the remote segments to be deleted (because of total retention) but not
deleted yet; however from description it seems that it's related to local
segments that are marked for deletion. Is this correct?

4. On Remote Delete metrics:
- Could we also include a bytes-based metric as with Copy and Fetch? It would
be useful to know how many bytes are being deleted. If aggregated and
compared with copied bytes, we can get a sense of the amount of data stored
remotely, even if not exact (only considers segment size, not indexes)

5. On RemoteLogAuxState metrics: could you elaborate a bit more on the
purpose of this component and why the metrics proposed are needed?

6. On Total Remote Log Size metrics: similarly, could you elaborate on why
this metric is needed? I'm missing what makes this operation relevant
(compared to other internals) enough to have metrics attached -- sharing
some scenarios where these metrics would be useful would be helpful.

7. On the metrics naming: not sure the `Total*` prefix is really needed or
adds meaning. Where I have found it useful is when there is a related
metric that is a subset; then the total prefix helps, e.g.
`TotalProduceRequestsPerSec` and `FailedProduceRequestsPerSec`

Cheers,
Jorge.


On Tue, 24 Oct 2023 at 12:24, Christo Lolov  wrote:

> Hello all,
>
> Now that 3.6 has been released, I would like to bring back attention to the
> following KIP for adding metrics to tiered storage targeting 3.7 -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Add+more+metrics+to+Tiered+Storage
> .
>
> Let me know your thoughts about the list of metrics and their granularity!
>
> Best,
> Christo
>
> On Fri, 13 Oct 2023 at 10:14, Christo Lolov 
> wrote:
>
> > Heya Gantigmaa,
> >
> > Apologies for the (very) late reply!
> >
> > Now that 3.6 has been released and reviewers have a bit more time I will
> > be picking up this KIP again. I am more than happy to add useful new
> > metrics to the KIP, I would just ask for a couple of days to review your
> > pull request and I will come back to you.
> >
> > Best,
> > Christo
> >
> > On Mon, 25 Sept 2023 at 10:49, Gantigmaa Selenge 
> > wrote:
> >
> >> Hi Christo,
> >>
> >> Thank you for writing the KIP.
> >>
> >> I recently raised a PR to add metrics for tracking remote segment
> >> deletions
> >> (https://github.com/apache/kafka/pull/14375) but realised those metrics
> >> were not mentioned in the original KIP-405 or KIP-930. Do you think
> these
> >> would make sense to be added to this KIP and get included in the
> >> discussion?
> >>
> >> Regards,
> >> Gantigmaa
> >>
> >> On Wed, Aug 9, 2023 at 1:53 PM Christo Lolov 
> >> wrote:
> >>
> >> > Heya Kamal,
> >> >
> >> > Thank you for going through the KIP and for the question!
> >> >
> >> > I have been thinking about this and as an operator I might find it the
> >> > most
> >> > useful to know all three of them actually.
> >> >
> >> > I would find 

Re: [ANNOUNCE] New committer: Yash Mayya

2023-09-21 Thread Jorge Esteban Quilcate Otoya
Congratulations, Yash!

On Thu 21. Sep 2023 at 21.57, Randall Hauch  wrote:

> Congratulations, Yash!
>
> On Thu, Sep 21, 2023 at 12:31 PM Satish Duggana 
> wrote:
>
> > Congratulations Yash!!
> >
> > On Thu, 21 Sept 2023 at 22:57, Viktor Somogyi-Vass
> >  wrote:
> > >
> > > Congrats Yash!
> > >
> > > On Thu, Sep 21, 2023 at 7:04 PM Josep Prat  >
> > > wrote:
> > >
> > > > Congrats Yash!
> > > >
> > > > ———
> > > > Josep Prat
> > > >
> > > > Aiven Deutschland GmbH
> > > >
> > > > Alexanderufer 3-7, 10117 Berlin
> > > >
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > >
> > > > m: +491715557497
> > > >
> > > > w: aiven.io
> > > >
> > > > e: josep.p...@aiven.io
> > > >
> > > > On Thu, Sep 21, 2023, 18:55 Raymond Ng 
> > wrote:
> > > >
> > > > > Congrats Yash! Well-deserved!
> > > > >
> > > > > /Ray
> > > > >
> > > > > On Thu, Sep 21, 2023 at 9:40 AM Kamal Chandraprakash <
> > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > >
> > > > > > Congratulations Yash!
> > > > > >
> > > > > > On Thu, Sep 21, 2023, 22:03 Bill Bejeck 
> > wrote:
> > > > > >
> > > > > > > Congrats Yash!
> > > > > > >
> > > > > > > On Thu, Sep 21, 2023 at 12:26 PM Divij Vaidya <
> > > > divijvaidy...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Yash!
> > > > > > > >
> > > > > > > > Divij Vaidya
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Sep 21, 2023 at 6:18 PM Sagar <
> > sagarmeansoc...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Congrats Yash !
> > > > > > > > > On Thu, 21 Sep 2023 at 9:38 PM, Ashwin
> > > > >  > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Awesome ! Congratulations Yash !!
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 21, 2023 at 9:25 PM Edoardo Comar <
> > > > > > edoardli...@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congratulations Yash
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 21 Sept 2023 at 16:28, Bruno Cadonna <
> > > > > cado...@apache.org
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > The PMC of Apache Kafka is pleased to announce a new
> > Kafka
> > > > > > > > committer
> > > > > > > > > > > > Yash Mayya.
> > > > > > > > > > > >
> > > > > > > > > > > > Yash's major contributions are around Connect.
> > > > > > > > > > > >
> > > > > > > > > > > > Yash authored the following KIPs:
> > > > > > > > > > > >
> > > > > > > > > > > > KIP-793: Allow sink connectors to be used with
> > > > topic-mutating
> > > > > > > SMTs
> > > > > > > > > > > > KIP-882: Kafka Connect REST API configuration
> > validation
> > > > > > timeout
> > > > > > > > > > > > improvements
> > > > > > > > > > > > KIP-970: Deprecate and remove Connect's redundant
> task
> > > > > > > > configurations
> > > > > > > > > > > > endpoint
> > > > > > > > > > > > KIP-980: Allow creating connectors in a stopped state
> > > > > > > > > > > >
> > > > > > > > > > > > Overall, Yash is known for insightful and friendly
> > input to
> > > > > > > > discussions
> > > > > > > > > > > > and his high quality contributions.
> > > > > > > > > > > >
> > > > > > > > > > > > Congratulations, Yash!
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Bruno (on behalf of the Apache Kafka PMC)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


[jira] [Created] (KAFKA-15314) No Quota applied if client-id is null or empty

2023-08-08 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15314:


 Summary: No Quota applied if client-id is null or empty
 Key: KAFKA-15314
 URL: https://issues.apache.org/jira/browse/KAFKA-15314
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


When Quotas were proposed, KIP-13[1] stated:

>  In addition, there will be a quota reserved for clients not presenting a 
>client id (for e.g. simple consumers not setting the id). This will default to 
>an empty client id ("") and all such clients will share the quota for that 
>empty id (which should be the default quota).

However, it seems that when client-id is null or empty and a default quota for 
client-id is present, no quota is applied.

Even though Java clients set a default value [2][3], the protocol accepts a null 
client-id[4], and other client implementations could send a null value to 
by-pass a quota.

The related code[5][6] shows the metric pair for quotas being prepared with a 
(potentially null) client-id, and the quota being set to null when both 
client-id and (sanitized) user are null.

Adding some tests to showcase this: 
[https://github.com/apache/kafka/pull/14165] 

 

Is it expected for client-id=null to by-pass quotas? If it is, then a KIP or 
documentation should clarify this; otherwise we should fix this behavior bug, 
e.g. we could "sanitize" client-id similarly to the user name, mapping it to an 
empty string when the input is null or empty.
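To make the issue and the proposed fix concrete, here is a toy sketch (not the actual ClientQuotaManager code; names and structure are simplified assumptions) of how a null client-id can slip past a default quota, and how sanitizing it would close the gap:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration, not Kafka's real quota resolution logic.
class QuotaLookup {
    static final String DEFAULT = "<default>";
    final Map<String, Double> clientIdQuotas = new HashMap<>();

    // Mirrors the reported behavior: a null client-id never matches the
    // <default> entry, so the request is effectively unthrottled.
    Double quotaFor(String clientId) {
        if (clientId == null) return null; // by-passes the default quota
        Double q = clientIdQuotas.get(clientId);
        return q != null ? q : clientIdQuotas.get(DEFAULT);
    }

    // Proposed fix from this ticket: sanitize null/empty to "" first, so all
    // such clients share the default quota, as KIP-13 originally described.
    Double quotaForSanitized(String clientId) {
        String sanitized = (clientId == null) ? "" : clientId;
        Double q = clientIdQuotas.get(sanitized);
        return q != null ? q : clientIdQuotas.get(DEFAULT);
    }
}
```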


 

As a side-note, similar behavior could happen with the user, I guess. Even 
though the value defaults to ANONYMOUS, if a client implementation sends an 
empty value, it may also by-pass the default quota – though I need to test this 
further once this is confirmed as a bug.

 

[1]: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas] 


[2]: 
[https://github.com/apache/kafka/blob/e98508747acc8972ac5ceb921e0fd3a7d7bd5e9c/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L498-L508]
 


[3]: 
[https://github.com/apache/kafka/blob/ab71c56973518bac8e1868eccdc40b17d7da35c1/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L616-L628]

[4]: 
[https://github.com/apache/kafka/blob/9f26906fcc2fd095b7d27c504e342b9a8d619b4b/clients/src/main/resources/common/message/RequestHeader.json#L34-L40]
 


[5]: 
[https://github.com/apache/kafka/blob/322ac86ba282f35373382854cc9e790e4b7fb5fc/core/src/main/scala/kafka/server/ClientQuotaManager.scala#L588-L628]
 


[6]: 
[https://github.com/apache/kafka/blob/322ac86ba282f35373382854cc9e790e4b7fb5fc/core/src/main/scala/kafka/server/ClientQuotaManager.scala#L651-L652]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-07-25 Thread Jorge Esteban Quilcate Otoya
KIP is updated now:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations

Looking forward to your feedback,

Many thanks,
Jorge.
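For readers skimming the thread, a minimal sketch of the two-map policy shape discussed below; the interface name and the deletion convention (null value = back to default) are assumptions until the KIP text settles:

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape only, not the final KIP API: the policy sees both the existing
// ("old") configs and the resulting ("new") configs; a config deleted by the
// request shows up with a null value in the new map, i.e. "back to default".
interface AlterConfigV2Policy {
    void validate(Map<String, String> oldConfigs, Map<String, String> newConfigs);
}

// Example policy: reject resetting retention.ms back to the broker default.
class ForbidRetentionReset implements AlterConfigV2Policy {
    @Override
    public void validate(Map<String, String> oldConfigs, Map<String, String> newConfigs) {
        if (oldConfigs.get("retention.ms") != null && newConfigs.get("retention.ms") == null) {
            throw new IllegalArgumentException("retention.ms may not be reset to its default");
        }
    }
}
```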

On Tue, 25 Jul 2023 at 16:59, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi Colin, sorry for the belated follow up.
>
> If I understand correctly, on your latest reply proposed to have a new
> API. From the proposed alternatives, I lean towards the first alternative
> proposed with 2 config maps, old (before-alter) and new (after-alter).
> Deleting a config is effectively returning to the default value, then users
> can use the old value and compare against default if new is null.
>
> This would require a bit broader changes, starting with a new config. I
> will work on the KIP updates considering: `AlterConfigV2Policy` interface,
> and config `alter.config.policy.v2.class.name`. Let me know if there's
> any issues with this; otherwise I will update the mail thread once the KIP
> is updated.
>
> Many thanks,
> Jorge.
>
> On Tue, 20 Jun 2023 at 11:56, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
>> Thanks Colin! You're right. I started this KIP only thinking on the
>> latest incremental API, and haven't thought much on the legacy one.
>>
>> After taking another look at both APIs, I can see some inconsistencies
>> on how the policies are applied in each case. I have added a section
>> "Current workflow" [1] to the current proposal to summarize how alter
>> config works in both cases (legacy and incremental) and for both back-ends
>> (ZK, KRaft).
>>
>> In summary:
>> - On Legacy Alter Config, the set of changes proposed is the same as the
>> new config with the difference that null values are removed from the new
>> config.
>> - On Incremental Alter Config, the set of changes proposed is not the
>> same as the new config. It only contains explicit changes to the config
>> - Implicit deletes are a set of configs inferred on legacy alter config
>> when no value is provided but it exists on the current config
>> - Even though alter config policy receives the "requested"
>> configurations, these have 2 different meanings depending on the API used
>> to update configs.
>>   - When validating policies on Legacy Alter Config, it means: requested
>> changes that is equal to new config state including explicit deletes
>>   - When validating policies on Incremental Alter Config, it means: only
>> requested changes including explicit deletes but without any other config
>> from current or new status
>>   - Plugin implementations *do not know which one they are actually
>> dealing with*, and as the incremental (new) API becomes broadly adopted, then
>> current status configs not included in the request are not considered.
>>
>> The problem is a bit larger than the one framed on the motivation. It's
>> not only that we don't have the current configs to compare with; but
>> depending on the API used to alter configs we may have them or not.
>>
>> Is this assessment correct?
>> If it is, then we may discuss approaching this issue as a bug instead. We
>> could consider aligning the semantics of the configs passed to the policy.
>> At the moment the "requested configs" are passed to policies when either
>> API is called, but both have _different meaning depending on which API is
>> used_. We could instead align the meaning, and pass the "new configs,
>> including explicit deletes" as we do on legacy when doing incremental
>> updates as well.
>>
>> Looking forward to your feedback and many thanks again!
>> Jorge.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations#KIP935:ExtendAlterConfigPolicywithexistingconfigurations-Currentworkflow
>>
>> On Thu, 15 Jun 2023 at 22:07, Colin McCabe  wrote:
>>
>>> Hi Jorge,
>>>
>>> I appreciate you trying to solve the issue. However, from the
>>> perspective of someone using the plugin API, it's quite messy: what is the
>>> difference between "proposed" and "resulting"? They sound the same.
>>>
>>> I think there are two APIs that make sense:
>>>
>>> 1. A (prev, next) based one where you just get the previous set of
>>> configs, and the new one, and can draw your own conclusions
>>>
>>> 2. A (prev, changed, removed) one where you get the previous set of
>>> configs, plus the changes (additions or modifications), and deletions.
>>>
>&

Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-07-25 Thread Jorge Esteban Quilcate Otoya
Hi Colin, sorry for the belated follow up.

If I understand correctly, on your latest reply proposed to have a new API.
>From the proposed alternatives, I lean towards the first alternative
proposed with 2 config maps, old (before-alter) and new (after-alter).
Deleting a config is effectively returning to the default value, then users
can use the old value and compare against default if new is null.

This would require a bit broader changes, starting with a new config. I
will work on the KIP updates considering: `AlterConfigV2Policy` interface,
and config `alter.config.policy.v2.class.name`. Let me know if there's any
issues with this; otherwise I will update the mail thread once the KIP is
updated.

Many thanks,
Jorge.

On Tue, 20 Jun 2023 at 11:56, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Colin! You're right. I started this KIP only thinking on the latest
> incremental API, and haven't thought much on the legacy one.
>
> After taking another look at both APIs, I can see some inconsistencies
> on how the policies are applied in each case. I have added a section
> "Current workflow" [1] to the current proposal to summarize how alter
> config works in both cases (legacy and incremental) and for both back-ends
> (ZK, KRaft).
>
> In summary:
> - On Legacy Alter Config, the set of changes proposed is the same as the
> new config with the difference that null values are removed from the new
> config.
> - On Incremental Alter Config, the set of changes proposed is not the same
> as the new config. It only contains explicit changes to the config
> - Implicit deletes are a set of configs inferred on legacy alter config
> when no value is provided but it exists on the current config
> - Even though alter config policy receives the "requested" configurations,
> these have 2 different meanings depending on the API used to update configs.
>   - When validating policies on Legacy Alter Config, it means: requested
> changes that is equal to new config state including explicit deletes
>   - When validating policies on Incremental Alter Config, it means: only
> requested changes including explicit deletes but without any other config
> from current or new status
>   - Plugin implementations *do not know which one they are actually
> dealing with*, and as the incremental (new) API becomes broadly adopted, then
> current status configs not included in the request are not considered.
>
> The problem is a bit larger than the one framed on the motivation. It's
> not only that we don't have the current configs to compare with; but
> depending on the API used to alter configs we may have them or not.
>
> Is this assessment correct?
> If it is, then we may discuss approaching this issue as a bug instead. We
> could consider aligning the semantics of the configs passed to the policy.
> At the moment the "requested configs" are passed to policies when either
> API is called, but both have _different meaning depending on which API is
> used_. We could instead align the meaning, and pass the "new configs,
> including explicit deletes" as we do on legacy when doing incremental
> updates as well.
>
> Looking forward to your feedback and many thanks again!
> Jorge.
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations#KIP935:ExtendAlterConfigPolicywithexistingconfigurations-Currentworkflow
>
> On Thu, 15 Jun 2023 at 22:07, Colin McCabe  wrote:
>
>> Hi Jorge,
>>
>> I appreciate you trying to solve the issue. However, from the perspective
>> of someone using the plugin API, it's quite messy: what is the difference
>> between "proposed" and "resulting"? They sound the same.
>>
>> I think there are two APIs that make sense:
>>
>> 1. A (prev, next) based one where you just get the previous set of
>> configs, and the new one, and can draw your own conclusions
>>
>> 2. A (prev, changed, removed) one where you get the previous set of
>> configs, plus the changes (additions or modifications), and deletions.
>>
>> 3. Same as 2 but you have a "changed" map whose values are Optionals, and
>> express deletions as Optional.empty
>>
>> The old API should just stay the same, bugs and all, for compatibility
>> reasons. But for the new API we should choose one of the above, I think.
>> I'm not completely sure which...
>>
>> best,
>> Colin
>>
>> On Mon, Jun 12, 2023, at 07:08, Jorge Esteban Quilcate Otoya wrote:
>> > Thanks Colin! You're right. I have added some notes about this to the
>> KIP,
>> > and clarify how this KIP is related to legacy and incremental 

[VOTE] KIP-934: Add DeleteTopicPolicy

2023-07-25 Thread Jorge Esteban Quilcate Otoya
Hi All,

I'd like to start the vote for KIP-934: Add DeleteTopicPolicy:
https://cwiki.apache.org/confluence/x/-xE0Dw

Regards,
Jorge.


Re: [VOTE] KIP-930: Tiered Storage Metrics

2023-07-25 Thread Jorge Esteban Quilcate Otoya
+1 (non-binding)

Thanks, Abhijeet!


On Tue, 25 Jul 2023 at 14:22, Abhijeet Kumar 
wrote:

> Please find the updated link to the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Rename+ambiguous+Tiered+Storage+Metrics
>
> Updated the KIP as per our conversation on the discussion thread.
>
> On Tue, Jul 25, 2023 at 11:29 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I would like to start the vote for KIP-930 Tiered Storage Metrics.
> >
> > The KIP is here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> >
> > Regards
> > Abhijeet.
> >
> >
>
> --
> Abhijeet.
>


Re: [DISCUSS] KIP-930: Tiered Storage Metrics

2023-07-25 Thread Jorge Esteban Quilcate Otoya
Hi Abhijeet,

Thanks for this KIP, I pretty much agree with the renaming and new names
look good to me.

Cheers,
Jorge.

On Tue, 25 Jul 2023 at 12:56, Satish Duggana 
wrote:

> Hi Abhijeet,
> Thanks for keeping this KIP only to renaming the existing metrics for
> better clarity. These new names look good to me.
>
> ~Satish.
>
> On Tue, 25 Jul 2023 at 13:12, Luke Chen  wrote:
> >
> > Hi Abhijeet,
> >
> > Thanks for the KIP!
> > I don't have much preference for the name changing.
> > But if it could confuse other people, it's good to make it clear.
> >
> > Thank you.
> > Luke
> >
> > On Tue, Jul 25, 2023 at 2:53 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Kamal,
> > >
> > > As we discussed offline, I will rename this KIP so that it only
> captures
> > > the aspect of renaming the previously added metrics to remove
> ambiguity.
> > > I will create another KIP for RemoteIndexCache metrics and other
> relevant
> > > tiered storage metrics.
> > >
> > > On Tue, Jul 25, 2023 at 12:03 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Abhijeet,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > We are changing the metric names from what was proposed in the
> KIP-405
> > > and
> > > > adding new metrics for RemoteIndexCache.
> > > > In the KIP, it's not clear whether we are renaming the aggregate
> broker
> > > > level metrics for remote copy/fetch/failed-copy/failed-fetch.
> > > >
> > > > Are these metrics enough to monitor all the aspects of tiered
> storage?
> > > >
> > > > (eg)
> > > > 1. Metrics to see the Tier Lag Status by number of pending
> > > > segments/records.
> > > > 2. Similar to log-start-offset and log-end-offset metrics.  Should we
> > > > expose local-log-start-offset and
> > > highest-offset-uploaded-to-remote-storage
> > > > as metric?
> > > >
> > > > Thanks,
> > > > Kamal
> > > >
> > > > On Mon, Jul 24, 2023 at 2:08 PM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I created KIP-930 for adding RemoteIndexCache stats and also to
> rename
> > > > some
> > > > > tiered storage metrics added as part of KIP-405
> > > > > <
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
> > > > > >
> > > > > to remove ambiguity.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> > > > >
> > > > > Feedback and suggestions are welcome.
> > > > >
> > > > > Regards,
> > > > > Abhijeet.
> > > > >
> > > >
> > >
> > >
> > > --
> > > Abhijeet.
> > >
>


[jira] [Created] (KAFKA-15231) Add ability to pause/resume Remote Log Manager tasks

2023-07-21 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15231:


 Summary: Add ability to pause/resume Remote Log Manager tasks 
 Key: KAFKA-15231
 URL: https://issues.apache.org/jira/browse/KAFKA-15231
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


Once Tiered Storage is enabled, there may be situations where it is needed to 
pause uploading tasks to the remote tier, e.g. remote storage maintenance, 
troubleshooting, etc.

An RSM implementation may not be able to do this by itself without throwing 
exceptions, polluting the logs, etc.

Could we consider adding this ability to the Tiered Storage framework? Remote 
Log Manager seems like a good candidate place for this; though I'm wondering 
how to expose it.

I would be interested to hear whether this sounds like a good idea, and what 
options we have to implement it.

We have been considering extending RLM tasks with a pause flag, and having an 
MBean to switch them on demand. Another option may be to extend the Kafka 
protocol to expose this – but that seems much more involved.
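A rough sketch of the pause-flag idea (all names are hypothetical, not an actual RemoteLogManager API): each task checks a shared flag before doing work, and an MBean operation could flip it at runtime without stopping the task threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a shared pause switch consulted by RLM task iterations.
// An MBean could expose pause()/resume() for operators.
class RemoteLogTaskPauseSwitch {
    private final AtomicBoolean paused = new AtomicBoolean(false);

    void pause() { paused.set(true); }
    void resume() { paused.set(false); }
    boolean isPaused() { return paused.get(); }

    // Called from the task loop: returns true if the iteration actually ran.
    boolean runIteration(Runnable work) {
        if (paused.get()) {
            return false; // skip quietly instead of failing against remote storage
        }
        work.run();
        return true;
    }
}
```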



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-852: Optimize calculation of size for log in remote tier

2023-07-13 Thread Jorge Esteban Quilcate Otoya
+1 (non-binding)

Thanks for the KIP!

Jorge.

On Thu, 13 Jul 2023 at 12:26, Luke Chen  wrote:

> +1 (binding) from me.
>
> Thanks for the KIP!
>
> Luke
>
> On Sun, Jul 2, 2023 at 11:49 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > +1 (non-binding). Thanks for the KIP!
> >
> > —
> > Kamal
> >
> > On Mon, 7 Nov 2022 at 2:20 AM, John Roesler  wrote:
> >
> > > Hi Divij,
> > >
> > > Thanks for the KIP!
> > >
> > > I’ve read through your write-up, and it sounds reasonable to me.
> > >
> > > I’m +1 (binding)
> > >
> > > Thanks,
> > > John
> > >
> > > On Tue, Nov 1, 2022, at 05:03, Divij Vaidya wrote:
> > > > Hey folks
> > > >
> > > > The discuss thread for this KIP has been open for a few months with
> no
> > > > concerns being surfaced. I would like to start a vote for the
> > > > implementation of this KIP.
> > > >
> > > > The KIP is available at
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > >
> > > >
> > > > Regards
> > > > Divij Vaidya
> > >
> >
>


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-13 Thread Jorge Esteban Quilcate Otoya
Thanks Divij.

I was confusing it with the metric tags used by clients, which are based on
topic and partition. Ideally the partition label could be at a DEBUG recording
level, but that's outside the scope of this KIP.

Looks good to me, thanks again!

Jorge.

On Wed, 12 Jul 2023 at 15:55, Divij Vaidya  wrote:

> Jorge,
> About API name: Good point. I have changed it to remoteLogSize instead of
> getRemoteLogSize
>
> About partition tag in the metric: We don't use partition tag across any of
> the RemoteStorage metrics and I would like to keep this metric aligned with
> the rest. I will change the metric though to type=BrokerTopicMetrics
> instead of type=RemoteLogManager, since this is topic level information and
> not specific to RemoteLogManager.
>
>
> Satish,
> Ah yes! Updated from "This would increase the broker start-up time." to
> "This would increase the bootstrap time for the remote storage thread pool
> before the first eligible segment is archived."
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jul 3, 2023 at 2:07 PM Satish Duggana 
> wrote:
>
> > Thanks Divij for taking the feedback and updating the motivation
> > section in the KIP.
> >
> > One more comment on Alternative solution-3, The con is not valid as
> > that will not affect the broker restart times as discussed in the
> > earlier email in this thread. You may want to update that.
> >
> > ~Satish.
> >
> > On Sun, 2 Jul 2023 at 01:03, Divij Vaidya 
> wrote:
> > >
> > > Thank you folks for reviewing this KIP.
> > >
> > > Satish, I have modified the motivation to make it more clear. Now it
> > says,
> > > "Since the main feature of tiered storage is storing a large amount of
> > > data, we expect num_remote_segments to be large. A frequent linear scan
> > > (i.e. listing all segment metadata) could be expensive/slower because
> of
> > > the underlying storage used by RemoteLogMetadataManager. This slowness
> to
> > > list all segment metadata could result in the loss of availability"
> > >
> > > Jun, Kamal, Satish, if you don't have any further concerns, I would
> > > appreciate a vote for this KIP in the voting thread -
> > > https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Thanks for the explanation. LGTM.
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Divij,
> > > > > I am fine with having an API to compute the size as I mentioned in
> my
> > > > > earlier reply in this mail thread. But I have the below comment for
> > > > > the motivation for this KIP.
> > > > >
> > > > > As you discussed offline, the main issue here is listing calls for
> > > > > remote log segment metadata is slower because of the storage used
> for
> > > > > RLMM. These can be avoided with this new API.
> > > > >
> > > > > Please add this in the motivation section as it is one of the main
> > > > > motivations for the KIP.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Sat, 1 Jul 2023 at 01:43, Jun Rao 
> > wrote:
> > > > > >
> > > > > > Hi, Divij,
> > > > > >
> > > > > > Sorry for the late reply.
> > > > > >
> > > > > > Given your explanation, the new API sounds reasonable to me. Is
> > that
> > > > > enough
> > > > > > to build the external metadata layer for the remote segments or
> do
> > you
> > > > > need
> > > > > > some additional API changes?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <
> > divijvaidy...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thank you for looking into this Kamal.
> > > > > > >
> > > > > > > You are right in saying that a cold start (i.e. leadership
> > failover
> > > > or
> > > > > > > broker startup) does not impact the broker startup duration.
> But
> > it
> > > > > does
> > > > > > > have the following impact:
> > > > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> > multiple
> > > > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > > > implementation has the capability to serve the total size from
> an
> > > > index
> > > > > > > (and hence handle this burst), we wouldn't be able to use it
> > since
> > > > the
> > > > > > > current API necessarily calls for a full scan.
> > > > > > > 2. The archival (copying of data to tiered storage) process
> will
> > > > have a
> > > > > > > delayed start. The delayed start of archival could lead to
> local
> > > > build
> > > > > up
> > > > > > > of data which may lead to disk full.
> > > > > > >
> > > > > > > The disadvantage of adding this new API is that every provider
> > will
> > > > > have to
> > > > > > > implement it, agreed. But I believe that this tradeoff is
> > worthwhile
> > > > > since
> > > > > > > the 

[jira] [Created] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-07-12 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15181:


 Summary: Race condition on partition assigned to 
TopicBasedRemoteLogMetadataManager 
 Key: KAFKA-15181
 URL: https://issues.apache.org/jira/browse/KAFKA-15181
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared whenever 
partitions are assigned.

When partitions are assigned to the TBRLMM instance, a consumer is started to 
keep the cache up to date.

If the cache hasn't finished building, TBRLMM fails to return remote metadata 
about partitions that are stored on the backing topic, and it may not recover 
from this failing state.

A proposal to fix this issue would be to wait, after a partition is assigned, 
for the consumer to catch up. Similar logic is used at the moment when TBRLMM 
writes to the topic, using the send callback to wait for the consumer to catch 
up. This logic can be reused whenever a partition is assigned, so that when 
TBRLMM is marked as initialized, the cache is ready to serve requests.
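A minimal sketch of the proposed wait (names and structure are mine, not TBRLMM's actual code): after assignment, block until the metadata consumer has reached the target offset, or give up after a timeout so the partition stays uninitialized.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: poll the consumer's position until it reaches the
// target offset for the assigned metadata partition, bounded by a timeout.
class CatchUpWaiter {
    static boolean waitForCatchUp(long targetOffset, LongSupplier consumedOffset,
                                  long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (consumedOffset.getAsLong() >= targetOffset) {
                return true; // cache has caught up; safe to serve requests
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // caller can keep the partition uninitialized
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```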


Reference: https://github.com/aiven/kafka/issues/33



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Greg Harris

2023-07-11 Thread Jorge Esteban Quilcate Otoya
Congrats Greg!!

On Tue 11. Jul 2023 at 15.20, Federico Valeri  wrote:

> Congrats Greg!
>
> On Tue, Jul 11, 2023 at 3:55 AM Luke Chen  wrote:
> >
> > Congrats Greg!
> >
> > Luke
> >
> > On Tue, Jul 11, 2023 at 8:19 AM Matthew de Detrich
> >  wrote:
> >
> > > Congratulations, well deserved!
> > >
> > > On Mon, Jul 10, 2023 at 5:45 PM Chris Egerton 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The PMC for Apache Kafka has invited Greg Harris to become a
> committer,
> > > and
> > > > we are happy to announce that he has accepted!
> > > >
> > > > Greg has been contributing to Kafka since 2019. He has made over 50
> > > commits
> > > > mostly around Kafka Connect and Mirror Maker 2. His most notable
> > > > contributions include KIP-898: "Modernize Connect plugin discovery"
> and a
> > > > deep overhaul of the offset syncing logic in MM2 that addressed
> several
> > > > technically-difficult, long-standing, high-impact issues.
> > > >
> > > > He has also been an active participant in discussions and reviews on
> the
> > > > mailing lists and on GitHub.
> > > >
> > > > Thanks for all of your contributions, Greg. Congratulations!
> > > >
> > >
> > >
> > > --
> > >
> > > Matthew de Detrich
> > >
> > > *Aiven Deutschland GmbH*
> > >
> > > Immanuelkirchstraße 26, 10405 Berlin
> > >
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > >
> > > *m:* +491603708037
> > >
> > > *w:* aiven.io *e:* matthew.dedetr...@aiven.io
> > >
>


[jira] [Created] (KAFKA-15147) Measure pending and outstanding Remote Segment operations

2023-07-05 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15147:


 Summary: Measure pending and outstanding Remote Segment operations
 Key: KAFKA-15147
 URL: https://issues.apache.org/jira/browse/KAFKA-15147
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


Remote Log Segment operations (copy/delete) are executed by the Remote Storage 
Manager, and recorded by the Remote Log Metadata Manager (e.g. the default 
TopicBasedRLMM writes state changes on remote log segments to the internal Kafka 
topic).

As executions run, fail, and retry, it will be important to know how many 
operations are pending and outstanding over time, to alert operators.

Pending operations alone are not enough to alert on, as values can oscillate 
close to zero. An additional condition needs to apply (running time > threshold) 
to consider an operation outstanding.

Proposal:

RemoteLogManager could be extended with 2 concurrent maps 
(pendingSegmentCopies, pendingSegmentDeletes) of type `Map[Uuid, Long]` to 
record, per segmentId, the time when the operation started, and based on this 
expose 2 metrics per operation:
 * pendingSegmentCopies: gauge of pendingSegmentCopies map
 * outstandingSegmentCopies: loop over pending ops, and if now - startedTime > 
timeout, then outstanding++ (maybe on debug level?)
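The proposal above could be sketched roughly as follows (plain Java with illustrative names; the real implementation would live in RemoteLogManager and register the gauges with Kafka's metrics registry):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PendingOpsTracker {
    // segmentId -> start time (ms) of the in-flight copy operation
    private final Map<String, Long> pendingSegmentCopies = new ConcurrentHashMap<>();
    private final long outstandingThresholdMs;

    PendingOpsTracker(long outstandingThresholdMs) {
        this.outstandingThresholdMs = outstandingThresholdMs;
    }

    void copyStarted(String segmentId, long nowMs)  { pendingSegmentCopies.put(segmentId, nowMs); }
    void copyFinished(String segmentId)             { pendingSegmentCopies.remove(segmentId); }

    // gauge: number of in-flight copy operations
    int pendingSegmentCopies() { return pendingSegmentCopies.size(); }

    // gauge: in-flight copies that have been running longer than the threshold
    long outstandingSegmentCopies(long nowMs) {
        return pendingSegmentCopies.values().stream()
                .filter(startedMs -> nowMs - startedMs > outstandingThresholdMs)
                .count();
    }
}

public class Main {
    public static void main(String[] args) {
        PendingOpsTracker tracker = new PendingOpsTracker(60_000);
        tracker.copyStarted("segment-a", 0);
        tracker.copyStarted("segment-b", 30_000);
        System.out.println(tracker.pendingSegmentCopies());            // 2
        System.out.println(tracker.outstandingSegmentCopies(70_000));  // 1: only segment-a exceeds 60 s
        tracker.copyFinished("segment-a");
        System.out.println(tracker.pendingSegmentCopies());            // 1
    }
}
```

A twin tracker (pendingSegmentDeletes) would cover the delete path in the same way.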

Is this a valuable metric to add to Tiered Storage, or better solved in a custom 
RLMM implementation?

Also, does it require a KIP?

Thanks!





[jira] [Created] (KAFKA-15142) Add Client Metadata to RemoteStorageFetchInfo

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15142:


 Summary: Add Client Metadata to RemoteStorageFetchInfo
 Key: KAFKA-15142
 URL: https://issues.apache.org/jira/browse/KAFKA-15142
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Once Tiered Storage is deployed, it will be important to understand how remote 
data is accessed and what consumption patterns emerge on each deployment.

To do this, tiered storage logs/metrics could provide more context about which 
client is fetching which partition/offset range and when.

At the moment, client metadata is not propagated to the tiered-storage 
framework. To fix this, {{RemoteStorageFetchInfo}} can be extended with the 
{{Optional[ClientMetadata]}} available on {{FetchParams}}, making these bits of 
data available to improve the logging/metrics when fetching.
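A rough sketch of the proposed extension (field names and shapes are assumptions for illustration; the real RemoteStorageFetchInfo and ClientMetadata carry more state than shown here):

```java
import java.util.Optional;

// Hypothetical minimal client metadata, as would be taken from FetchParams.
record ClientMetadata(String clientId, String clientAddress) {}

// Hypothetical fetch info extended with optional client metadata;
// empty when no client context is available (e.g. replication fetches).
record RemoteStorageFetchInfo(String topic, int partition, long fetchOffset,
                              Optional<ClientMetadata> clientMetadata) {}

public class Main {
    public static void main(String[] args) {
        RemoteStorageFetchInfo fetch = new RemoteStorageFetchInfo(
                "orders", 3, 42_000L,
                Optional.of(new ClientMetadata("consumer-17", "10.0.0.5")));
        // Logging can now attribute the remote fetch to a client when metadata is present:
        String who = fetch.clientMetadata().map(ClientMetadata::clientId).orElse("unknown");
        System.out.println("remote fetch of " + fetch.topic() + "-" + fetch.partition()
                + " at offset " + fetch.fetchOffset() + " by " + who);
    }
}
```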





[jira] [Resolved] (KAFKA-15131) Improve RemoteStorageManager exception handling documentation

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-15131.
--
Resolution: Fixed

> Improve RemoteStorageManager exception handling documentation
> -
>
> Key: KAFKA-15131
> URL: https://issues.apache.org/jira/browse/KAFKA-15131
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>        Reporter: Jorge Esteban Quilcate Otoya
>    Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> As discussed here[1], RemoteStorageManager javadocs requires clarification 
> regarding error handling:
>  * Remove ambiguity on `RemoteResourceNotFoundException` description
>  * Describe when `RemoteResourceNotFoundException` can/should be thrown
>  * Describe expectations for idempotent operations when copying/deleting
>  
> [1] 
> https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936





Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-02 Thread Jorge Esteban Quilcate Otoya
Thanks Divij, this KIP is a super useful improvement to Tiered Storage.

I have a couple of minor comments to the KIP, otherwise I'm +1 on this
proposal:

1. APIs haven't used the getter naming convention in TS as far as I can see
(e.g. `RLMM#remoteLogSegmentMetadata()`). We could rename the proposed
method to `RemoteLogMetadataManager#remoteLogSize(...)` to keep it
consistent,
2. The proposal for a new metric only includes `topic` as a label. Could we
also include a `partition` label?
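A minimal sketch of the rename suggested in point 1 (stub types only; the actual RLMM method would take Kafka's own TopicIdPartition and may throw RemoteStorageException):

```java
// Stub standing in for org.apache.kafka.common.TopicIdPartition.
record TopicIdPartition(String topicId, String topic, int partition) {}

interface RemoteLogMetadataManager {
    // Named consistently with remoteLogSegmentMetadata(): no "get" prefix.
    long remoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch);
}

public class Main {
    public static void main(String[] args) {
        // Toy implementation returning a fixed size, just to exercise the shape.
        RemoteLogMetadataManager rlmm = (tp, epoch) -> 1_073_741_824L;
        System.out.println(rlmm.remoteLogSize(new TopicIdPartition("id", "t", 0), 5)); // 1073741824
    }
}
```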

Cheers,
Jorge.


On Sat, 1 Jul 2023 at 22:33, Divij Vaidya  wrote:

> Thank you folks for reviewing this KIP.
>
> Satish, I have modified the motivation to make it more clear. Now it says,
> "Since the main feature of tiered storage is storing a large amount of
> data, we expect num_remote_segments to be large. A frequent linear scan
> (i.e. listing all segment metadata) could be expensive/slower because of
> the underlying storage used by RemoteLogMetadataManager. This slowness to
> list all segment metadata could result in the loss of availability"
>
> Jun, Kamal, Satish, if you don't have any further concerns, I would
> appreciate a vote for this KIP in the voting thread -
> https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
>
> --
> Divij Vaidya
>
>
>
> On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Divij,
> >
> > Thanks for the explanation. LGTM.
> >
> > --
> > Kamal
> >
> > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana 
> > wrote:
> >
> > > Hi Divij,
> > > I am fine with having an API to compute the size as I mentioned in my
> > > earlier reply in this mail thread. But I have the below comment for
> > > the motivation for this KIP.
> > >
> > > As you discussed offline, the main issue here is listing calls for
> > > remote log segment metadata is slower because of the storage used for
> > > RLMM. These can be avoided with this new API.
> > >
> > > Please add this in the motivation section as it is one of the main
> > > motivations for the KIP.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Sat, 1 Jul 2023 at 01:43, Jun Rao  wrote:
> > > >
> > > > Hi, Divij,
> > > >
> > > > Sorry for the late reply.
> > > >
> > > > Given your explanation, the new API sounds reasonable to me. Is that
> > > enough
> > > > to build the external metadata layer for the remote segments or do
> you
> > > need
> > > > some additional API changes?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya  >
> > > wrote:
> > > >
> > > > > Thank you for looking into this Kamal.
> > > > >
> > > > > You are right in saying that a cold start (i.e. leadership failover
> > or
> > > > > broker startup) does not impact the broker startup duration. But it
> > > does
> > > > > have the following impact:
> > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> multiple
> > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > implementation has the capability to serve the total size from an
> > index
> > > > > (and hence handle this burst), we wouldn't be able to use it since
> > the
> > > > > current API necessarily calls for a full scan.
> > > > > 2. The archival (copying of data to tiered storage) process will
> > have a
> > > > > delayed start. The delayed start of archival could lead to local
> > build
> > > up
> > > > > of data which may lead to disk full.
> > > > >
> > > > > The disadvantage of adding this new API is that every provider will
> > > have to
> > > > > implement it, agreed. But I believe that this tradeoff is
> worthwhile
> > > since
> > > > > the default implementation could be the same as you mentioned, i.e.
> > > keeping
> > > > > cumulative in-memory count.
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 4, 2023 at 5:48 PM Kamal Chandraprakash <
> > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > Thanks for the KIP! Sorry for the late reply.
> > > > > >
> > > > > > Can you explain the rejected alternative-3?
> > > > > > Store the cumulative size of remote tier log in-memory at
> > > > > RemoteLogManager
> > > > > > "*Cons*: Every time a broker starts-up, it will scan through all
> > the
> > > > > > segments in the remote tier to initialise the in-memory value.
> This
> > > would
> > > > > > increase the broker start-up time."
> > > > > >
> > > > > > Keeping the source of truth to determine the remote-log-size in
> the
> > > > > leader
> > > > > > would be consistent across different implementations of the
> plugin.
> > > The
> > > > > > concern posted in the KIP is that we are calculating the
> > > remote-log-size
> > > > > on
> > > > > > each iteration of the cleaner thread (say 5 mins). If we
> calculate
> > > only
> > > > > > once during broker startup or during the leadership reassignment,
> > do
> > > we
> > > > > > still need the cache?
> > > > > >
> > > > > > The broker 

Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

2023-06-30 Thread Jorge Esteban Quilcate Otoya
Thank you both for the replies! A couple more comments:

On Tue, 27 Jun 2023 at 14:57, Edoardo Comar  wrote:

> Hi Jorge
> thanks for the feedback. Comments inline below
>
> > 1. Similar to Kirk's first point, I'm also concerned on how would the
> > plugin developers / operators be able to apply multiple policies and how
> > configurations would be passed to each policy.
>
> We’ve only attempted to tackle the “one plugin per broker” model with
> this KIP, as that’s the use-case we most clearly understand. Although,
> as noted in the rejected alternatives section, it would be possible to
> use a facade-like pattern to delegate from one plugin implementation
> to others. The reason we’ve avoided tackling multiple plugins is that
> it introduces further complexity (which takes precedence? Is
> configuration needed to say which plugin applies to which topic?
> Etc.), and we are concerned that without a clear use-case we might
> make decisions we later come to regret. Hopefully by offering minimal
> configuration options, we don’t hinder a future “support multiple
> record validation policies” KIP.
>

Got it. Thanks!


>
> > Some approaches from other plugins we can get some inspiration from:
> >
> > - AlterConfig, CreateTopic policies are designed to be 1 policy
> > implementing the different APIs. Up to the plugin developer to pull
> > policies together and configure it on the broker side. I guess for Record
> > Validation this may be cumbersome considering some Schema Registry
> > providers may want to offer implementations for their own backend.
> >
> > - Connect Transforms: here there's a named set of plugins to apply per
> > connector, and each transform has its own configuration defined by
> prefix.
> > Personally, I'd consider this one an interesting approach if we decide to
> > allow multiple record validations to be configured.
> >
> > - Tiered Storage (probably Connectors as well) have class-loader aware
> > implementations with class path specific to the plugin. Not sure if this
> is
> > something to discuss on the KIP or later on the PR, but we could mention
> > something on how this plugin would deal with dependency conflicts (e.g.
> > different jackson version between broker, plugin(s)).
>
>
> Thanks for highlighting all of these places where we can draw
> inspiration. We’ve updated the KIP with an additional classloader
> property to match the tiered storage implementation. It seems likely
> that record validation policy implementations will live in the
> codebase of their corresponding schema registry (as is the case,
> today, for the client serdes used to integrate with a schema registry)
> - so it makes sense to insulate their implementation from specific
> .jar versions that may (or may not) be present in a particular version
> of the broker.
>
> > Also, by potentially supporting multiple plugins for record validation,
> it
> > would be important to consider if it's an all or nothing relation, or
> > posible to choose _some_ policies apply per topic.
> > I see there's some preference for setting the validation policy name on
> the
> > topic, though this could be cumbersome to operate: topic creation users
> may
> > not be aware of the record validation (similar to CreateTopic/AlterConfig
> > policies) and would impose additional coordination.
> > Maybe a flag to whether apply policies or not would be a better approach?
>
> Could you elaborate more on your comments about “maybe a flag to
> whether to apply policies or not would be a better approach?”. We
> thought that setting the ‘record.validation.policy’ property on a
> topic to a value supported by the plugin was such a flag - but it
> sounds like you might have a different approach in mind?
>
>
The current proposal is to have ‘record.validation.policy’ per topic
(default null). A flag such as ‘record.validation.policy.enable’
(default=false) may be simpler to configure from the user's perspective.

Also, at the moment, it is a bit unclear to me what value the topic config
‘record.validation.policy’ should contain: is it the policy class name? How
is the policy expected to use the received name?


> > 2. Have you consider adding the record metadata to the API? It may be
> > useful for logging purposes (e.g. if record validation fails, to log
> > topic-partition), or some policies are interested on record metadata
> (e.g.
> > compression, timestamp type, etc.)
>
> The topic/partition is available to the plugin via the TopicMetadata
> interace. Additional record properties could be added to the
> ‘RecordProxy’ interface, however the topic of how rich to make the
> interface was a sticking point for KIP-729. The intent behind the
> ‘TopicMetadata’ and ‘RecordProxy’ classes is that they can be extended
> in the future without breaking existing plugin implementations - so
> we’re not precluding further properties from being added if we’ve been
> too austere.
>

I see, agree.


>
> > 3. A minor comment for consistency regarding the 

[jira] [Created] (KAFKA-15135) RLM listener configurations passed but ignored by RLMM

2023-06-30 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15135:


 Summary: RLM listener configurations passed but ignored by RLMM
 Key: KAFKA-15135
 URL: https://issues.apache.org/jira/browse/KAFKA-15135
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


As described here [1], properties captured from the listener are passed but 
ignored by TopicBasedRLMM.

 

[1] https://github.com/apache/kafka/pull/13828#issuecomment-1611155345





[jira] [Created] (KAFKA-15131) Improve RemoteStorageManager exception handling

2023-06-28 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15131:


 Summary: Improve RemoteStorageManager exception handling
 Key: KAFKA-15131
 URL: https://issues.apache.org/jira/browse/KAFKA-15131
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


As discussed here[1], RemoteStorageManager javadocs requires clarification 
regarding error handling:
 * Remove ambiguity on `RemoteResourceNotFoundException` description
 * Describe when `RemoteResourceNotFoundException` can/should be thrown
 * Describe expectations for idempotent operations when copying/deleting

 

[1] 
https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936





Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

2023-06-21 Thread Jorge Esteban Quilcate Otoya
Hi Eduardo, Adrian.

Thanks for the KIP. I agree that allowing custom validations on the
broker side addresses a real gap, as you clearly stated in the motivation.

Some initial thoughts from my side:

1. Similar to Kirk's first point, I'm also concerned about how plugin
developers / operators would be able to apply multiple policies and how
configurations would be passed to each policy.

Some approaches from other plugins we can get some inspiration from:

- AlterConfig, CreateTopic policies are designed to be 1 policy
implementing the different APIs. Up to the plugin developer to pull
policies together and configure it on the broker side. I guess for Record
Validation this may be cumbersome considering some Schema Registry
providers may want to offer implementations for their own backend.

- Connect Transforms: here there's a named set of plugins to apply per
connector, and each transform has its own configuration defined by prefix.
Personally, I'd consider this one an interesting approach if we decide to
allow multiple record validations to be configured.

- Tiered Storage (probably Connectors as well) have class-loader aware
implementations with class path specific to the plugin. Not sure if this is
something to discuss on the KIP or later on the PR, but we could mention
something on how this plugin would deal with dependency conflicts (e.g.
different jackson version between broker, plugin(s)).

Also, by potentially supporting multiple plugins for record validation, it
would be important to consider whether it's an all-or-nothing relation, or
possible to choose that only _some_ policies apply per topic.
I see there's some preference for setting the validation policy name on the
topic, though this could be cumbersome to operate: topic-creation users may
not be aware of the record validation (similar to CreateTopic/AlterConfig
policies), and it would impose additional coordination.
Maybe a flag for whether to apply policies or not would be a better approach?

2. Have you consider adding the record metadata to the API? It may be
useful for logging purposes (e.g. if record validation fails, to log
topic-partition), or some policies are interested on record metadata (e.g.
compression, timestamp type, etc.)

3. A minor comment for consistency regarding the APIs. As far as I have
seen, we tend to use the name of the object returned directly instead of
getter notation, see `AlterConfigPolicy.RecordMetadata` [1]. We may rename
some of the proposed APIs accordingly:

- `RecordProxy#headers()|key()|value()`
- `TopicMetadata#topicPartition()`

4. For the `RecordIntrospectionHints`, I'm struggling to see how this may
be used by the policy developers. Would you mind adding some examples on
how the policy in general may be used?
Specifically, `long needKeyBytes|needKeyValue` are difficult for me to
interpret.
nit: maybe replace `need*` with `requiresAccess*` or simply `access*`, or
similar.

Many thanks,

Jorge.

[1]
https://github.com/apache/kafka/blob/3c059133d3008d87f018f2efa4af27027fd5d18e/clients/src/main/java/org/apache/kafka/server/policy/AlterConfigPolicy.java#L41

On Wed, 21 Jun 2023 at 19:08, Kirk True  wrote:

> Hi Edo/Adrian!
>
> Thanks for the KIP.
>
> I have some questions, and apologies that the may fall under the “stupid”
> column because I’m not that familiar with this area :)
>
> 1. Does record.validation.policy.class.name support multiple classes, or
> just one? I’m probably not wrapping my head around it, but I’d imagine
> different policies for different families or groupings of topics, thus the
> need for supporting multiple policies. But if there are multiple then you
> run the risk of conflicts of ownership of validation, so ¯\_(ツ)_/¯
>
> 2. Is there any concern that a validation class may alter the contents of
> the ByteBuffer of the key and/or value? Perhaps that’s already handled
> and/or outside the scope of this KIP?
>
> 3. What is the benefit to introducing the inner TopicMetadata and
> RecordProxy interfaces vs. passing the TopicPartition, String (validation
> policy), and Record into the validate() method directly?
>
> Thanks,
> Kirk
>
> > On Jun 20, 2023, at 2:28 AM, Edoardo Comar 
> wrote:
> >
> > Thanks Николай,
> > We’d like to open a vote next week.
> > Hopefully getting some more feedback before then.
> >
> > Edo
> >
> >
> > On Wed, 7 Jun 2023 at 19:15, Николай Ижиков  wrote:
> >
> >> Hello.
> >>
> >> As author of one of related KIPs I’m +1 for this change.
> >> Long waited feature.
> >>
> >>> 7 июня 2023 г., в 19:02, Edoardo Comar  написал(а):
> >>>
> >>> Dear all,
> >>> Adrian and I would like to start a discussion thread on
> >>>
> >>> KIP-940: Broker extension point for validating record contents at
> >> produce time
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-940%3A+Broker+extension+point+for+validating+record+contents+at+produce+time
> >>>
> >>> This KIP proposes a new broker-side extension point (a “record
> >> validation policy”) that can be used to reject records 

Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-06-20 Thread Jorge Esteban Quilcate Otoya
Thanks Colin! You're right. I started this KIP thinking only about the
latest incremental API, and hadn't thought much about the legacy one.

After taking another look at both APIs, I can see some inconsistencies in
how the policies are applied in each case. I have added a section "Current
workflow" [1] to the proposal to summarize how alter config works
in both cases (legacy and incremental) and for both back-ends (ZK, KRaft).

In summary:
- On Legacy Alter Config, the set of proposed changes is the same as the
new config, except that null values are removed from the new config.
- On Incremental Alter Config, the set of proposed changes is not the same
as the new config: it only contains explicit changes to the config.
- Implicit deletes are a set of configs inferred in Legacy Alter Config
when no value is provided for a config that exists in the current config.
- Even though the alter config policy receives the "requested"
configurations, these have 2 different meanings depending on the API used
to update configs:
  - When validating policies on Legacy Alter Config, it means: the
requested changes, equal to the new config state including explicit deletes.
  - When validating policies on Incremental Alter Config, it means: only
the requested changes, including explicit deletes, but without any other
config from the current or new state.
  - Plugin implementations *do not know which one they are actually
dealing with*, and as the incremental (new) API becomes broadly adopted,
current-state configs not included in the request are not considered.
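The implicit-delete semantics of the legacy API described above can be illustrated with a small sketch (plain Java; `implicitDeletes` is a hypothetical helper, not a Kafka API):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
    // Implicit deletes (legacy alter config): keys present in the current
    // config but absent from the request are treated as deletions.
    static Set<String> implicitDeletes(Map<String, String> current, Map<String, String> requested) {
        Set<String> deleted = new HashSet<>(current.keySet());
        deleted.removeAll(requested.keySet());
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> current = Map.of("retention.ms", "604800000", "cleanup.policy", "delete");
        Map<String, String> legacyRequest = Map.of("retention.ms", "86400000"); // omits cleanup.policy
        System.out.println(implicitDeletes(current, legacyRequest)); // [cleanup.policy]
        // Under incremental alter config, the same request means "change retention.ms
        // only": nothing is implicitly deleted, which is the semantic gap in question.
    }
}
```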

The problem is a bit larger than the one framed in the motivation. It's not
only that we don't have the current configs to compare with; depending on
the API used to alter configs, we may or may not have them at all.

Is this assessment correct?
If it is, then we may discuss approaching this issue as a bug instead. We
could consider aligning the semantics of the configs passed to the policy.
At the moment the "requested configs" are passed to policies when either
API is called, but they have a _different meaning depending on which API is
used_. We could instead align the meaning and pass the "new configs,
including explicit deletes" for incremental updates as well, as we already
do for legacy updates.

Looking forward to your feedback and many thanks again!
Jorge.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations#KIP935:ExtendAlterConfigPolicywithexistingconfigurations-Currentworkflow

On Thu, 15 Jun 2023 at 22:07, Colin McCabe  wrote:

> Hi Jorge,
>
> I appreciate you trying to solve the issue. However, from the perspective
> of someone using the plugin API, it's quite messy: what is the difference
> between "proposed" and "resulting"? They sound the same.
>
> I think there are two APIs that make sense:
>
> 1. A (prev, next) based one where you just get the previous set of
> configs, and the new one, and can draw your own conclusions
>
> 2. A (prev, changed, removed) one where you get the previous set of
> configs, plus the changes (additions or modifications), and deletions.
>
> 3. Same as 2 but you have a "changed" map whose values are Optionals, and
> express deletions as Optional.empty
>
> The old API should just stay the same, bugs and all, for compatibility
> reasons. But for the new API we should choose one of the above, I think.
> I'm not completely sure which...
>
> best,
> Colin
>
> On Mon, Jun 12, 2023, at 07:08, Jorge Esteban Quilcate Otoya wrote:
> > Thanks Colin! You're right. I have added some notes about this to the
> KIP,
> > and clarify how this KIP is related to legacy and incremental alter
> config
> > APIs.
> >
> > Let me know if there's any gaps on the current proposal.
> >
> > Many thanks,
> > Jorge.
> >
> > On Mon, 12 Jun 2023 at 11:04, Colin McCabe  wrote:
> >
> >> See KAFKA-14195. Some deletions are not handled correctly. And this
> cannot
> >> be fixed without a kip because of backwards compatibility.
> >>
> >> Colin
> >>
> >> On Wed, Jun 7, 2023, at 17:07, Jorge Esteban Quilcate Otoya wrote:
> >> > Thank Colin.
> >> >
> >> > I've took a closer look on how configs are passed to the policy when
> >> > delete
> >> > configs are requested, and either null (KRaft) or empty values
> >> > (ZkAdminManager) are passed:
> >> > - ZkAdminManager passes empty values:
> >> >   - Config Entries definition:
> >> >
> >>
> https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/core/src/main/scala/kafka/server/ZkAdminManager.scala#L503
> >> >   - and passed to policy without changes:
> &g

Re: [ANNOUNCE] New committer: Divij Vaidya

2023-06-13 Thread Jorge Esteban Quilcate Otoya
Awesome news! Congrats Divij!

On Tue 13. Jun 2023 at 19.30, Tom Bentley  wrote:

> Congratulations Divij!
>
> On Tue, 13 Jun 2023 at 17:21, Mickael Maison 
> wrote:
>
> > Congratulations!
> >
> > On Tue, Jun 13, 2023 at 6:05 PM Yash Mayya  wrote:
> > >
> > > Congratulations Divij!
> > >
> > > On Tue, Jun 13, 2023 at 9:20 PM Bruno Cadonna 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer
> > > > Divij Vaidya.
> > > >
> > > > Divij's major contributions are:
> > > >
> > > > GDPR compliance enforcement of kafka-site -
> > > > https://issues.apache.org/jira/browse/KAFKA-13868
> > > >
> > > > Performance improvements:
> > > >
> > > > Improve performance of VarInt encoding and decoding -
> > > > https://github.com/apache/kafka/pull/13312
> > > >
> > > > Reduce data copy & buffer allocation during decompression -
> > > > https://github.com/apache/kafka/pull/13135
> > > >
> > > > He also was heavily involved in the migration to Mockito.
> > > >
> > > > Furthermore, Divij is very active on the mailing lists as well as in
> > > > maintaining and reviewing pull requests.
> > > >
> > > > Congratulations, Divij!
> > > >
> > > > Thanks,
> > > >
> > > > Bruno (on behalf of the Apache Kafka PMC)
> > > >
> > > >
> > > >
> >
> >
>


Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-06-12 Thread Jorge Esteban Quilcate Otoya
Thanks Colin! You're right. I have added some notes about this to the KIP,
and clarified how this KIP relates to the legacy and incremental alter
config APIs.

Let me know if there are any gaps in the current proposal.

Many thanks,
Jorge.

On Mon, 12 Jun 2023 at 11:04, Colin McCabe  wrote:

> See KAFKA-14195. Some deletions are not handled correctly. And this cannot
> be fixed without a kip because of backwards compatibility.
>
> Colin
>
> On Wed, Jun 7, 2023, at 17:07, Jorge Esteban Quilcate Otoya wrote:
> > Thank Colin.
> >
> > I've took a closer look on how configs are passed to the policy when
> > delete
> > configs are requested, and either null (KRaft) or empty values
> > (ZkAdminManager) are passed:
> > - ZkAdminManager passes empty values:
> >   - Config Entries definition:
> >
> https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/core/src/main/scala/kafka/server/ZkAdminManager.scala#L503
> >   - and passed to policy without changes:
> >
> https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/core/src/main/scala/kafka/server/ZkAdminManager.scala#L495
> > - Similar with ConfigurationControlManager (KRaft) passes null values:
> >   - Config entries added regardless of value==null:
> >
> https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L281
> >   - And passed to policy:
> >
> https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L295
> >
> > So, instead of passing (requested config + requested config to delete +
> > existing configs), the new metadata type is including (requested
> > configs--which already include deleted configs-- + _resulting_ configs)
> so
> > users could evaluate the whole (resulting) config map similar to
> > CreateTopicPolicy; and check only requested configs if needed (as with
> > current metadata).
> >
> > I've also added a rejected alternative to consider the scenario of
> > piggybacking on the existing map and just including the resulting config
> > instead, though this would break compatibility with existing
> > implementations.
> >
> > Thanks,
> > Jorge.
> >
> >
> > On Wed, 7 Jun 2023 at 08:38, Colin McCabe  wrote:
> >
> >> On Tue, Jun 6, 2023, at 06:57, Jorge Esteban Quilcate Otoya wrote:
> >> > Thanks Colin.
> >> >
> >> >> I would suggest renaming the "configs" parameter to
> "proposedConfigs,"
> >> in
> >> > both the new and old RequestMetadata constructors, to make things
> >> clearer.
> >> > This would be a binary and API-compatible change in Java.
> >> >
> >> > Sure, fully agree. KIP is updated with this suggestion.
> >> >
> >> >> We should also clarify that if configurations are being proposed for
> >> > deletion, they won't appear in proposedConfigs.
> >> >
> >> > Great catch. Wasn't aware of this, but I think it's valuable for the
> API
> >> to
> >> > also include the list of configurations to be deleted.
> >> > For this, I have extended the `RequestMetadata` type with a list of
> >> > proposed configs to delete:
> >> >
> >>
> >> Hi Jorge,
> >>
> >> Thanks for the reply.
> >>
> >> Rather than having a separate list of proposedConfigsToDelete, it seems
> >> like we could have an accessor function that calculates this when
> needed.
> >> After all, it's completely determined by existingConfigs and
> >> proposedConfigs. And some plugins will want the list and some won't (or
> >> will want to do a slightly different analysis)
> >>
> >> regards,
> >> Colin
> >>
> >>
> >> > ```
> >> > class RequestMetadata {
> >> >
> >> > private final ConfigResource resource;
> >> > private final Map proposedConfigs;
> >> > private final List proposedConfigsToDelete;
> >> > private final Map existingConfigs;
> >> > ```
> >> >
> >> > Looking forward to your feedback.
> >> >
> >> > Cheers,
> >> > Jorge.
> >> >
> >> > On Fri, 2 Jun 2023 at 22:42, Colin McCabe  wrote:
> >> >
> >> >> Hi Jorge,
> >> >>
> >> >> This is a good KIP wh

Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-06-07 Thread Jorge Esteban Quilcate Otoya
Thanks, Colin.

I've taken a closer look at how configs are passed to the policy when config
deletions are requested, and either null values (KRaft) or empty values
(ZkAdminManager) are passed:
- ZkAdminManager passes empty values:
  - Config Entries definition:
https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/core/src/main/scala/kafka/server/ZkAdminManager.scala#L503
  - and passed to policy without changes:
https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/core/src/main/scala/kafka/server/ZkAdminManager.scala#L495
- Similar with ConfigurationControlManager (KRaft) passes null values:
  - Config entries added regardless of value==null:
https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L281
  - And passed to policy:
https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java#L295

So, instead of passing (requested configs + requested configs to delete +
existing configs), the new metadata type includes (requested configs -- which
already contain the deleted configs -- + _resulting_ configs), so users can
evaluate the whole (resulting) config map, similar to CreateTopicPolicy, and
check only the requested configs if needed (as with the current metadata).

I've also added a rejected alternative to consider the scenario of
piggybacking on the existing map and just including the resulting config
instead, though this would break compatibility with existing
implementations.

Thanks,
Jorge.


On Wed, 7 Jun 2023 at 08:38, Colin McCabe  wrote:

> On Tue, Jun 6, 2023, at 06:57, Jorge Esteban Quilcate Otoya wrote:
> > Thanks Colin.
> >
> >> I would suggest renaming the "configs" parameter to "proposedConfigs,"
> in
> > both the new and old RequestMetadata constructors, to make things
> clearer.
> > This would be a binary and API-compatible change in Java.
> >
> > Sure, fully agree. KIP is updated with this suggestion.
> >
> >> We should also clarify that if configurations are being proposed for
> > deletion, they won't appear in proposedConfigs.
> >
> > Great catch. Wasn't aware of this, but I think it's valuable for the API
> to
> > also include the list of configurations to be deleted.
> > For this, I have extended the `RequestMetadata` type with a list of
> > proposed configs to delete:
> >
>
> Hi Jorge,
>
> Thanks for the reply.
>
> Rather than having a separate list of proposedConfigsToDelete, it seems
> like we could have an accessor function that calculates this when needed.
> After all, it's completely determined by existingConfigs and
> proposedConfigs. And some plugins will want the list and some won't (or
> will want to do a slightly different analysis)
>
> regards,
> Colin
>
>
> > ```
> > class RequestMetadata {
> >
> > private final ConfigResource resource;
> > private final Map<String, String> proposedConfigs;
> > private final List<String> proposedConfigsToDelete;
> > private final Map<String, String> existingConfigs;
> > ```
> >
> > Looking forward to your feedback.
> >
> > Cheers,
> > Jorge.
> >
> > On Fri, 2 Jun 2023 at 22:42, Colin McCabe  wrote:
> >
> >> Hi Jorge,
> >>
> >> This is a good KIP which addresses a real gap we have today.
> >>
> >> I would suggest renaming the "configs" parameter to "proposedConfigs,"
> in
> >> both the new and old RequestMetadata constructors, to make things
> clearer.
> >> This would be a binary and API-compatible change in Java. We should also
> >> clarify that if configurations are being proposed for deletion, they
> won't
> >> appear in proposedConfigs.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Tue, May 23, 2023, at 03:03, Christo Lolov wrote:
> >> > Hello!
> >> >
> >> > This proposal will address problems with configuration dependencies
> >> which I
> >> > run into very frequently, so I am fully supporting the development of
> >> this
> >> > feature!
> >> >
> >> > Best,
> >> > Christo
> >> >
> >> > On Mon, 22 May 2023 at 17:18, Jorge Esteban Quilcate Otoya <
> >> > quilcate.jo...@gmail.com> wrote:
> >> >
> >> >> Hi everyone,
> >> >>
> >> >> I'd like to start a discussion for KIP-935 <
> >> >>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations
> >> >> >
> >> >> which proposes extending AlterConfigPolicy with existing configurations
> to
> >> >> enable more complex policies.
> >> >>
> >> >> There have been related KIPs in the past that haven't been accepted
> and
> >> >> seem retired/abandoned as outlined in the motivation.
> >> >> The scope of this KIP intends to be more specific to get most of the
> >> >> benefits from previous discussions; and if previous KIPs are
> >> resurrected,
> >> >> should still be possible to do it if this one is adopted.
> >> >>
> >> >> Looking forward to your feedback!
> >> >>
> >> >> Thanks,
> >> >> Jorge.
> >> >>
> >>
>


Re: [DISCUSS] KIP-934: Add DeleteTopicPolicy

2023-06-06 Thread Jorge Esteban Quilcate Otoya
Thanks, Colin.

> One last note: if we do this, we should pass the UUID of the topic as
well as the name.

Sure, adding it to the KIP.

> On the minus side, the stated use-case (preventing deletion of
__consumer_offsets) seems easy to solve via ACLs. The CreateTopics case is
different... it's not as easy to solve via ACLs because people wanted to
enforce specific topic names or conventions, beyond what ACLs could
provide.
> So it would be good to understand a bit more about why ACLs are not a
better solution than deletion policies.

Thanks for highlighting this. I have elaborated on the motivation a bit
further. I agree ACLs are the main tool to protect against unwanted topic
deletion; but even when the proper users are authorized, a topic deletion
request may still be a human error. In this case, policies act as a
complement to ACLs when topic deletion needs to be blocked.
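As a sketch of the kind of check such a policy would enable -- blocking accidental deletion of internal topics even for authorized users -- here is a hypothetical example. The interface shape, class name, and topic set are illustrative assumptions; only the use case comes from the KIP discussion:

```java
import java.util.Set;

class DeleteTopicPolicySketch {

    // Illustrative set of internal topics to protect from deletion,
    // even when the caller is authorized via ACLs.
    static final Set<String> PROTECTED_TOPICS =
            Set.of("__consumer_offsets", "__transaction_state");

    // Throws when the deletion request violates the policy.
    static void validateDeletion(String topicName) {
        if (PROTECTED_TOPICS.contains(topicName)) {
            throw new IllegalStateException(
                    "Topic " + topicName + " is protected from deletion by policy");
        }
    }
}
```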

Looking forward to your feedback.

Many thanks,
Jorge.

On Fri, 2 Jun 2023 at 22:24, Colin McCabe  wrote:

> Hi Jorge,
>
> On the plus side, the change is small and pretty easy to support.
>
> On the minus side, the stated use-case (preventing deletion of
> __consumer_offsets) seems easy to solve via ACLs. The CreateTopics case is
> different... it's not as easy to solve via ACLs because people wanted to
> enforce specific topic names or conventions, beyond what ACLs could
> provide.
>
> So it would be good to understand a bit more about why ACLs are not a
> better solution than deletion policies.
>
> One last note: if we do this, we should pass the UUID of the topic as well
> as the name.
>
> best,
> Colin
>
>
> On Mon, May 22, 2023, at 09:18, Jorge Esteban Quilcate Otoya wrote:
> > Hi everyone,
> >
> > I'd like to start a discussion for KIP-934 <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy
> >
> > which proposes adding a new policy for when deleting topics.
> >
> > There have been related KIPs in the past that haven't been accepted and
> > seem retired/abandoned as outlined in the motivation.
> > The scope of this KIP intends to be more specific to get most of the
> > benefits from previous discussions; and if previous KIPs are resurrected,
> > should still be possible to do it if this one is adopted.
> >
> > Looking forward to your feedback!
> >
> > Thanks,
> > Jorge.
>


Re: [DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-06-06 Thread Jorge Esteban Quilcate Otoya
Thanks Colin.

> I would suggest renaming the "configs" parameter to "proposedConfigs," in
both the new and old RequestMetadata constructors, to make things clearer.
This would be a binary and API-compatible change in Java.

Sure, fully agree. KIP is updated with this suggestion.

> We should also clarify that if configurations are being proposed for
deletion, they won't appear in proposedConfigs.

Great catch. Wasn't aware of this, but I think it's valuable for the API to
also include the list of configurations to be deleted.
For this, I have extended the `RequestMetadata` type with a list of
proposed configs to delete:

```
class RequestMetadata {

private final ConfigResource resource;
private final Map<String, String> proposedConfigs;
private final List<String> proposedConfigsToDelete;
private final Map<String, String> existingConfigs;
```
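For illustration, a policy built on top of this extended metadata could combine the three fields as in the sketch below. It is a self-contained mirror of the field names above; the validation rules, class name, and constants are made up for the example:

```java
import java.util.List;
import java.util.Map;

class AlterConfigPolicySketch {

    static final long MIN_RETENTION_MS = 3_600_000L; // example floor: one hour

    // Returns an error message, or null when the proposal is acceptable.
    static String validate(Map<String, String> proposedConfigs,
                           List<String> proposedConfigsToDelete,
                           Map<String, String> existingConfigs) {
        if (proposedConfigsToDelete.contains("min.insync.replicas")) {
            return "min.insync.replicas must not be deleted";
        }
        // Evaluate the value that would be in effect after the change,
        // falling back to the existing value when none is proposed.
        String retention = proposedConfigs.getOrDefault(
                "retention.ms", existingConfigs.get("retention.ms"));
        if (retention != null && Long.parseLong(retention) < MIN_RETENTION_MS) {
            return "retention.ms below the allowed minimum of " + MIN_RETENTION_MS;
        }
        return null;
    }
}
```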

Looking forward to your feedback.

Cheers,
Jorge.

On Fri, 2 Jun 2023 at 22:42, Colin McCabe  wrote:

> Hi Jorge,
>
> This is a good KIP which addresses a real gap we have today.
>
> I would suggest renaming the "configs" parameter to "proposedConfigs," in
> both the new and old RequestMetadata constructors, to make things clearer.
> This would be a binary and API-compatible change in Java. We should also
> clarify that if configurations are being proposed for deletion, they won't
> appear in proposedConfigs.
>
> best,
> Colin
>
>
> On Tue, May 23, 2023, at 03:03, Christo Lolov wrote:
> > Hello!
> >
> > This proposal will address problems with configuration dependencies
> which I
> > run into very frequently, so I am fully supporting the development of
> this
> > feature!
> >
> > Best,
> > Christo
> >
> > On Mon, 22 May 2023 at 17:18, Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> >> Hi everyone,
> >>
> >> I'd like to start a discussion for KIP-935 <
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations
> >> >
> >> which proposes extending AlterConfigPolicy with existing configurations to
> >> enable more complex policies.
> >>
> >> There have been related KIPs in the past that haven't been accepted and
> >> seem retired/abandoned as outlined in the motivation.
> >> The scope of this KIP intends to be more specific to get most of the
> >> benefits from previous discussions; and if previous KIPs are
> resurrected,
> >> should still be possible to do it if this one is adopted.
> >>
> >> Looking forward to your feedback!
> >>
> >> Thanks,
> >> Jorge.
> >>
>


[jira] [Created] (KAFKA-15051) docs: add missing connector plugin endpoint to documentation

2023-06-02 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15051:


 Summary: docs: add missing connector plugin endpoint to 
documentation
 Key: KAFKA-15051
 URL: https://issues.apache.org/jira/browse/KAFKA-15051
 Project: Kafka
  Issue Type: Task
  Components: docs, documentation
Reporter: Jorge Esteban Quilcate Otoya


GET /plugin/config endpoint added in 
[KIP-769|https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions]
 is not included in the connect documentation page: 
https://kafka.apache.org/documentation/#connect_rest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-934: Add DeleteTopicPolicy

2023-05-23 Thread Jorge Esteban Quilcate Otoya
Thanks Christo!

On Tue, 23 May 2023 at 13:02, Christo Lolov  wrote:

> Heya Jorge,
>
> Thank you for the KIP!
>
> This feature sounds great to me since I have encountered problems with
> this, so I am supporting it. Do you have any idea why the previous KIPs
> were abandoned - I went through the email conversations and pull requests,
> but I didn't find a good reason?
>

Looking at the mailing thread, my understanding is:
In the case of KIP-170, it was retired as part of KIP-201 when it was
realized they were overlapping (around here [1]?).
For KIP-201, it's not clear: it was not stated on the mailing list
(last mail in Aug 2019), but I found on the PR that it was abandoned [3].
Maybe @Mickael Maison may have more background?

[1] https://lists.apache.org/thread/6nkm706o7qjlz2ld7v8xdhf1421brh7o
[2] https://lists.apache.org/thread/cklw5fr5j1nhxyp5hpkfc1orm8r2tsmn
[3] https://github.com/apache/kafka/pull/4281#issuecomment-1035154386


>
> Best,
> Christo
>
> On Mon, 22 May 2023 at 17:19, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a discussion for KIP-934 <
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy
> > >
> > which proposes adding a new policy for when deleting topics.
> >
> > There have been related KIPs in the past that haven't been accepted and
> > seem retired/abandoned as outlined in the motivation.
> > The scope of this KIP intends to be more specific to get most of the
> > benefits from previous discussions; and if previous KIPs are resurrected,
> > should still be possible to do it if this one is adopted.
> >
> > Looking forward to your feedback!
> >
> > Thanks,
> > Jorge.
> >
>


[DISCUSS] KIP-934: Add DeleteTopicPolicy

2023-05-22 Thread Jorge Esteban Quilcate Otoya
Hi everyone,

I'd like to start a discussion for KIP-934 <
https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy>
which proposes adding a new policy for when deleting topics.

There have been related KIPs in the past that haven't been accepted and
seem retired/abandoned as outlined in the motivation.
The scope of this KIP intends to be more specific, to get most of the
benefits from previous discussions; and if previous KIPs are resurrected,
it should still be possible to do so even if this one is adopted.

Looking forward to your feedback!

Thanks,
Jorge.


[DISCUSS] KIP-935: Extend AlterConfigPolicy with existing configurations

2023-05-22 Thread Jorge Esteban Quilcate Otoya
Hi everyone,

I'd like to start a discussion for KIP-935 <
https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations>
which proposes extending AlterConfigPolicy with existing configurations to
enable more complex policies.

There have been related KIPs in the past that haven't been accepted and
seem retired/abandoned as outlined in the motivation.
The scope of this KIP intends to be more specific, to get most of the
benefits from previous discussions; and if previous KIPs are resurrected,
it should still be possible to do so even if this one is adopted.

Looking forward to your feedback!

Thanks,
Jorge.


[jira] [Created] (KAFKA-15014) KIP-935: Extend AlterConfigPolicy with existing configurations

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15014:


 Summary:  KIP-935: Extend AlterConfigPolicy with existing 
configurations 
 Key: KAFKA-15014
 URL: https://issues.apache.org/jira/browse/KAFKA-15014
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15013) KIP-934: Add DeleteTopicPolicy

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15013:


 Summary: KIP-934: Add DeleteTopicPolicy
 Key: KAFKA-15013
 URL: https://issues.apache.org/jira/browse/KAFKA-15013
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2023-05-15 Thread Jorge Esteban Quilcate Otoya
Hi Mickael,

Just checking on the status of this KIP, as it looks very useful. I can see
how the new Tiered Storage interfaces and plugins may benefit from this.

Cheers,
Jorge.

On Mon, 6 Feb 2023 at 23:00, Chris Egerton  wrote:

> Hi Mickael,
>
> I agree that adding a getter method for Monitorable isn't great. A few
> alternatives come to mind:
>
> 1. Introduce a new ConfiguredInstance (name subject to change) interface
> that wraps an instance of type T, but also contains a getter method for any
> PluginMetrics instances that the plugin was instantiated with (which may
> return null either if no PluginMetrics instance could be created for the
> plugin, or if it did not implement the Monitorable interface). This can be
> the return type of the new AbstractConfig::getConfiguredInstance variants.
> It would give us room to move forward with other plugin-for-your-plugin
> style interfaces without cluttering things up with getter methods. We could
> even add a close method to this interface which would handle cleanup of all
> extra resources allocated for the plugin by the runtime, and even possibly
> the plugin itself.
>
> 2. Break out the instantiation logic into two separate steps. The first
> step, creating a PluginMetrics instance, can be either private or public
> API. The second step, providing that PluginMetrics instance to a
> newly-created object, can be achieved with a small tweak of the proposed
> new methods for the AbstractConfig class; instead of accepting a Metrics
> instance, they would now accept a PluginMetrics instance. For the first
> step, we might even introduce a new CloseablePluginMetrics interface which
> would be the return type of whatever method we use to create the
> PluginMetrics instance. We can track that CloseablePluginMetrics instance
> in tandem with the plugin it applies to, and close it when we're done with
> the plugin.
>
> I know that this adds some complexity to the API design and some
> bookkeeping responsibilities for our implementation, but I can't shake the
> feeling that if we don't feel comfortable taking on the responsibility to
> clean up these resources ourselves, it's not really fair to ask users to
> handle it for us instead. And with the case of Connect, sometimes Connector
> or Task instances that are scheduled for shutdown block for a while, at
> which point we abandon them and bring up new instances in their place; it'd
> be nice to have a way to forcibly clear out all the metrics allocated by
> that Connector or Task instance before bringing up a new one, in order to
> prevent issues due to naming conflicts.
>
> Regardless, and whether or not it ends up being relevant to this KIP, I'd
> love to see a new Converter::close method. It's irked me for quite a while
> that we don't have one already.
>
> Cheers,
>
> Chris
>
> On Mon, Feb 6, 2023 at 1:50 PM Mickael Maison 
> wrote:
>
> > Hi Chris,
> >
> > I envisioned plugins to be responsible for closing the PluginMetrics
> > instance. This is mostly important for Connect connector plugins as
> > they can be closed while the runtime keeps running (and keeps its
> > Metrics instance). As far as I can tell, other plugins should only be
> > closed when their runtime closes, so we should not be leaking metrics
> > even if those don't explicitly call close().
> >
> > For Connect plugin, as you said, it would be nice to automatically
> > close their associated PluginMetrics rather than relying on user
> > logic. The issue is that with the current API there's no way to
> > retrieve the PluginMetrics instance once it's passed to the plugin.
> > I'm not super keen on having a getter method on the Monitorable
> > interface and tracking PluginMetrics instances associated with each
> > plugin would require a lot of changes. I just noticed Converter does
> > not have a close() method so it's problematic for that type of plugin.
> > The other Connect plugins all have close() or stop() methods. I wonder
> > if the simplest is to make Converter extend Closeable. WDYT?
> >
> > Thanks,
> > Mickael
> >
> > On Mon, Feb 6, 2023 at 6:39 PM Mickael Maison 
> > wrote:
> > >
> > > Hi Yash,
> > >
> > > I added a sentence to the sensor() method mentioning the sensor name
> > > must only be unique per plugin. Regarding having getters for sensors
> > > and metrics I considered this not strictly necessary as I expect the
> > > metrics logic in most plugins to be relatively simple. If you have a
> > > use case that would benefit from these methods, let me know I will
> > > reconsider.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Fri, Feb 3, 2023 at 9:16 AM Yash Mayya 
> wrote:
> > > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for the updates.
> > > >
> > > > > the PluginMetrics implementation will append a
> > > > > suffix to sensor names to unique identify
> > > > > the plugin (based on the class name and tags).
> > > >
> > > > Can we call this out explicitly in the KIP, since it is important to
> > avoid
> > > > clashes in sensor 

Re: [VOTE] KIP-864: Add End-To-End Latency Metrics to Connectors

2023-05-10 Thread Jorge Esteban Quilcate Otoya
Hi everyone,

Bumping this vote thread. 2 +1 binding and 1 +1 non-binding so far.

Cheers,
Jorge.

On Mon, 27 Feb 2023 at 18:56, Knowles Atchison Jr 
wrote:

> +1 (non binding)
>
> On Mon, Feb 27, 2023 at 11:21 AM Chris Egerton 
> wrote:
>
> > Hi all,
> >
> > I could have sworn I +1'd this but I can't seem to find a record of that.
> >
> > In the hopes that this action is idempotent, +1 (binding). Thanks for the
> > KIP!
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Feb 27, 2023 at 6:28 AM Mickael Maison  >
> > wrote:
> >
> > > Thanks for the KIP
> > >
> > > +1 (binding)
> > >
> > > On Thu, Jan 26, 2023 at 4:36 PM Jorge Esteban Quilcate Otoya
> > >  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to call for a vote on KIP-864, which proposes to add metrics
> > to
> > > > measure end-to-end latency in source and sink connectors.
> > > >
> > > > KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > > >
> > > > Discussion thread:
> > > > https://lists.apache.org/thread/k6rh2mr7pg94935fgpqw8b5fj308f2n7
> > > >
> > > > Many thanks,
> > > > Jorge.
> > >
> >
>


[jira] [Created] (KAFKA-14843) Connector plugins config endpoint does not include Common configs

2023-03-24 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14843:


 Summary: Connector plugins config endpoint does not include Common 
configs
 Key: KAFKA-14843
 URL: https://issues.apache.org/jira/browse/KAFKA-14843
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.2.0
Reporter: Jorge Esteban Quilcate Otoya


Connector plugins GET config endpoint introduced in 
[KIP-769|https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions]
 allows getting a plugin's configuration from the REST endpoint.

This endpoint only returns the plugin's own configuration, and not the base 
configuration of the Sink/Source Connector.

For instance, when validating the configuration of a plugin, _all_ configs are 
returned:

```

curl -s $CONNECT_URL/connector-plugins/io.aiven.kafka.connect.http.HttpSinkConnector/config \
  | jq -r '.[].name' | sort -u | wc -l
21

curl -s $CONNECT_URL/connector-plugins/io.aiven.kafka.connect.http.HttpSinkConnector/config/validate \
  -XPUT -H 'Content-type: application/json' \
  --data "{\"connector.class\": \"io.aiven.kafka.connect.http.HttpSinkConnector\", \"topics\": \"example-topic-name\"}" \
  | jq -r '.configs[].definition.name' | sort -u | wc -l
39

```

and the missing configs are all from the base config:

```

diff validate.txt config.txt
6,14d5
< config.action.reload
< connector.class
< errors.deadletterqueue.context.headers.enable
< errors.deadletterqueue.topic.name
< errors.deadletterqueue.topic.replication.factor
< errors.log.enable
< errors.log.include.messages
< errors.retry.delay.max.ms
< errors.retry.timeout
16d6
< header.converter
24d13
< key.converter
26d14
< name
33d20
< predicates
35,39d21
< tasks.max
< topics
< topics.regex
< transforms
< value.converter

```

It would be great to get the base configs from the same endpoint as well, so we 
could rely on it instead of using the validate endpoint to fetch all configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14815) Move Kafka documentation to Markdown/Hugo

2023-03-16 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14815:


 Summary: Move Kafka documentation to Markdown/Hugo 
 Key: KAFKA-14815
 URL: https://issues.apache.org/jira/browse/KAFKA-14815
 Project: Kafka
  Issue Type: Task
  Components: documentation
Reporter: Jorge Esteban Quilcate Otoya


Follow up from https://issues.apache.org/jira/browse/KAFKA-2967

Creating this task to discuss the adoption of Markdown and Hugo, and replace 
the HTML code.

The reasons to move away from HTML are outlined in KAFKA-2967.

Markdown and Asciidoc are both alternatives, but Markdown has been used, as I 
found some blockers when trying to migrate to Asciidoc:
 * Hugo requires Asciidoctor as additional binary (Markdown supported ootb)
 * To use Asciidoctor some security policies need to be opened: 
[https://stackoverflow.com/questions/71058236/hugo-with-asciidoctor]
 * I haven't managed to use shortcodes in Asciidoctor to inject versions, 
tables, etc.

Given the lower friction of Markdown and its default support in Hugo, I would 
like to propose using them. That said, I'm open to collaborating if someone has 
experience with Asciidoc and Hugo to help with the migration, as there is 
existing interest in using Asciidoc if possible.

Draft repo: https://github.com/jeqo/ak-docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Kafka PMC Member: Chris Egerton

2023-03-09 Thread Jorge Esteban Quilcate Otoya
So well deserved! Congratulations Chris!!!

On Thu, 9 Mar 2023 at 22:09, Lucas Brutschy 
wrote:

> Congratulations!
>
> On Thu, Mar 9, 2023 at 8:48 PM Roman Schmitz 
> wrote:
> >
> > Congratulations Chris!
> >
> > On Thu, 9 Mar 2023 at 20:33, Chia-Ping Tsai <
> chia7...@gmail.com
> > >:
> >
> > > Congratulations Chris!
> > >
> > > > Mickael Maison  wrote on 10 Mar 2023 at 2:21 AM:
> > > >
> > > > Congratulations Chris!
> > > >
> > > >> On Thu, Mar 9, 2023 at 7:17 PM Bill Bejeck 
> wrote:
> > > >>
> > > >> Congratulations Chris!
> > > >>
> > > >>> On Thu, Mar 9, 2023 at 1:12 PM Jun Rao 
> > > wrote:
> > > >>>
> > > >>> Hi, Everyone,
> > > >>>
> > > >>> Chris Egerton has been a Kafka committer since July 2022. He has
> been
> > > very
> > > >>> instrumental to the community since becoming a committer. It's my
> > > pleasure
> > > >>> to announce that Chris is now a member of Kafka PMC.
> > > >>>
> > > >>> Congratulations Chris!
> > > >>>
> > > >>> Jun
> > > >>> on behalf of Apache Kafka PMC
> > > >>>
> > >
>


[DISCUSS] Migrating documentation to Markdown/Hugo

2023-03-07 Thread Jorge Esteban Quilcate Otoya
Hi all,

I wanted to share on the mailing list a follow up from
https://issues.apache.org/jira/browse/KAFKA-2967 to gather feedback from a
wider audience:

I put together a static-site generation setup with Hugo, and migrated most of the
documentation under `docs/` to Markdown using Pandoc:
https://github.com/jeqo/ak-docs

Hopefully the readme is good enough for anyone who wants to test it. Did
some initial integration with kafka-site to check how that would look.

There are some minor style details (e.g. URLs, image sizing, Kafka Streams
navigation, generating JSON instead of HTML for Config/Metrics, etc.) that
could be discussed in follow-up issues, but I'd like to get feedback at
this point to see if the migration seems promising and what the next steps
could be.

The original HTML files are kept to diff against the current repo and align
any differences, and potentially the content of the repo could go under the
kafka/docs directory.

Looking forward to your feedback!

Jorge.


Re: [VOTE] KIP-821: Connect Transforms support for nested structures

2023-01-27 Thread Jorge Esteban Quilcate Otoya
Thanks Bill!!

Just to summarize: 4 binding +1 (Chris, Mickael, John and Bill), and 1
non-binding +1 (Robert)

Many thanks,
Jorge.

On Fri, 27 Jan 2023 at 15:56, Bill Bejeck  wrote:

> Hi Jorge,
>
> Recasting my vote +1(binding).
>
> I'm looking forward to this KIP!
>
> Thanks,
> Bill
>
> On Fri, Jan 27, 2023 at 10:11 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm pleased to announce that the KIP-821 has been accepted with 3 binding
> > votes from Chris, John, Mickael, 1 non-binding vote by Robert, and 1
> > binding vote to be confirmed by Bill.
> >
> > Thank you all for your great feedback and contribution to the KIP!
> >
> > Cheers,
> > Jorge.
> >
> > On Thu, 26 Jan 2023 at 16:21, Mickael Maison 
> > wrote:
> >
> > > +1 (binding)
> > > Thanks for the KIP
> > >
> > > On Tue, Jun 28, 2022 at 10:42 PM Jorge Esteban Quilcate Otoya
> > >  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'd like to bump this vote thread. Currently it's missing 1 +1
> binding
> > > vote
> > > > to pass (2 +1 binding, 1 +1 non-binding).
> > > >
> > > > There has been additional discussions to consider array access and
> > > > deep-scan (similar to JsonPath) but hasn't been included as part of
> > this
> > > > KIP.
> > > > The only minor change since the previous votes has been the change of
> > > > configuration name: from `field.style` to `field.syntax.version`.
> > > >
> > > > KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > > On Fri, 22 Apr 2022 at 00:01, Bill Bejeck  wrote:
> > > >
> > > > > Thanks for the KIP, Jorge.
> > > > >
> > > > > This looks like a great addition to Kafka Connect.
> > > > >
> > > > > +1(binding)
> > > > >
> > > > > -Bill
> > > > >
> > > > > On Thu, Apr 21, 2022 at 6:41 PM John Roesler 
> > > wrote:
> > > > >
> > > > > > Thanks for the KIP, Jorge!
> > > > > >
> > > > > > I’ve just looked over the KIP, and it looks good to me.
> > > > > >
> > > > > > I’m +1 (binding)
> > > > > >
> > > > > > Thanks,
> > > > > > John
> > > > > >
> > > > > > On Thu, Apr 21, 2022, at 09:10, Chris Egerton wrote:
> > > > > > > This is a worthwhile addition to the SMTs that ship out of the
> > box
> > > with
> > > > > > > Kafka Connect. +1 non-binding
> > > > > > >
> > > > > > > On Thu, Apr 21, 2022, 09:51 Jorge Esteban Quilcate Otoya <
> > > > > > > quilcate.jo...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I'd like to start a vote on KIP-821:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Jorge
> > > > > > >>
> > > > > >
> > > > >
> > >
> >
>


Re: [VOTE] KIP-821: Connect Transforms support for nested structures

2023-01-27 Thread Jorge Esteban Quilcate Otoya
Hi all,

I'm pleased to announce that KIP-821 has been accepted with 3 binding
votes from Chris, John, and Mickael, 1 non-binding vote from Robert, and 1
binding vote to be confirmed by Bill.

Thank you all for your great feedback and contribution to the KIP!

Cheers,
Jorge.

On Thu, 26 Jan 2023 at 16:21, Mickael Maison 
wrote:

> +1 (binding)
> Thanks for the KIP
>
> On Tue, Jun 28, 2022 at 10:42 PM Jorge Esteban Quilcate Otoya
>  wrote:
> >
> > Hi everyone,
> >
> > I'd like to bump this vote thread. Currently it's missing 1 +1 binding
> vote
> > to pass (2 +1 binding, 1 +1 non-binding).
> >
> > There has been additional discussions to consider array access and
> > deep-scan (similar to JsonPath) but hasn't been included as part of this
> > KIP.
> > The only minor change since the previous votes has been the change of
> > configuration name: from `field.style` to `field.syntax.version`.
> >
> > KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >
> > Cheers,
> > Jorge.
> >
> > On Fri, 22 Apr 2022 at 00:01, Bill Bejeck  wrote:
> >
> > > Thanks for the KIP, Jorge.
> > >
> > > This looks like a great addition to Kafka Connect.
> > >
> > > +1(binding)
> > >
> > > -Bill
> > >
> > > On Thu, Apr 21, 2022 at 6:41 PM John Roesler 
> wrote:
> > >
> > > > Thanks for the KIP, Jorge!
> > > >
> > > > I’ve just looked over the KIP, and it looks good to me.
> > > >
> > > > I’m +1 (binding)
> > > >
> > > > Thanks,
> > > > John
> > > >
> > > > On Thu, Apr 21, 2022, at 09:10, Chris Egerton wrote:
> > > > > This is a worthwhile addition to the SMTs that ship out of the box
> with
> > > > > Kafka Connect. +1 non-binding
> > > > >
> > > > > On Thu, Apr 21, 2022, 09:51 Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jo...@gmail.com> wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> I'd like to start a vote on KIP-821:
> > > > >>
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > > >>
> > > > >> Thanks,
> > > > >> Jorge
> > > > >>
> > > >
> > >
>


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2023-01-26 Thread Jorge Esteban Quilcate Otoya
No worries, thanks Chris!

I think most feedback has been covered and the KIP is ready for vote. Will
be starting the vote thread soon.

Cheers,
Jorge.

On Mon, 5 Dec 2022 at 15:10, Chris Egerton  wrote:

> Hi Jorge,
>
> Thanks for indulging my paranoia. LGTM!
>
> Cheers,
>
> Chris
>
> On Mon, Dec 5, 2022 at 10:06 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Sure! I have a added the following to the proposed changes section:
> >
> > ```
> > The per-record metrics will definitely be added to Kafka Connect as part
> of
> > this KIP, but their metric level will be changed pending the performance
> > testing described in KAFKA-14441, and will otherwise only be exposed at
> > lower level (DEBUG instead of INFO, and TRACE instead of DEBUG)
> > ```
> >
> > Let me know how it looks.
> >
> > Many thanks!
> > Jorge.
> >
> > On Mon, 5 Dec 2022 at 14:11, Chris Egerton 
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for filing KAFKA-14441! In the ticket description we mention
> that
> > > "there will be more confidence whether to design metrics to be exposed
> > at a
> > > DEBUG or INFO level depending on their impact" but it doesn't seem like
> > > this is called out in the KIP and, just based on what's in the KIP, the
> > > proposal is still to have several per-record metrics exposed at INFO
> > level.
> > >
> > > Could we explicitly call out that the per-record metrics will
> definitely
> > be
> > > added to Kafka Connect as part of this KIP, but they will only be
> exposed
> > > at INFO level pending the performance testing described in
> > > KAFKA-14441, and will otherwise only be exposed at DEBUG level?
> > Otherwise,
> > > it's possible that a vote for the KIP as it's written today would be a
> > vote
> > > in favor of unconditionally exposing these metrics at INFO level, even
> if
> > > the performance testing reveals issues.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Sun, Dec 4, 2022 at 7:08 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Thanks for the reminder Chris!
> > > >
> > > > I have added a note on the KIP to include this as part of the KIP as
> > most
> > > > of the metrics proposed are per-record and having all on DEBUG would
> > > limit
> > > > the benefits, and created
> > > > https://issues.apache.org/jira/browse/KAFKA-14441
> > > > to keep track of this task.
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > > On Tue, 29 Nov 2022 at 19:40, Chris Egerton  >
> > > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks! What were your thoughts on the possible benchmarking and/or
> > > > > downgrading of per-record metrics to DEBUG?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jo...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Chris! I have updated the KIP with "transform" instead of
> > > > "alias".
> > > > > > Agree it's clearer.
> > > > > >
> > > > > > Cheers,
> > > > > > Jorge.
> > > > > >
> > > > > > On Mon, 21 Nov 2022 at 21:36, Chris Egerton
> >  > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jorge,
> > > > > > >
> > > > > > > Thanks for the updates, and apologies for the delay. The new
> > > diagram
> > > > > > > directly under the "Proposed Changes" section is absolutely
> > > gorgeous!
> > > > > > >
> > > > > > >
> > > > > > > Follow-ups:
> > > > > > >
> > > > > > > RE 2: Good point. We can use the same level for these metrics,
> > it's
> > > > > not a
> > > > > > > big deal.
> > > > > > >
> > > > > > > RE 3: As long as all the per-record metrics are kept at DEBUG
> > > level,
> 

[VOTE] KIP-864: Add End-To-End Latency Metrics to Connectors

2023-01-26 Thread Jorge Esteban Quilcate Otoya
Hi all,

I'd like to call for a vote on KIP-864, which proposes to add metrics to
measure end-to-end latency in source and sink connectors.

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors

Discussion thread:
https://lists.apache.org/thread/k6rh2mr7pg94935fgpqw8b5fj308f2n7

Many thanks,
Jorge.


Re: Re: [DISCUSS] KIP-821: Connect Transforms support for nested structures

2023-01-25 Thread Jorge Esteban Quilcate Otoya
Hi there,

Bumping this thread for visibility.

Cheers,
Jorge

On Fri, 2 Sep 2022, 18:01 Chris Egerton,  wrote:

> Hi Jorge,
>
> One tiny nit, but LGTM otherwise:
>
> The KIP mentions backslashes as "(/)"; shouldn't this be "(\)"?
>
> I'll cast a +1 on the vote thread anyways; I'm sure this won't block us.
>
> Cheers, and thanks for all your hard work on this!
>
> Chris
>
> On Thu, Sep 1, 2022 at 1:33 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thanks for your feedback!
> >
> > 1. Yes, it will be context-dependent. I have added rules and scenarios to
> > the nested notation to cover the happy path and edge cases. In short,
> > backticks will not be considered as part of the field name when they
> are
> > wrapping a field name: first backtick at the beginning of the path or
> after
> > a dot, and closing backtick before a dot or at the end of the path.
> > If backticks happen to be in those positions, use backslash to escape
> them.
> > 2. You're right, that's a typo. Fixing it.
> > 3. I don't think so, I have added a scenario to clarify this.
> >
> > KIP is updated. Hopefully the rules and scenarios help to close any open
> > gap. Let me know if you see any cases that are not covered so I can
> > address them.
> >
> > Cheers,
> > Jorge.
> >
> > On Wed, 31 Aug 2022 at 20:02, Chris Egerton 
> > wrote:
> >
> > > Hi Robert and Jorge,
> > >
> > > I think the backtick/backslash proposal works, but I'm a little unclear
> > on
> > > some of the details:
> > >
> > > 1. Are backticks only given special treatment when they immediately
> > follow
> > > a non-escaped dot? E.g., "foo.b`ar.ba`z" would refer to "foo" ->
> "b`ar"
> > ->
> > > "ba`z" instead of "foo" -> "bar.baz"? Based on the example where the
> name
> > > "a.b`.c" refers to "a" -> "b`" -> "c", it seems like this is the case,
> > but
> > > I'm worried this might cause confusion since the role of the backtick
> and
> > > the need to escape it becomes context-dependent.
> > >
> > > 2. In the example table near the beginning of the KIP, the name
> > "a.`b\`.c`"
> > > refers to "a" -> "b`c". What happened to the dot in the second part of
> > the
> > > name? Should it refer to "a" -> "b`.c" instead?
> > >
> > > 3. Is it ever necessary to escape backslashes themselves? If so, when?
> > >
> > > Overall, I wish we could come up with a prettier/simpler approach, but
> > the
> > > benefits provided by the dual backtick/dot syntax are too great to
> deny:
> > > there are no correctness issues like the ones posed with double-dot
> > > escaping that would lead to ambiguity, the most common cases are still
> > very
> > > simple to work with, and there's no risk of interfering with JSON
> escape
> > > mechanisms (in most cases) or single-quote shell quoting (which may be
> > > relevant when connector configurations are defined on the command
> line).
> > > Thanks for the suggestion, Robert!
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> >
>
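
As a thought experiment, the quoting rules above can be condensed into a small
parser. The sketch below is a hypothetical Python illustration of the semantics
discussed in this thread (the actual SMT implementation lives in Kafka Connect
and is written in Java); the function name and structure are invented for the
example:

```python
def split_path(path):
    """Split a dotted field path into segments, treating a segment wrapped
    in backticks (opening right after a dot or at the start, closing right
    before a dot or at the end) as literal, and honoring backslash-escaped
    backticks. Illustrative only; not the Connect implementation."""
    segments, buf, i, n = [], [], 0, len(path)
    while i < n:
        c = path[i]
        if c == '\\' and i + 1 < n and path[i + 1] == '`':
            buf.append('`')          # escaped backtick -> literal character
            i += 2
        elif c == '`' and not buf:   # opening backtick at segment start
            i += 1
            while i < n:
                if path[i] == '\\' and i + 1 < n and path[i + 1] == '`':
                    buf.append('`')
                    i += 2
                elif path[i] == '`' and (i + 1 == n or path[i + 1] == '.'):
                    i += 1           # closing backtick before a dot or the end
                    break
                else:
                    buf.append(path[i])
                    i += 1
        elif c == '.':
            segments.append(''.join(buf))
            buf = []
            i += 1
        else:
            buf.append(c)            # backticks mid-segment stay literal
            i += 1
    segments.append(''.join(buf))
    return segments

print(split_path("a.`b.c`"))      # ['a', 'b.c']
print(split_path(r"a.`b\`.c`"))   # ['a', 'b`.c']
```

Under these rules an unescaped mid-segment backtick, as in "a.b`.c", still
splits into three plain segments ("a", "b`", "c"), matching the behavior
clarified in the thread.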


Re: [VOTE] KIP-821: Connect Transforms support for nested structures

2023-01-25 Thread Jorge Esteban Quilcate Otoya
Hi there,

Bumping this thread for visibility.

Cheers,
Jorge

On Wed, 14 Sep 2022, 19:23 John Roesler,  wrote:

> Thanks, all!
>
> I've reviewed the current state of the KIP, and I'm still +1 (binding).
>
> Thanks,
> -John
>
> On Fri, Sep 2, 2022, at 12:03, Chris Egerton wrote:
> > +1 (binding). Thanks Jorge, great stuff!
> >
> > We should probably verify with the people that have already cast +1 votes
> > that they're still on board, since the design has shifted a bit since the
> > last vote was casted.
> >
> > On 2022/06/28 20:42:14 Jorge Esteban Quilcate Otoya wrote:
> >> Hi everyone,
> >>
> >> I'd like to bump this vote thread. Currently it's missing 1 +1 binding
> > vote
> >> to pass (2 +1 binding, 1 +1 non-binding).
> >>
> >> There has been additional discussions to consider array access and
> >> deep-scan (similar to JsonPath) but hasn't been included as part of this
> >> KIP.
> >> The only minor change since the previous votes has been the change of
> >> configuration name: from `field.style` to `field.syntax.version`.
> >>
> >> KIP:
> >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >>
> >> Cheers,
> >> Jorge.
> >>
> >> On Fri, 22 Apr 2022 at 00:01, Bill Bejeck  wrote:
> >>
> >> > Thanks for the KIP, Jorge.
> >> >
> >> > This looks like a great addition to Kafka Connect.
> >> >
> >> > +1(binding)
> >> >
> >> > -Bill
> >> >
> >> > On Thu, Apr 21, 2022 at 6:41 PM John Roesler 
> wrote:
> >> >
> >> > > Thanks for the KIP, Jorge!
> >> > >
> >> > > I’ve just looked over the KIP, and it looks good to me.
> >> > >
> >> > > I’m +1 (binding)
> >> > >
> >> > > Thanks,
> >> > > John
> >> > >
> >> > > On Thu, Apr 21, 2022, at 09:10, Chris Egerton wrote:
> >> > > > This is a worthwhile addition to the SMTs that ship out of the box
> > with
> >> > > > Kafka Connect. +1 non-binding
> >> > > >
> >> > > > On Thu, Apr 21, 2022, 09:51 Jorge Esteban Quilcate Otoya <
> >> > > > quilcate.jo...@gmail.com> wrote:
> >> > > >
> >> > > >> Hi all,
> >> > > >>
> >> > > >> I'd like to start a vote on KIP-821:
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Jorge
> >> > > >>
> >> > >
> >> >
> >>
>


Re: [ANNOUNCE] New committer: Josep Prat

2022-12-20 Thread Jorge Esteban Quilcate Otoya
Congrats Josep!!

On Tue, 20 Dec 2022, 17:31 Greg Harris, 
wrote:

> Congratulations Josep!
>
> On Tue, Dec 20, 2022 at 9:29 AM Chris Egerton 
> wrote:
>
> > Congrats Josep! Well-earned.
> >
> > On Tue, Dec 20, 2022, 12:26 Jun Rao  wrote:
> >
> > > Hi, Everyone,
> > >
> > > The PMC of Apache Kafka is pleased to announce a new Kafka committer
> > Josep
> > >  Prat.
> > >
> > > Josep has been contributing to Kafka since May 2021. He contributed 20
> > PRs
> > > including the following 2 KIPs.
> > >
> > > KIP-773 Differentiate metric latency measured in ms and ns
> > > KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with
> > > internal implementation
> > >
> > > Congratulations, Josep!
> > >
> > > Thanks,
> > >
> > > Jun (on behalf of the Apache Kafka PMC)
> > >
> >
>


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-12-05 Thread Jorge Esteban Quilcate Otoya
Sure! I have a added the following to the proposed changes section:

```
The per-record metrics will definitely be added to Kafka Connect as part of
this KIP, but their metric level will be changed pending the performance
testing described in KAFKA-14441, and will otherwise only be exposed at a
lower level (DEBUG instead of INFO, and TRACE instead of DEBUG)
```

Let me know how it looks.

Many thanks!
Jorge.

On Mon, 5 Dec 2022 at 14:11, Chris Egerton  wrote:

> Hi Jorge,
>
> Thanks for filing KAFKA-14441! In the ticket description we mention that
> "there will be more confidence whether to design metrics to be exposed at a
> DEBUG or INFO level depending on their impact" but it doesn't seem like
> this is called out in the KIP and, just based on what's in the KIP, the
> proposal is still to have several per-record metrics exposed at INFO level.
>
> Could we explicitly call out that the per-record metrics will definitely be
> added to Kafka Connect as part of this KIP, but they will only be exposed
> at INFO level pending the performance testing described in
> KAFKA-14441, and will otherwise only be exposed at DEBUG level? Otherwise,
> it's possible that a vote for the KIP as it's written today would be a vote
> in favor of unconditionally exposing these metrics at INFO level, even if
> the performance testing reveals issues.
>
> Cheers,
>
> Chris
>
> On Sun, Dec 4, 2022 at 7:08 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks for the reminder Chris!
> >
> > I have added a note on the KIP to include this as part of the KIP as most
> > of the metrics proposed are per-record and having all on DEBUG would
> limit
> > the benefits, and created
> > https://issues.apache.org/jira/browse/KAFKA-14441
> > to keep track of this task.
> >
> > Cheers,
> > Jorge.
> >
> > On Tue, 29 Nov 2022 at 19:40, Chris Egerton 
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks! What were your thoughts on the possible benchmarking and/or
> > > downgrading of per-record metrics to DEBUG?
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Thanks Chris! I have updated the KIP with "transform" instead of
> > "alias".
> > > > Agree it's clearer.
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > > On Mon, 21 Nov 2022 at 21:36, Chris Egerton  >
> > > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the updates, and apologies for the delay. The new
> diagram
> > > > > directly under the "Proposed Changes" section is absolutely
> gorgeous!
> > > > >
> > > > >
> > > > > Follow-ups:
> > > > >
> > > > > RE 2: Good point. We can use the same level for these metrics, it's
> > > not a
> > > > > big deal.
> > > > >
> > > > > RE 3: As long as all the per-record metrics are kept at DEBUG
> level,
> > it
> > > > > should be fine to leave JMH benchmarking for a follow-up. If we
> want
> > to
> > > > add
> > > > > new per-record, INFO-level metrics, I would be more comfortable
> with
> > > > > including benchmarking as part of the testing plan for the KIP. One
> > > > > possible compromise could be to propose that these features be
> merged
> > > at
> > > > > DEBUG level, and then possibly upgraded to INFO level in the future
> > > > pending
> > > > > benchmarks to guard against performance degradation.
> > > > >
> > > > > RE 4: I think for a true "end-to-end" metric, it'd be useful to
> > include
> > > > the
> > > > > time taken by the task to actually deliver the record. However,
> with
> > > the
> > > > > new metric names and descriptions provided in the KIP, I have no
> > > > objections
> > > > > with what's currently proposed, and a new "end-to-end" metric can
> be
> > > > taken
> > > > > on later in a follow-up KIP.
> > > > >
> > > > > RE 6: You're right, existing producer metrics should be enough for
> > now.
> > > > We
> > > > > can revisit this later if/when we add delivery-centric metrics for
> > > > > sink tasks as well.
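
The point in this thread that Max and Avg share a sensor's recording level can
be pictured with a toy model. This is a conceptual Python sketch, not Kafka's
actual org.apache.kafka.common.metrics API; all names and semantics here are
invented for illustration:

```python
INFO, DEBUG = 1, 2  # higher number = more verbose

class ToySensor:
    """Toy model of a sensor whose stats (max and avg) share one recording
    level, loosely inspired by Kafka's Sensor concept. Illustrative only."""
    def __init__(self, level, enabled_level):
        self.level = level                  # level this sensor records at
        self.enabled_level = enabled_level  # verbosity enabled by config
        self.count, self.total = 0, 0.0
        self.maximum = float("-inf")

    def record(self, value):
        # Both max and avg are gated together: a single check decides
        # whether anything is recorded for this sensor.
        if self.level > self.enabled_level:
            return
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    def avg(self):
        return self.total / self.count if self.count else None

# An INFO-level sensor records when only INFO verbosity is enabled...
s_info = ToySensor(level=INFO, enabled_level=INFO)
for v in (5.0, 15.0, 10.0):
    s_info.record(v)
print(s_info.maximum, s_info.avg())  # 15.0 10.0

# ...while a DEBUG-level sensor is skipped entirely.
s_debug = ToySensor(level=DEBUG, enabled_level=INFO)
s_debug.record(42.0)
print(s_debug.count)  # 0
```

This is why splitting Max to INFO while keeping Avg at DEBUG would require
separate sensors rather than separate stats on one sensor.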

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-12-04 Thread Jorge Esteban Quilcate Otoya
Thanks for the reminder Chris!

I have added a note to the KIP to cover this, as most
of the metrics proposed are per-record and having them all on DEBUG would limit
the benefits, and created https://issues.apache.org/jira/browse/KAFKA-14441
to keep track of this task.

Cheers,
Jorge.

On Tue, 29 Nov 2022 at 19:40, Chris Egerton  wrote:

> Hi Jorge,
>
> Thanks! What were your thoughts on the possible benchmarking and/or
> downgrading of per-record metrics to DEBUG?
>
> Cheers,
>
> Chris
>
> On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks Chris! I have updated the KIP with "transform" instead of "alias".
> > Agree it's clearer.
> >
> > Cheers,
> > Jorge.
> >
> > On Mon, 21 Nov 2022 at 21:36, Chris Egerton 
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the updates, and apologies for the delay. The new diagram
> > > directly under the "Proposed Changes" section is absolutely gorgeous!
> > >
> > >
> > > Follow-ups:
> > >
> > > RE 2: Good point. We can use the same level for these metrics, it's
> not a
> > > big deal.
> > >
> > > RE 3: As long as all the per-record metrics are kept at DEBUG level, it
> > > should be fine to leave JMH benchmarking for a follow-up. If we want to
> > add
> > > new per-record, INFO-level metrics, I would be more comfortable with
> > > including benchmarking as part of the testing plan for the KIP. One
> > > possible compromise could be to propose that these features be merged
> at
> > > DEBUG level, and then possibly upgraded to INFO level in the future
> > pending
> > > benchmarks to guard against performance degradation.
> > >
> > > RE 4: I think for a true "end-to-end" metric, it'd be useful to include
> > the
> > > time taken by the task to actually deliver the record. However, with
> the
> > > new metric names and descriptions provided in the KIP, I have no
> > objections
> > > with what's currently proposed, and a new "end-to-end" metric can be
> > taken
> > > on later in a follow-up KIP.
> > >
> > > RE 6: You're right, existing producer metrics should be enough for now.
> > We
> > > can revisit this later if/when we add delivery-centric metrics for sink
> > > tasks as well.
> > >
> > > RE 7: The new metric names in the KIP LGTM; I don't see any need to
> > expand
> > > beyond those but if you'd still like to pursue others, LMK.
> > >
> > >
> > > New thoughts:
> > >
> > > One small thought: instead of "alias" in "alias="{transform_alias}" for
> > the
> > > per-transform metrics, could we use "transform"? IMO it's clearer since
> > we
> > > don't use "alias" in the names of transform-related properties, and
> > "alias"
> > > may be confused with the classloading term where you can use, e.g.,
> > > "FileStreamSource" as the name of a connector class in a connector
> config
> > > instead of "org.apache.kafka.connect.file.FileStreamSourceConnector".
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Fri, Nov 18, 2022 at 12:06 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Thanks Mickael!
> > > >
> > > >
> > > > On Wed, 9 Nov 2022 at 15:54, Mickael Maison <
> mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the KIP, it is a nice improvement.
> > > > >
> > > > > 1) The per transformation metrics still have a question mark next
> to
> > > > > them in the KIP. Do you want to include them? If so we'll want to
> tag
> > > > > them, we should be able to include the aliases in
> TransformationChain
> > > > > and use them.
> > > > >
> > > >
> > > > Yes, I have added the changes on TransformChain that will be needed
> to
> > > add
> > > > these metrics.
> > > >
> > > >
> > > > >
> > > > > 2) I see no references to predicates. If we don't want to measure
> > > > > their latency, can we say it explicitly?
> > >

[jira] [Created] (KAFKA-14441) Benchmark performance impact of metrics library

2022-12-04 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14441:


 Summary: Benchmark performance impact of metrics library
 Key: KAFKA-14441
 URL: https://issues.apache.org/jira/browse/KAFKA-14441
 Project: Kafka
  Issue Type: Task
  Components: metrics
Reporter: Jorge Esteban Quilcate Otoya


While discussing KIP-864 
([https://lists.apache.org/thread/k6rh2mr7pg94935fgpqw8b5fj308f2n7]), a concern 
was raised about how much overhead sampling metric values adds, particularly 
for metrics that record values per record instead of per batch.

Adding benchmarks for sampling values will give more confidence in whether 
metrics should be exposed at DEBUG or INFO level, depending on their impact.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
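
To make the kind of measurement the ticket asks for concrete, here is a rough
stand-in in plain Python (the ticket itself targets JMH benchmarks against
Kafka's real Sensor/Metrics classes; the stat classes below are simplified
stand-ins, not the real implementations):

```python
import timeit

class MaxStat:
    """Simplified stand-in for a max-tracking metric stat."""
    def __init__(self):
        self.value = float("-inf")
    def record(self, v):
        if v > self.value:
            self.value = v

class AvgStat:
    """Simplified stand-in for an average-tracking metric stat."""
    def __init__(self):
        self.total = 0.0
        self.count = 0
    def record(self, v):
        self.total += v
        self.count += 1

def run(stats, n=10_000):
    # Simulate recording one value per "record" processed.
    for i in range(n):
        for s in stats:
            s.record(float(i))

max_only = timeit.timeit(lambda: run([MaxStat()]), number=5)
max_and_avg = timeit.timeit(lambda: run([MaxStat(), AvgStat()]), number=5)
print(f"max only:  {max_only:.4f}s")
print(f"max + avg: {max_and_avg:.4f}s")
```

The gap between the two timings approximates the per-record overhead of the
extra stat; JMH would still be needed for trustworthy numbers on the JVM.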


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-11-24 Thread Jorge Esteban Quilcate Otoya
Thanks Chris! I have updated the KIP with "transform" instead of "alias".
Agree it's clearer.

Cheers,
Jorge.

On Mon, 21 Nov 2022 at 21:36, Chris Egerton  wrote:

> Hi Jorge,
>
> Thanks for the updates, and apologies for the delay. The new diagram
> directly under the "Proposed Changes" section is absolutely gorgeous!
>
>
> Follow-ups:
>
> RE 2: Good point. We can use the same level for these metrics, it's not a
> big deal.
>
> RE 3: As long as all the per-record metrics are kept at DEBUG level, it
> should be fine to leave JMH benchmarking for a follow-up. If we want to add
> new per-record, INFO-level metrics, I would be more comfortable with
> including benchmarking as part of the testing plan for the KIP. One
> possible compromise could be to propose that these features be merged at
> DEBUG level, and then possibly upgraded to INFO level in the future pending
> benchmarks to guard against performance degradation.
>
> RE 4: I think for a true "end-to-end" metric, it'd be useful to include the
> time taken by the task to actually deliver the record. However, with the
> new metric names and descriptions provided in the KIP, I have no objections
> with what's currently proposed, and a new "end-to-end" metric can be taken
> on later in a follow-up KIP.
>
> RE 6: You're right, existing producer metrics should be enough for now. We
> can revisit this later if/when we add delivery-centric metrics for sink
> tasks as well.
>
> RE 7: The new metric names in the KIP LGTM; I don't see any need to expand
> beyond those but if you'd still like to pursue others, LMK.
>
>
> New thoughts:
>
> One small thought: instead of "alias" in "alias="{transform_alias}" for the
> per-transform metrics, could we use "transform"? IMO it's clearer since we
> don't use "alias" in the names of transform-related properties, and "alias"
> may be confused with the classloading term where you can use, e.g.,
> "FileStreamSource" as the name of a connector class in a connector config
> instead of "org.apache.kafka.connect.file.FileStreamSourceConnector".
>
>
> Cheers,
>
> Chris
>
> On Fri, Nov 18, 2022 at 12:06 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks Mickael!
> >
> >
> > On Wed, 9 Nov 2022 at 15:54, Mickael Maison 
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the KIP, it is a nice improvement.
> > >
> > > 1) The per transformation metrics still have a question mark next to
> > > them in the KIP. Do you want to include them? If so we'll want to tag
> > > them, we should be able to include the aliases in TransformationChain
> > > and use them.
> > >
> >
> > Yes, I have added the changes on TransformChain that will be needed to
> add
> > these metrics.
> >
> >
> > >
> > > 2) I see no references to predicates. If we don't want to measure
> > > their latency, can we say it explicitly?
> > >
> >
> > Good question, I haven't considered these. Though as these are
> materialized
> > as PredicatedTransformation, they should be covered by these changes.
> > Adding a note about this.
> >
> >
> > >
> > > 3) Should we have sink-record-batch-latency-avg-ms? All other metrics
> > > have both the maximum and average values.
> > >
> > >
> > Good question. I will remove it and change the record latency from
> > DEBUG->INFO as it already covers the maximum metric.
> >
> > Hope it's clearer now, let me know if there is any additional feedback.
> > Thanks!
> >
> >
> >
> > > Thanks,
> > > Mickael
> > >
> > > On Thu, Oct 20, 2022 at 9:58 PM Jorge Esteban Quilcate Otoya
> > >  wrote:
> > > >
> > > > Thanks, Chris! Great feedback! Please, find my comments below:
> > > >
> > > > On Thu, 13 Oct 2022 at 18:52, Chris Egerton  >
> > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the KIP. I agree with the overall direction and think
> this
> > > would
> > > > > be a nice improvement to Kafka Connect. Here are my initial
> thoughts
> > > on the
> > > > > details:
> > > > >
> > > > > 1. The motivation section outlines the gaps in Kafka Connect's task
> > > metrics
> > > > > nicely. I think it'd be useful to include more concrete details on
> 

[jira] [Created] (KAFKA-14409) Clean ProcessorParameters from casting

2022-11-19 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14409:


 Summary: Clean ProcessorParameters from casting
 Key: KAFKA-14409
 URL: https://issues.apache.org/jira/browse/KAFKA-14409
 Project: Kafka
  Issue Type: Improvement
Reporter: Jorge Esteban Quilcate Otoya


ProcessorParameters currently includes a set of methods to cast to specific 
supplier types:
 * kTableSourceSupplier
 * kTableProcessorSupplier
 * kTableKTableJoinMergerProcessorSupplier

As most of these are used in specific classes, and the usage assumptions may 
vary (some expect nulls and others don't), this ticket proposes to remove these 
methods and move the casting into the classes that use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14408) Consider enabling DEBUG log level on tests for Streams

2022-11-19 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14408:


 Summary: Consider enabling DEBUG log level on tests for Streams
 Key: KAFKA-14408
 URL: https://issues.apache.org/jira/browse/KAFKA-14408
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jorge Esteban Quilcate Otoya


Ideally logging should not trigger any side effects, though we found that it 
did in https://issues.apache.org/jira/browse/KAFKA-14325.

This ticket asks whether we should consider enabling more verbose logging 
levels during tests (they currently run at INFO) to validate these paths.

There may be some additional costs on log file sizes and verbosity, so it's 
open to discuss if this is worth it or not, and whether to expand this to other 
components as well.

Additional discussion: 
https://github.com/apache/kafka/pull/12859#discussion_r1027007714



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
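
One way the ticket's idea could be tried out is a log4j 1.x test-resource
fragment like the one below. This is a hypothetical sketch: the file path and
the choice of which loggers to raise are assumptions, and Kafka's actual test
logging configuration may differ.

```properties
# Illustrative test logging config, e.g. a src/test/resources/log4j.properties
# (path is an assumption). Keep the root at INFO to limit log volume...
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c:%L)%n
# ...and raise verbosity only for the component under test:
log4j.logger.org.apache.kafka.streams=DEBUG
```

Scoping DEBUG to one package keeps the file-size and verbosity cost mentioned
above bounded while still exercising the DEBUG logging paths.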


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-11-18 Thread Jorge Esteban Quilcate Otoya
Thanks Mickael!


On Wed, 9 Nov 2022 at 15:54, Mickael Maison 
wrote:

> Hi Jorge,
>
> Thanks for the KIP, it is a nice improvement.
>
> 1) The per transformation metrics still have a question mark next to
> them in the KIP. Do you want to include them? If so we'll want to tag
> them, we should be able to include the aliases in TransformationChain
> and use them.
>

Yes, I have added the changes on TransformChain that will be needed to add
these metrics.


>
> 2) I see no references to predicates. If we don't want to measure
> their latency, can we say it explicitly?
>

Good question, I haven't considered these. Though as these are materialized
as PredicatedTransformation, they should be covered by these changes.
Adding a note about this.


>
> 3) Should we have sink-record-batch-latency-avg-ms? All other metrics
> have both the maximum and average values.
>
>
Good question. I will remove it and change the record latency from
DEBUG->INFO as it already covers the maximum metric.

Hope it's clearer now, let me know if there is any additional feedback.
Thanks!



> Thanks,
> Mickael
>
> On Thu, Oct 20, 2022 at 9:58 PM Jorge Esteban Quilcate Otoya
>  wrote:
> >
> > Thanks, Chris! Great feedback! Please, find my comments below:
> >
> > On Thu, 13 Oct 2022 at 18:52, Chris Egerton 
> wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the KIP. I agree with the overall direction and think this
> would
> > > be a nice improvement to Kafka Connect. Here are my initial thoughts
> on the
> > > details:
> > >
> > > 1. The motivation section outlines the gaps in Kafka Connect's task
> metrics
> > > nicely. I think it'd be useful to include more concrete details on why
> > > these gaps need to be filled in, and in which cases additional metrics
> > > would be helpful. One goal could be to provide enhanced monitoring of
> > > production deployments that allows for cluster administrators to set up
> > > automatic alerts for latency spikes and, if triggered, quickly
> identify the
> > > root cause of those alerts, reducing the time to remediation. Another
> goal
> > > could be to provide more insight to developers or cluster
> administrators
> > > who want to do performance testing on connectors in non-production
> > > environments. It may help guide our decision making process to have a
> > > clearer picture of the goals we're trying to achieve.
> > >
> >
> > Agree. The Motivation section has been updated.
> > Thanks for the examples, I see both of them being covered by the KIP.
> > I see how these could give us a good distinction on whether to position
> > some metrics at INFO or DEBUG level.
> >
> >
> > > 2. If we're trying to address the alert-and-diagnose use case, it'd be
> > > useful to have as much information as possible at INFO level, rather
> than
> > > forcing cluster administrators to possibly reconfigure a connector to
> emit
> > > DEBUG or TRACE level metrics in order to diagnose a potential
> > > production-impacting performance bottleneck. I can see the rationale
> for
> > > emitting per-record metrics that track an average value at DEBUG
> level, but
> > > for per-record metrics that track a maximum value, is there any reason
> not
> > > to provide this information at INFO level?
> > >
> >
> > Agree. Though with Max and Avg metrics being part of the same sensor —
> > where Metric Level is defined — then both metrics get the same level.
> >
> >
> > > 3. I'm also curious about the performance testing suggested by Yash to
> > > gauge the potential impact of this change. Have you been able to do any
> > > testing with your draft implementation yet?
> > >
> >
> > No, not so far.
> > I think it would be valuable to discuss the scope of this testing and
> maybe
> > tackle it
> > in a separate issue as Sensors and Metrics are used all over the place.
> > My initial understanding is that these tests should be placed in the
> > jmh-benchmarks[1].
> > Then, we could target testing Sensors and Metrics, and validate how much
> > overhead
> > is added by having only Max vs Max,Avg(,Min), etc.
> > In the other hand, we could extend this to Transformers or other Connect
> > layers.
> >
> > Here are some pointers to the Sensors and Metrics implementations that
> > could be considered:
> > Path to metric recording:
> > -
> >
> https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apac

Re: [ANNOUNCE] New Kafka PMC Member: Bruno Cadonna

2022-11-02 Thread Jorge Esteban Quilcate Otoya
Congratulations, Bruno!!

On Wed, 2 Nov 2022, 09:06 Mickael Maison,  wrote:

> Congratulations Bruno!
>
> On Wed, Nov 2, 2022 at 8:33 AM Matthew Benedict de Detrich
>  wrote:
> >
> > Congratulations!
> >
> > On Wed, Nov 2, 2022 at 8:32 AM Josep Prat 
> > wrote:
> >
> > > Congrats Bruno!
> > >
> > > ———
> > > Josep Prat
> > >
> > > Aiven Deutschland GmbH
> > >
> > > Immanuelkirchstraße 26, 10405 Berlin
> > >
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > >
> > > m: +491715557497
> > >
> > > w: aiven.io
> > >
> > > e: josep.p...@aiven.io
> > >
> > > On Wed, Nov 2, 2022, 08:20 Tom Bentley  wrote:
> > >
> > > > Congratulations!
> > > >
> > > > On Wed, 2 Nov 2022 at 06:40, David Jacot 
> wrote:
> > > >
> > > > > Congrats, Bruno! Well deserved.
> > > > >
> > > > > Le mer. 2 nov. 2022 à 06:12, Randall Hauch  a
> écrit
> > > :
> > > > >
> > > > > > Congratulations, Bruno!
> > > > > >
> > > > > > On Tue, Nov 1, 2022 at 11:20 PM Sagar  >
> > > > wrote:
> > > > > >
> > > > > > > Congrats Bruno!
> > > > > > >
> > > > > > > Sagar.
> > > > > > >
> > > > > > > On Wed, Nov 2, 2022 at 7:51 AM deng ziming <
> > > dengziming1...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congrats!
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ziming
> > > > > > > >
> > > > > > > > > On Nov 2, 2022, at 3:36 AM, Guozhang Wang <
> wangg...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I'd like to introduce our new Kafka PMC member, Bruno.
> > > > > > > > >
> > > > > > > > > Bruno has been a committer since April. 2021 and has been
> very
> > > > > active
> > > > > > > in
> > > > > > > > > the community. He's a key contributor to Kafka Streams, and
> > > also
> > > > > > helped
> > > > > > > > > review a lot of horizontal improvements such as Mockito.
> It is
> > > my
> > > > > > > > pleasure
> > > > > > > > > to announce that Bruno has agreed to join the Kafka PMC.
> > > > > > > > >
> > > > > > > > > Congratulations, Bruno!
> > > > > > > > >
> > > > > > > > > -- Guozhang Wang, on behalf of Apache Kafka PMC
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Matthew de Detrich
> >
> > *Aiven Deutschland GmbH*
> >
> > Immanuelkirchstraße 26, 10405 Berlin
> >
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >
> > *m:* +491603708037
> >
> > *w:* aiven.io *e:* matthew.dedetr...@aiven.io
>


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-10-20 Thread Jorge Esteban Quilcate Otoya
se entirely)?
>

It is currently measured when the record has been received by the Sink task,
right after the consumer poll and before conversions.
Would it be valuable to have it after put-sink-records?


> 5. If the goal is to identify performance bottlenecks (either in production
> or pre-production environments), would it make sense to introduce metrics
> for each individual converter (i.e., key/value/header) and transformation?
> It's definitely an improvement to be able to identify the total time for
> conversion and transformation, but then the immediate follow-up question if
> a bottleneck is found in that phase is "which converter/transformation is
> responsible?" It'd be nice if we could provide a way to quickly answer that
> question.
>

This is a great idea. I'd like to consider this as well, though maybe these
more granular metrics would be better kept at DEBUG level.


> 6. Any thoughts about offering latency metrics for source tasks between
> receipt of the record from the task and delivery of the record to Kafka
> (which would be tracked by producer callback)? We could also use the record
> timestamp either instead of or in addition to receipt time if the task
> provides a timestamp with its records.
>

With the source transform and convert metrics we get part of that latency.
Looking at the producer metrics, `request-latency` (though a very generic
metric) sort of answers the time between send request and ack — if my
understanding is correct.
Would these be enough, or are you thinking about another approach —
maybe a custom metric to cover the producer side?


> 7. We may end up introducing a way for sink tasks to record per-record
> delivery to the sink system (see KIP-767 [1]). I'd like it if we could keep
> the names of our metrics very precise in order to avoid confusing users
> (who may think that we're providing metrics on actual delivery to the sink
> system, which may not be the case if the connector performs asynchronous
> writes), and in order to leave room for a metrics on true delivery time by
> sink tasks. It'd also be nice if we could remain consistent with existing
> metrics such as "put-batch-avg-time-ms". With that in mind, what do you
> think about renaming these metrics:
> - "sink-record-batch-latency-max-ms" to "put-batch-avg-latency-ms"
> - "sink-record-latency-max-ms" to "put-sink-record-latency-max-ms"
> - "sink-record-latency-avg-ms" to "put-sink-record-latency-avg-ms"
> - "sink-record-convert-transform-time-max-ms" to
> "convert-transform-sink-record-time-max-ms"
> - "sink-record-convert-transform-time-avg-ms" to
> "convert-transform-sink-record-time-avg-ms"
> - "source-record-transform-convert-time-max-ms" to
> "transform-convert-source-record-time-max-ms"
> - "source-record-transform-convert-time-avg-ms" to
> "transform-convert-source-record-time-avg-ms"
>

Makes sense, thanks! I have updated the list of metrics, grouping them by
sensor and applying these suggestions.
The only ones that I want to review are the first 3: sink-record-* to
put-batch-*. I'm not sure that put-batch/put-sink-record describes the
purpose of the metric — nor does `sink-record-latency`, to be honest.
My initial thought was to have something like Kafka Streams e2e-latency.
Based on 4. and 6. questions, an idea could be to add:
My initial thought was to have something like Kafka Streams e2e-latency.
Based on 4. and 6. questions, an idea could be to add:
- source-batch-e2e-latency-before-send: measure wallclock - source record
timestamp after source connector poll.
- source-batch-e2e-latency-after-send: measure wallclock - record timestamp
on producer send callback
- sink-batch-e2e-latency-before-put: measure time wallclock - record
timestamp after consumer poll
- sink-batch-e2e-latency-after-put: measure time wallclock - record
timestamp after sink connector put.
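
The four proposed measurement points all reduce to the same computation:
wall-clock time at the point of measurement minus the record timestamp. A
minimal sketch (hypothetical class and method names, not the KIP's
implementation) of that computation on the sink side:

```java
// Sketch of the proposed e2e latency measurement points (hypothetical
// names; the real implementation would live in the Connect worker and
// feed the values into metrics sensors). Times are epoch milliseconds.
public class E2eLatencyDemo {

    // e2e latency = wall clock at the measurement point - record timestamp
    public static long e2eLatencyMs(long recordTimestampMs, long wallClockMs) {
        return wallClockMs - recordTimestampMs;
    }

    public static void main(String[] args) {
        long recordTs = 1_000L;
        // "before-put": measured right after the consumer poll
        long beforePut = e2eLatencyMs(recordTs, 1_040L);
        // "after-put": measured once the sink task's put() returns
        long afterPut = e2eLatencyMs(recordTs, 1_095L);
        if (beforePut != 40L || afterPut != 95L) {
            throw new AssertionError("unexpected latencies");
        }
        System.out.println("before-put=" + beforePut + " ms, after-put=" + afterPut + " ms");
    }
}
```

The gap between the "before" and "after" values isolates the time spent
inside the task from the time the record spent sitting in Kafka.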


> Thanks again for the KIP! Looking forward to your thoughts.
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
>
> On Thu, Sep 15, 2022 at 1:32 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I've made a slight addition to the KIP based on Yash feedback:
> >
> > - A new metric is added at INFO level to record the max latency from the
> > batch timestamp, by keeping the oldest record timestamp per batch.
> > - A draft implementation is linked.
> >
> > Looking forward to your feedback.
> > Also, a kindly reminder that the vote thread is open.
> >
> > Thanks!
> > Jorge.
> >
> > On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Great. I have updated the KIP to reflect this.
> > >
> > &

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-15 Thread Jorge Esteban Quilcate Otoya
Hi everyone,

I've made a slight addition to the KIP based on Yash's feedback:

- A new metric is added at INFO level to record the max latency from the
batch timestamp, by keeping the oldest record timestamp per batch.
- A draft implementation is linked.

Looking forward to your feedback.
Also, a kind reminder that the vote thread is open.

Thanks!
Jorge.
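
The "oldest record timestamp per batch" idea mentioned above can be sketched
as follows (illustrative names only, not the actual Connect worker code):
the minimum timestamp in a batch yields the batch's maximum e2e latency, so
the max-latency sensor needs only one Sensor::record call per batch instead
of one per record.

```java
import java.util.List;

// Sketch: derive the batch's maximum e2e latency from its oldest record,
// so a max-only metric can be recorded once per batch. (Illustrative
// names; not the worker's actual implementation.)
public class BatchMaxLatencyDemo {

    public static long maxE2eLatencyMs(List<Long> recordTimestampsMs, long wallClockMs) {
        long oldest = Long.MAX_VALUE;
        for (long ts : recordTimestampsMs) {
            oldest = Math.min(oldest, ts); // track the oldest timestamp
        }
        return wallClockMs - oldest; // the single value handed to the sensor
    }

    public static void main(String[] args) {
        long max = maxE2eLatencyMs(List.of(100L, 250L, 180L), 400L);
        if (max != 300L) throw new AssertionError("expected 300, got " + max);
        System.out.println("batch max e2e latency = " + max + " ms");
    }
}
```

This is why a max metric can stay at INFO level cheaply, while avg (which
needs every record's value) is the expensive one.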

On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Great. I have updated the KIP to reflect this.
>
> Cheers,
> Jorge.
>
> On Thu, 8 Sept 2022 at 12:26, Yash Mayya  wrote:
>
>> Thanks, I think it makes sense to define these metrics at a DEBUG
>> recording
>> level.
>>
>> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban Quilcate Otoya <
>> quilcate.jo...@gmail.com> wrote:
>>
>> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya  wrote:
>> >
>> > > Hi Jorge,
>> > >
>> > > Thanks for the changes. With regard to having per batch vs per record
>> > > metrics, the additional overhead I was referring to wasn't about
>> whether
>> > or
>> > > not we would need to iterate over all the records in a batch. I was
>> > > referring to the potential additional overhead caused by the higher
>> > volume
>> > > of calls to Sensor::record on the sensors for the new metrics (as
>> > compared
>> > > to the existing batch only metrics), especially for high throughput
>> > > connectors where batch sizes could be large. I guess we may want to do
>> > some
>> > > sort of performance testing and get concrete numbers to verify whether
>> > this
>> > > is a valid concern or not?
>> > >
>> >
>> > 6.1. Got it, thanks for clarifying. I guess there could be a benchmark
>> test
>> > of the `Sensor::record` to get an idea of the performance impact.
>> > Regardless, the fact that these are single-record metrics compared to
>> > existing batch-only could be explicitly defined by setting these
>> metrics at
>> > a DEBUG or TRACE metric recording level, leaving the existing at INFO
>> > level.
>> > wdyt?
>> >
>> >
>> > >
>> > > Thanks,
>> > > Yash
>> > >
>> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban Quilcate Otoya <
>> > > quilcate.jo...@gmail.com> wrote:
>> > >
>> > > > Hi Sagar and Yash,
>> > > >
>> > > > > the way it's defined in
>> > > > https://kafka.apache.org/documentation/#connect_monitoring for the
>> > > metrics
>> > > >
>> > > > 4.1. Got it. Add it to the KIP.
>> > > >
>> > > > > The only thing I would argue is do we need
>> sink-record-latency-min?
>> > > Maybe
>> > > > we
>> > > > > could remove this min metric as well and make all of the 3 e2e
>> > metrics
>> > > > > consistent
>> > > >
>> > > > 4.2 I see. Will remove it from the KIP.
>> > > >
>> > > > > Probably users can track the metrics at their end to
>> > > > > figure that out. Do you think that makes sense?
>> > > >
>> > > > 4.3. Yes, agree. With these new metrics it should be easier for
>> users
>> > to
>> > > > track this.
>> > > >
>> > > > > I think it makes sense to not have a min metric for either to
>> remain
>> > > > > consistent with the existing put-batch and poll-batch metrics
>> > > >
>> > > > 5.1. Got it. Same as 4.2
>> > > >
>> > > > > Another naming related suggestion I had was with the
>> > > > > "convert-time" metrics - we should probably include
>> transformations
>> > in
>> > > > the
>> > > > > name since SMTs could definitely be attributable to a sizable
>> chunk
>> > of
>> > > > the
>> > > > > latency depending on the specific transformation chain.
>> > > >
>> > > > 5.2. Make sense. I'm proposing to add
>> > `sink-record-convert-transform...`
>> > > > and `source-record-transform-convert...` to represent correctly the
>> > order
>> > > > of operations.
>> > > >
>> > > > > it seems like both source and sink tasks only record metrics at a
>> > > "batch"
>> >

[jira] [Created] (KAFKA-14232) Support for nested structures: InsertField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14232:


 Summary: Support for nested structures: InsertField
 Key: KAFKA-14232
 URL: https://issues.apache.org/jira/browse/KAFKA-14232
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14231) Support for nested structures: ReplaceField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14231:


 Summary: Support for nested structures: ReplaceField
 Key: KAFKA-14231
 URL: https://issues.apache.org/jira/browse/KAFKA-14231
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14230) Support for nested structures: Cast

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14230:


 Summary: Support for nested structures: Cast
 Key: KAFKA-14230
 URL: https://issues.apache.org/jira/browse/KAFKA-14230
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14228) Support for nested structures: ValueToKey

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14228:


 Summary: Support for nested structures: ValueToKey
 Key: KAFKA-14228
 URL: https://issues.apache.org/jira/browse/KAFKA-14228
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14229) Support for nested structures: HoistValue

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14229:


 Summary: Support for nested structures: HoistValue
 Key: KAFKA-14229
 URL: https://issues.apache.org/jira/browse/KAFKA-14229
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14227) Support for nested structures: MaskField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14227:


 Summary: Support for nested structures: MaskField
 Key: KAFKA-14227
 URL: https://issues.apache.org/jira/browse/KAFKA-14227
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14226) Introduce support for nested structures

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14226:


 Summary: Introduce support for nested structures
 Key: KAFKA-14226
 URL: https://issues.apache.org/jira/browse/KAFKA-14226
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Abstraction for FieldPath and initial SMTs:
 * ExtractField
 * HeaderFrom
 * TimestampConverter



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-12 Thread Jorge Esteban Quilcate Otoya
Pressed send too soon. Updating subject.

On Mon, 12 Sept 2022 at 11:45, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> Thank you all for the positive discussion about KIP-864.
>
> I would like to start a voting thread, to introduce these new metrics for
> Connector tasks
>
> KIP: https://cwiki.apache.org/confluence/x/6I5rDQ
>
> Thanks,
> Jorge
>
>


[VOTE]

2022-09-12 Thread Jorge Esteban Quilcate Otoya
Hi everyone,

Thank you all for the positive discussion about KIP-864.

I would like to start a voting thread to introduce these new metrics for
Connector tasks.

KIP: https://cwiki.apache.org/confluence/x/6I5rDQ

Thanks,
Jorge


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-08 Thread Jorge Esteban Quilcate Otoya
Great. I have updated the KIP to reflect this.

Cheers,
Jorge.

On Thu, 8 Sept 2022 at 12:26, Yash Mayya  wrote:

> Thanks, I think it makes sense to define these metrics at a DEBUG recording
> level.
>
> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya  wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the changes. With regard to having per batch vs per record
> > > metrics, the additional overhead I was referring to wasn't about
> whether
> > or
> > > not we would need to iterate over all the records in a batch. I was
> > > referring to the potential additional overhead caused by the higher
> > volume
> > > of calls to Sensor::record on the sensors for the new metrics (as
> > compared
> > > to the existing batch only metrics), especially for high throughput
> > > connectors where batch sizes could be large. I guess we may want to do
> > some
> > > sort of performance testing and get concrete numbers to verify whether
> > this
> > > is a valid concern or not?
> > >
> >
> > 6.1. Got it, thanks for clarifying. I guess there could be a benchmark
> test
> > of the `Sensor::record` to get an idea of the performance impact.
> > Regardless, the fact that these are single-record metrics compared to
> > existing batch-only could be explicitly defined by setting these metrics
> at
> > a DEBUG or TRACE metric recording level, leaving the existing at INFO
> > level.
> > wdyt?
> >
> >
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Hi Sagar and Yash,
> > > >
> > > > > the way it's defined in
> > > > https://kafka.apache.org/documentation/#connect_monitoring for the
> > > metrics
> > > >
> > > > 4.1. Got it. Add it to the KIP.
> > > >
> > > > > The only thing I would argue is do we need sink-record-latency-min?
> > > Maybe
> > > > we
> > > > > could remove this min metric as well and make all of the 3 e2e
> > metrics
> > > > > consistent
> > > >
> > > > 4.2 I see. Will remove it from the KIP.
> > > >
> > > > > Probably users can track the metrics at their end to
> > > > > figure that out. Do you think that makes sense?
> > > >
> > > > 4.3. Yes, agree. With these new metrics it should be easier for users
> > to
> > > > track this.
> > > >
> > > > > I think it makes sense to not have a min metric for either to
> remain
> > > > > consistent with the existing put-batch and poll-batch metrics
> > > >
> > > > 5.1. Got it. Same as 4.2
> > > >
> > > > > Another naming related suggestion I had was with the
> > > > > "convert-time" metrics - we should probably include transformations
> > in
> > > > the
> > > > > name since SMTs could definitely be attributable to a sizable chunk
> > of
> > > > the
> > > > > latency depending on the specific transformation chain.
> > > >
> > > > 5.2. Make sense. I'm proposing to add
> > `sink-record-convert-transform...`
> > > > and `source-record-transform-convert...` to represent correctly the
> > order
> > > > of operations.
> > > >
> > > > > it seems like both source and sink tasks only record metrics at a
> > > "batch"
> > > > > level, not on an individual record level. I think it might be
> > > additional
> > > > > overhead if we want to record these new metrics all at the record
> > > level?
> > > >
> > > > 5.3. I considered at the beginning to implement all metrics at the
> > batch
> > > > level, but given how the framework process records, I fallback to the
> > > > proposed approach:
> > > > - Sink Task:
> > > >   - `WorkerSinkTask#convertMessages(msgs)` already iterates over
> > records,
> > > > so there is no additional overhead to capture record latency per
> > record.
> > > > -
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/o

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-08 Thread Jorge Esteban Quilcate Otoya
On Thu, 8 Sept 2022 at 05:55, Yash Mayya  wrote:

> Hi Jorge,
>
> Thanks for the changes. With regard to having per batch vs per record
> metrics, the additional overhead I was referring to wasn't about whether or
> not we would need to iterate over all the records in a batch. I was
> referring to the potential additional overhead caused by the higher volume
> of calls to Sensor::record on the sensors for the new metrics (as compared
> to the existing batch only metrics), especially for high throughput
> connectors where batch sizes could be large. I guess we may want to do some
> sort of performance testing and get concrete numbers to verify whether this
> is a valid concern or not?
>

6.1. Got it, thanks for clarifying. I guess there could be a benchmark test
of `Sensor::record` to get an idea of the performance impact.
Regardless, the fact that these are single-record metrics, compared to the
existing batch-only ones, could be made explicit by setting these metrics at
a DEBUG or TRACE metric recording level, leaving the existing ones at INFO
level. wdyt?
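
For illustration, a toy mimic of the recording-level gating being discussed
(not Kafka's actual `Sensor` API): a DEBUG-level sensor becomes a no-op when
the configured level is INFO, which is what would keep per-record metrics
cheap under the default configuration.

```java
// Toy mimic of metric recording levels (not Kafka's actual Sensor API):
// a sensor only records when its own level is enabled by the configured
// level, so DEBUG-level per-record sensors are no-ops at the default
// INFO configuration.
public class RecordingLevelDemo {

    enum Level { INFO, DEBUG, TRACE } // ordinal order = increasing verbosity

    static class ToySensor {
        final Level level;
        long recordCount = 0;

        ToySensor(Level level) { this.level = level; }

        void record(double value, Level configuredLevel) {
            // Skip the (comparatively expensive) recording path entirely
            // when this sensor is more verbose than the configured level.
            if (level.ordinal() <= configuredLevel.ordinal()) {
                recordCount++; // stand-in for updating max/avg stats
            }
        }
    }

    public static void main(String[] args) {
        ToySensor perRecord = new ToySensor(Level.DEBUG);
        ToySensor perBatch = new ToySensor(Level.INFO);
        for (int i = 0; i < 1000; i++) {
            perRecord.record(i, Level.INFO); // gated out at INFO
        }
        perBatch.record(42.0, Level.INFO);  // still recorded
        if (perRecord.recordCount != 0 || perBatch.recordCount != 1) {
            throw new AssertionError("gating failed");
        }
        System.out.println("per-record recorded " + perRecord.recordCount
                + " times, per-batch " + perBatch.recordCount + " time");
    }
}
```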


>
> Thanks,
> Yash
>
> On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Sagar and Yash,
> >
> > > the way it's defined in
> > https://kafka.apache.org/documentation/#connect_monitoring for the
> metrics
> >
> > 4.1. Got it. Add it to the KIP.
> >
> > > The only thing I would argue is do we need sink-record-latency-min?
> Maybe
> > we
> > > could remove this min metric as well and make all of the 3 e2e metrics
> > > consistent
> >
> > 4.2 I see. Will remove it from the KIP.
> >
> > > Probably users can track the metrics at their end to
> > > figure that out. Do you think that makes sense?
> >
> > 4.3. Yes, agree. With these new metrics it should be easier for users to
> > track this.
> >
> > > I think it makes sense to not have a min metric for either to remain
> > > consistent with the existing put-batch and poll-batch metrics
> >
> > 5.1. Got it. Same as 4.2
> >
> > > Another naming related suggestion I had was with the
> > > "convert-time" metrics - we should probably include transformations in
> > the
> > > name since SMTs could definitely be attributable to a sizable chunk of
> > the
> > > latency depending on the specific transformation chain.
> >
> > 5.2. Make sense. I'm proposing to add `sink-record-convert-transform...`
> > and `source-record-transform-convert...` to represent correctly the order
> > of operations.
> >
> > > it seems like both source and sink tasks only record metrics at a
> "batch"
> > > level, not on an individual record level. I think it might be
> additional
> > > overhead if we want to record these new metrics all at the record
> level?
> >
> > 5.3. I considered at the beginning to implement all metrics at the batch
> > level, but given how the framework process records, I fallback to the
> > proposed approach:
> > - Sink Task:
> >   - `WorkerSinkTask#convertMessages(msgs)` already iterates over records,
> > so there is no additional overhead to capture record latency per record.
> > -
> >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514
> >   - `WorkerSinkTask#convertAndTransformRecord(record)` actually happens
> > individually. Measuring this operation per batch would include processing
> > that is not strictly part of "convert and transform"
> > -
> >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518
> > - Source Task:
> >   - `AbstractWorkerSourceTask#sendRecords` iterates over a batch and
> > applies transforms and convert record individually as well:
> > -
> >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390
> >
> > > This might require some additional changes -
> > > for instance, with the "sink-record-latency" metric, we might only want
> > to
> > > have a "max" metric since "avg" would require recording a value on the
> > > sensor for each record (whereas we can get a "max" by only recording a
> > > metric value for the oldest recor

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-06 Thread Jorge Esteban Quilcate Otoya
type=sink-task-metrics,connector="{connector}",task="{task}"
> >
> > the way it's defined in
> > https://kafka.apache.org/documentation/#connect_monitoring for the
> > metrics.
> >
> > I see what you mean by the 3 metrics and how it can be interpreted. The
> > only thing I would argue is do we need sink-record-latency-min? Maybe we
> > could remove this min metric as well and make all of the 3 e2e metrics
> > consistent(since put-batch also doesn't expose a min which makes sense to
> > me). I think this is in contrast to what Yash pointed out above so I
> would
> > like to hear his thoughts as well.
> >
> > The other point Yash mentioned about the slightly flawed definition of
> e2e
> > is also true in a sense. But I have a feeling that's one the records are
> > polled by the connector tasks, it would be difficult to track the final
> leg
> > via the framework. Probably users can track the metrics at their end to
> > figure that out. Do you think that makes sense?
> >
> > Thanks!
> > Sagar.
> >
> >
> >
> >
> > On Thu, Sep 1, 2022 at 11:40 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi Sagar and Yash,
> > >
> > > Thanks for your feedback!
> > >
> > > > 1) I am assuming the new metrics would be task level metric.
> > >
> > > 1.1 Yes, it will be a task level metric, implemented on the
> > > Worker[Source/Sink]Task.
> > >
> > > > Could you specify the way it's done for other sink/source connector?
> > >
> > > 1.2. Not sure what do you mean by this. Could you elaborate a bit more?
> > >
> > > > 2. I am slightly confused about the e2e latency metric...
> > >
> > > 2.1. Yes, I see. I was trying to bring a similar concept as in Streams
> > with
> > > KIP-613, though the e2e concept may not be translatable.
> > > We could keep it as `sink-record-latency` to avoid conflating
> concepts. A
> > > similar metric naming was proposed in KIP-489 but at the consumer
> level —
> > > though it seems dormant for a couple of years.
> > >
> > > > However, the put-batch time measures the
> > > > time to put a batch of records to external sink. So, I would assume
> > the 2
> > > > can't be added as is to compute the e2e latency. Maybe I am missing
> > > > something here. Could you plz clarify this.
> > >
> > > 2.2. Yes, agree. Not necessarily added, but with the 3 latencies (poll,
> > > convert, putBatch) will be clearer where the bottleneck may be, and
> > > represent the internal processing.
> > >
> > > > however, as per the KIP it looks like it will be
> > > > the latency between when the record was written to Kafka and when the
> > > > record is returned by a sink task's consumer's poll?
> > >
> > > 3.1. Agree. 2.1. could help to clarify this.
> > >
> > > > One more thing - I was wondering
> > > > if there's a particular reason for having a min metric for e2e
> latency
> > > but
> > > > not for convert time?
> > >
> > > 3.2. Was following KIP-613 for e2e which seems useful to compare with
> > Max a
> > > get an idea of the window of results, though current latencies in
> > Connector
> > > do not include Min, and that's why I haven't added it for convert
> > latency.
> > > Do you think it make sense to extend latency metrics with Min?
> > >
> > > KIP is updated to clarify some of these changes.
> > >
> > > Many thanks,
> > > Jorge.
> > >
> > > On Thu, 1 Sept 2022 at 18:11, Yash Mayya  wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > Thanks for the KIP! I have the same confusion with the e2e-latency
> > > metrics
> > > > as Sagar above. "e2e" would seem to indicate the latency between when
> > the
> > > > record was written to Kafka and when the record was written to the
> sink
> > > > system by the connector - however, as per the KIP it looks like it
> will
> > > be
> > > > the latency between when the record was written to Kafka and when the
> > > > record is returned by a sink task's consumer's poll? I think that
> > metric
> > > > will be a little confusing to interpret. One more thing - I was
> > wondering
> > > > if there's a particular reason for having a min metric for e2

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-01 Thread Jorge Esteban Quilcate Otoya
Hi Sagar and Yash,

Thanks for your feedback!

> 1) I am assuming the new metrics would be task level metric.

1.1 Yes, it will be a task level metric, implemented on the
Worker[Source/Sink]Task.

> Could you specify the way it's done for other sink/source connector?

1.2. Not sure what you mean by this. Could you elaborate a bit more?

> 2. I am slightly confused about the e2e latency metric...

2.1. Yes, I see. I was trying to bring in a concept similar to the one in
Streams' KIP-613, though the e2e concept may not be directly translatable.
We could keep it as `sink-record-latency` to avoid conflating concepts. A
similar metric naming was proposed in KIP-489 but at the consumer level —
though it seems dormant for a couple of years.

> However, the put-batch time measures the
> time to put a batch of records to external sink. So, I would assume the 2
> can't be added as is to compute the e2e latency. Maybe I am missing
> something here. Could you plz clarify this.

2.2. Yes, agree. Not necessarily added together, but with the 3 latencies
(poll, convert, putBatch) it will be clearer where the bottleneck may be,
and they represent the internal processing.
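
As a toy illustration of that point (illustrative numbers and names only;
the real metrics are separate sensors, not a single function), the three
component latencies locate the bottleneck inside the worker even though
they do not sum to the record's full end-to-end latency:

```java
// Toy helper: given the three per-stage latencies discussed above,
// name the stage where most of the internal processing time goes.
public class BottleneckDemo {

    public static String bottleneck(long pollMs, long convertMs, long putBatchMs) {
        if (pollMs >= convertMs && pollMs >= putBatchMs) return "poll";
        if (convertMs >= putBatchMs) return "convert";
        return "put-batch";
    }

    public static void main(String[] args) {
        // e.g. a slow external sink dominates: put-batch is the bottleneck
        String b = bottleneck(5L, 2L, 40L);
        if (!b.equals("put-batch")) throw new AssertionError(b);
        System.out.println("bottleneck: " + b);
    }
}
```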

> however, as per the KIP it looks like it will be
> the latency between when the record was written to Kafka and when the
> record is returned by a sink task's consumer's poll?

3.1. Agree. 2.1. could help to clarify this.

> One more thing - I was wondering
> if there's a particular reason for having a min metric for e2e latency but
> not for convert time?

3.2. I was following KIP-613 for e2e, where Min seems useful to compare with
Max to get an idea of the window of results; however, current Connector
latencies do not include Min, which is why I haven't added it for convert
latency. Do you think it makes sense to extend the latency metrics with Min?

KIP is updated to clarify some of these changes.

Many thanks,
Jorge.

On Thu, 1 Sept 2022 at 18:11, Yash Mayya  wrote:

> Hi Jorge,
>
> Thanks for the KIP! I have the same confusion with the e2e-latency metrics
> as Sagar above. "e2e" would seem to indicate the latency between when the
> record was written to Kafka and when the record was written to the sink
> system by the connector - however, as per the KIP it looks like it will be
> the latency between when the record was written to Kafka and when the
> record is returned by a sink task's consumer's poll? I think that metric
> will be a little confusing to interpret. One more thing - I was wondering
> if there's a particular reason for having a min metric for e2e latency but
> not for convert time?
>
> Thanks,
> Yash
>
> On Thu, Sep 1, 2022 at 8:59 PM Sagar  wrote:
>
> > Hi Jorge,
> >
> > Thanks for the KIP. It looks like a very good addition. I skimmed through
> > once and had a couple of questions =>
> >
> > 1) I am assuming the new metrics would be task level metric. Could you
> > specify the way it's done for other sink/source connector?
> > 2) I am slightly confused about the e2e latency metric. Let's consider
> the
> > sink connector metric. If I look at the way it's supposed to be
> calculated,
> > i.e the difference between the record timestamp and the wall clock time,
> it
> > looks like a per record metric. However, the put-batch time measures the
> > time to put a batch of records to external sink. So, I would assume the 2
> > can't be added as is to compute the e2e latency. Maybe I am missing
> > something here. Could you plz clarify this.
> >
> > Thanks!
> > Sagar.
> >
> > On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion thread on KIP-864: Add End-To-End
> Latency
> > > Metrics to Connectors.
> > > This KIP aims to improve the metrics available on Source and Sink
> > > Connectors to measure end-to-end latency, including source and sink
> > record
> > > conversion time, and sink record e2e latency (similar to KIP-613 for
> > > Streams).
> > >
> > > The KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > >
> > > Please take a look and let me know what you think.
> > >
> > > Cheers,
> > > Jorge.
> > >
> >
>

