Re: [VOTE] KIP-714: Client metrics and observability

2023-12-09 Thread Andrew Schofield
I’d like to summarise the minor changes we made to KIP-714 as we completed the
code in Kafka 3.7.0.

* Introduced “*” to differentiate “all metrics subscribed” from “no metrics
subscribed”.
* Corrected the ACL operation for AlterConfigs to ALTER_CONFIGS on CLUSTER.
* Removed the “block” option from `kafka-client-metrics.sh` because the idea
needs additional work to make it workable.
* Corrected CRC32 to CRC32C.
* Added missing exceptions in the admin client interfaces.
* Some uses of “client metrics subscriptions” were incorrect and have been
replaced with “client metrics configuration resources”. The subscriptions are
derived from the configuration resources, but they are not the same thing.
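
To illustrate the first change in the list above, here is a minimal sketch of
how a subscription list could be interpreted so that “*” and an empty list
mean different things. The class `MetricsSubscription`, its `matches` method
and the metric names are hypothetical and only for illustration; treating each
non-“*” entry as a name prefix is an assumption, not something stated in this
thread.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the Kafka API.
final class MetricsSubscription {
    private final List<String> requestedMetrics;

    MetricsSubscription(List<String> requestedMetrics) {
        this.requestedMetrics = List.copyOf(requestedMetrics);
    }

    /** Returns true when the named metric is covered by this subscription. */
    boolean matches(String metricName) {
        // An empty list means "no metrics subscribed".
        if (requestedMetrics.isEmpty()) {
            return false;
        }
        // A single "*" entry means "all metrics subscribed".
        if (requestedMetrics.size() == 1 && requestedMetrics.get(0).equals("*")) {
            return true;
        }
        // Assumption: any other entry is treated as a metric name prefix.
        for (String prefix : requestedMetrics) {
            if (metricName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative metric name, chosen only for this example.
        String metric = "org.apache.kafka.producer.record.send.total";
        System.out.println(new MetricsSubscription(List.of()).matches(metric));
        System.out.println(new MetricsSubscription(List.of("*")).matches(metric));
        System.out.println(
            new MetricsSubscription(List.of("org.apache.kafka.consumer.")).matches(metric));
    }
}
```

Without the explicit “*” entry, an empty list would have to stand for both
“everything” and “nothing”, which is the ambiguity the change removes.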

Thanks,
Andrew


Re: [VOTE] KIP-714: Client metrics and observability

2023-10-16 Thread Andrew Schofield
The vote for KIP-714 has now concluded and the KIP is APPROVED.

The votes are:
Binding:
   +4 (Jason, Matthias, Sophie, Jun)
Non-binding:
   +3 (Milind, Kirk, Philip)
   -1 (Ryanne)

This KIP aims to improve monitoring and troubleshooting of client
performance by enabling clients to push metrics to brokers. The lack of
consistent telemetry across clients is an operational gap, and many cluster
operators do not have control over the clients. Often, asking the client owner
to change the configuration or even application code in order to troubleshoot
problems is not workable. This is why the KIP enables the broker to request
metrics from clients, giving a consistent, cross-platform mechanism.

The feature is enabled by configuring a metrics plugin on the brokers which
implements the ClientTelemetry interface. In the absence of a plugin with this
interface, the brokers do not even support the new RPCs in this KIP and the
clients will not attempt or be able to push metrics. So, a vanilla Apache Kafka
broker will not collect metrics.
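
As a rough sketch of the enablement rule described above: the broker only
offers the telemetry RPCs when at least one configured metrics plugin also
implements the telemetry capability. The interfaces `MetricsPlugin` and
`TelemetryCapable` below are local stand-ins declared only for this example.
They are not the real Kafka types, and their method signatures are assumptions.

```java
import java.util.List;

// Illustration only: every type here is a hypothetical stand-in, not a Kafka class.
final class BrokerTelemetrySupport {
    /** Stand-in for a generic broker metrics plugin. */
    interface MetricsPlugin {
        String name();
    }

    /** Stand-in for the extra capability a telemetry-aware plugin provides. */
    interface TelemetryCapable {
        void exportMetrics(String clientInstanceId, byte[] payload);
    }

    /** A plugin that only reports broker metrics. */
    static final class PlainReporter implements MetricsPlugin {
        public String name() {
            return "plain";
        }
    }

    /** A plugin that can also receive metrics pushed by clients. */
    static final class TelemetryReporter implements MetricsPlugin, TelemetryCapable {
        public String name() {
            return "telemetry";
        }

        public void exportMetrics(String clientInstanceId, byte[] payload) {
            System.out.println(clientInstanceId + " pushed " + payload.length + " bytes");
        }
    }

    /** The telemetry RPCs are offered only if some plugin is telemetry-capable. */
    static boolean supportsTelemetryRpcs(List<MetricsPlugin> plugins) {
        return plugins.stream().anyMatch(p -> p instanceof TelemetryCapable);
    }

    public static void main(String[] args) {
        System.out.println(supportsTelemetryRpcs(List.of(new PlainReporter())));
        System.out.println(
            supportsTelemetryRpcs(List.of(new PlainReporter(), new TelemetryReporter())));
    }
}
```

With only `PlainReporter` configured the check is false, which corresponds to
the “vanilla broker” case in which clients never attempt to push metrics.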

I would like to make available an open-source implementation of the
ClientTelemetry interface that works with an open-source monitoring solution.

The KIP does put support for OTLP serialisation into the client, so there are
new dependencies in the Java client, which are bundled and relocated (shaded).
OTLP also opens up other future use cases involving OpenTelemetry, which is
emerging as the de facto standard for telemetry, and observability in general.

Thanks to everyone who has contributed to KIP-714 since Magnus Edenhill
kicked it all off in February 2021.

Andrew

Re: [VOTE] KIP-714: Client metrics and observability

2023-10-13 Thread Jun Rao
Hi, Andrew,

Thanks for the KIP. +1 from me too.

Jun

Re: [VOTE] KIP-714: Client metrics and observability

2023-10-11 Thread Sophie Blee-Goldman
This looks great! +1 (binding)

Sophie

Re: [VOTE] KIP-714: Client metrics and observability

2023-10-11 Thread Matthias J. Sax

+1 (binding)

On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:


Bumping the voting thread for KIP-714.

So far, we have:
Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)

Thanks,
Andrew


On 4 Aug 2023, at 

Re: [VOTE] KIP-714: Client metrics and observability

2023-09-13 Thread Jason Gustafson
Hey Andrew,

+1 on the KIP. For many users of Kafka, it may not be fully understood how
much of a challenge client monitoring is. With tens of clients in a
cluster, it is already difficult to coordinate metrics collection. When
there are thousands of clients, and when the cluster operator has no
control over them, it is essentially impossible. For the fat clients that
we have, the lack of useful telemetry is a huge operational gap.
Consistency between clients has also been a major challenge. I think the
effort toward standardization in this KIP will have some positive impact
even in deployments which have effective client-side monitoring. Overall, I
think this proposal will provide a lot of value across the board.

Best,
Jason

Re: [VOTE] KIP-714: Client metrics and observability

2023-09-13 Thread Philip Nee
Hey Andrew -

Thank you for taking the time to reply to my questions. I'm just adding
some notes to this discussion.

1. epoch: It can be helpful to know the delta between the client-side epoch
and the actual leader epoch. It helps to explain why a commit sometimes fails
or why the client is not making progress.
2. Client connection: If the client selects the "wrong" connection to push
out the data, I assume the request would time out, which should lead to
disconnecting from the node and reselecting another node, as you mentioned,
via the least-loaded node.
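
The reselection behaviour in point 2 could be sketched roughly as follows:
stay on the same node for consecutive pushes, and after a timeout drop that
node and pick the least-loaded remaining one. This is only an illustration
under assumed names. `TelemetryNodeSelector`, its methods and the broker names
are hypothetical, and “least loaded” is approximated here by the smallest
number of in-flight requests.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; not the actual Kafka client networking code.
final class TelemetryNodeSelector {
    private final Map<String, Integer> inFlightRequests = new LinkedHashMap<>();
    private final Set<String> timedOutNodes = new HashSet<>();
    private String currentNode;

    /** Records how many requests are currently in flight to a node. */
    void setInFlight(String node, int count) {
        inFlightRequests.put(node, count);
    }

    /** Keeps the current node if there is one, else picks the least-loaded usable node. */
    Optional<String> nodeForNextPush() {
        if (currentNode == null) {
            currentNode = inFlightRequests.entrySet().stream()
                .filter(e -> !timedOutNodes.contains(e.getKey()))
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
        }
        return Optional.ofNullable(currentNode);
    }

    /** A timed-out push disconnects from the node and forces a reselection. */
    void onRequestTimeout(String node) {
        timedOutNodes.add(node);
        if (node.equals(currentNode)) {
            currentNode = null;
        }
    }

    public static void main(String[] args) {
        TelemetryNodeSelector selector = new TelemetryNodeSelector();
        selector.setInFlight("broker-1", 5);
        selector.setInFlight("broker-2", 1);
        selector.setInFlight("broker-3", 3);

        System.out.println(selector.nodeForNextPush()); // least loaded at first
        selector.onRequestTimeout("broker-2");           // the push timed out
        System.out.println(selector.nodeForNextPush()); // reselected node
    }
}
```

A real client would also need to let timed-out nodes become eligible again
after some back-off, which this sketch leaves out.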

Cheers,
P


On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi Philip,
> Thanks for your vote and interest in the KIP.
>
> KIP-714 does not introduce any new client metrics, and that’s intentional.
> It does describe how all of the client metrics can have their names
> transformed into equivalent “telemetry metric names”, and then potentially
> used in metrics subscriptions.
>
> I am interested in the idea of client’s leader epoch in this context, but
> I don’t have an immediate plan for how best to do this, and it would take
> another KIP to enhance existing metrics or introduce some new ones. Those
> would then naturally be applicable to the metrics push introduced in KIP-714.
>
> In a similar vein, there are no existing client metrics specifically for
> auto-commit. We could add them to Kafka, but I really think this is just an
> example of asynchronous commit in which the application has decided not to
> specify when the commit should begin.
>
> It is possible to increase the cadence of pushing by modifying the
> interval.ms configuration property of the CLIENT_METRICS resource.
>
> There is an “assigned-partitions” metric for each consumer, but not one for
> active partitions. We could add one, again as a follow-on KIP.
>
> I take your point about holding on to a connection in a channel which might
> experience congestion. Do you have a suggestion for how to improve on this?
> For example, the client does have the concept of a least-loaded node. Maybe
> this is something we should investigate in the implementation and decide on
> the best approach. In general, I think sticking with the same node for
> consecutive pushes is best, but if you choose the “wrong” node to start with,
> it’s not ideal.
>
> Thanks,
> Andrew
>
> > On 8 Sep 2023, at 19:29, Philip Nee  wrote:
> >
> > Hey Andrew -
> >
> > +1 but I don't have a binding vote!
> >
> > It took me a while to go through the KIP. Here are some of my notes
> during
> > the reading:
> >
> > *Metrics*
> > - Should we care about the client's leader epoch? There is a case where
> the
> > user recreates the topic, but the consumer thinks it is still the same
> > topic and therefore, attempts to start from an offset that doesn't exist.
> > KIP-848 addresses this issue, but I can still see some potential benefits
> > from knowing the client's epoch information.
> > - I assume poll idle is similar to poll interval: I needed to read the
> > description a few times.
> > - I don't have a clear use case in mind for the commit latency, but I do
> > think sometimes people lack clarity about how much progress was tracked
> by
> > the auto-commit.  Would tracking auto-commit-related metrics be useful? I
> > was thinking: the last offset committed or the actual cadence in ms.
> > - Are there cases when we need to increase the cadence of telemetry data
> > push? i.e. variable interval.
> > - Thanks for implementing the randomized initial metric push; I think it
> is
> > really important.
> > - Is there a potential use case for tracking the number of active
> > partitions? The consumer can pause partitions via API, during revocation,
> > or during offset reset for the stream.
> >
> > *Connections*:
> > - The KIP stated that it will keep the same connection until the
> connection
> > is disconnected. I wonder if that could potentially cause congestion if
> it
> > is already a busy channel, which leads to connection timeout and
> > subsequently disconnection.
> >
> > Thanks,
> > P
> >
> > On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> >> Bumping the voting thread for KIP-714.
> >>
> >> So far, we have:
> >> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 4 Aug 2023, at 09:45, Andrew Schofield 
> >> wrote:
> >>>
> >>> Hi,
> >>> After almost 2 1/2 years in the making, I would like to call a vote for
> >> KIP-714 (
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> >> ).
> >>>
> >>> This KIP aims to improve monitoring and troubleshooting of client
> >> performance by enabling clients to push metrics to brokers.
> >>>
> >>> I’d like to thank everyone that participated in the discussion,
> >> especially the librdkafka team since one of the aims of the KIP is to
> >> enable any client to participate, not just the 

Re: [VOTE] KIP-714: Client metrics and observability

2023-09-12 Thread Andrew Schofield
Hi Philip,
Thanks for your vote and interest in the KIP.

KIP-714 does not introduce any new client metrics, and that’s intentional. It
does describe how all of the client metrics can have their names transformed
into equivalent “telemetry metric names”, and then potentially be used in
metrics subscriptions.
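
As a rough illustration of what such a name transformation could look like,
here is a minimal sketch. The rule shown (strip a trailing "-metrics" from the
metric group, replace hyphens with dots, and add an "org.apache.kafka" prefix)
is an assumption made for the example, and the function name is hypothetical;
the KIP text is the authority on the actual mapping.

```python
def telemetry_metric_name(group: str, name: str, prefix: str = "org.apache.kafka") -> str:
    """Map a client metric (group, name) pair to a dotted telemetry-style name.

    Assumed rule, for illustration only: drop a trailing "-metrics" from the
    group, then replace hyphens with dots in both the group and the name.
    """
    if group.endswith("-metrics"):
        group = group[: -len("-metrics")]
    parts = [prefix, group.replace("-", "."), name.replace("-", ".")]
    return ".".join(parts)


# Example inputs; the group and metric names here are only sample values.
print(telemetry_metric_name("consumer-coordinator-metrics", "assigned-partitions"))
print(telemetry_metric_name("producer-metrics", "connection-creation-rate"))
```

A name produced this way could then be matched against the metric name
prefixes listed in a metrics subscription.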

I am interested in the idea of client’s leader epoch in this context, but I
don’t have an immediate plan for how best to do this, and it would take
another KIP to enhance existing metrics or introduce some new ones. Those
would then naturally be applicable to the metrics push introduced in KIP-714.

In a similar vein, there are no existing client metrics specifically for
auto-commit. We could add them to Kafka, but I really think this is just an
example of asynchronous commit in which the application has decided not to
specify when the commit should begin.

It is possible to increase the cadence of pushing by modifying the interval.ms
configuration property of the CLIENT_METRICS resource.

There is an “assigned-partitions” metric for each consumer, but not one for
active partitions. We could add one, again as a follow-on KIP.

I take your point about holding on to a connection in a channel which might
experience congestion. Do you have a suggestion for how to improve on this?
For example, the client does have the concept of a least-loaded node. Maybe
this is something we should investigate in the implementation and decide on the
best approach. In general, I think sticking with the same node for consecutive
pushes is best, but if you choose the “wrong” node to start with, it’s not 
ideal.
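
To make the trade-off concrete, here is a minimal sketch of one possible
policy: stay on the same node for consecutive pushes, and only fall back to a
least-loaded choice when there is no current node or it has disconnected. The
class, its methods and the use of in-flight request counts as the load measure
are all hypothetical, chosen for illustration; this is not how any client is
known to implement it.

```python
from typing import Dict, Optional


class StickyNodeSelector:
    """Hypothetical helper: keep pushing to one node, re-pick only after a disconnect."""

    def __init__(self) -> None:
        self.current: Optional[str] = None

    def select(self, in_flight: Dict[str, int]) -> str:
        """Return the node to push to, given in-flight request counts per connected node."""
        if not in_flight:
            raise ValueError("no connected nodes")
        if self.current not in in_flight:
            # Stand-in for "least-loaded node": fewest in-flight requests,
            # with ties broken by node id so the choice is deterministic.
            self.current = min(in_flight, key=lambda node: (in_flight[node], node))
        return self.current

    def on_disconnect(self, node: str) -> None:
        """Forget the current node if it is the one that disconnected."""
        if node == self.current:
            self.current = None


selector = StickyNodeSelector()
load = {"broker-1": 5, "broker-2": 1, "broker-3": 3}
first = selector.select(load)    # picks the least-loaded node
load["broker-2"] = 9             # the chosen node becomes busy
second = selector.select(load)   # sticky: stays on the same node anyway
selector.on_disconnect("broker-2")
del load["broker-2"]
third = selector.select(load)    # re-picks among the remaining nodes
print(first, second, third)
```

The second call shows the weakness being discussed: once a node is chosen, the
selector stays with it even if it becomes the busiest one, until a disconnect
forces a new choice.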

Thanks,
Andrew

> On 8 Sep 2023, at 19:29, Philip Nee  wrote:
>
> Hey Andrew -
>
> +1 but I don't have a binding vote!
>
> It took me a while to go through the KIP. Here are some of my notes during
> the reading:
>
> *Metrics*
> - Should we care about the client's leader epoch? There is a case where the
> user recreates the topic, but the consumer thinks it is still the same
> topic and therefore, attempts to start from an offset that doesn't exist.
> KIP-848 addresses this issue, but I can still see some potential benefits
> from knowing the client's epoch information.
> - I assume poll idle is similar to poll interval: I needed to read the
> description a few times.
> - I don't have a clear use case in mind for the commit latency, but I do
> think sometimes people lack clarity about how much progress was tracked by
> the auto-commit.  Would tracking auto-commit-related metrics be useful? I
> was thinking: the last offset committed or the actual cadence in ms.
> - Are there cases when we need to increase the cadence of telemetry data
> push? i.e. variable interval.
> - Thanks for implementing the randomized initial metric push; I think it is
> really important.
> - Is there a potential use case for tracking the number of active
> partitions? The consumer can pause partitions via API, during revocation,
> or during offset reset for the stream.
>
> *Connections*:
> - The KIP stated that it will keep the same connection until the connection
> is disconnected. I wonder if that could potentially cause congestion if it
> is already a busy channel, which leads to connection timeout and
> subsequently disconnection.
>
> Thanks,
> P
>
> On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Bumping the voting thread for KIP-714.
>>
>> So far, we have:
>> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
>>
>> Thanks,
>> Andrew
>>
>>> On 4 Aug 2023, at 09:45, Andrew Schofield 
>> wrote:
>>>
>>> Hi,
>>> After almost 2 1/2 years in the making, I would like to call a vote for
>> KIP-714 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>> ).
>>>
>>> This KIP aims to improve monitoring and troubleshooting of client
>> performance by enabling clients to push metrics to brokers.
>>>
>>> I’d like to thank everyone that participated in the discussion,
>> especially the librdkafka team since one of the aims of the KIP is to
>> enable any client to participate, not just the Apache Kafka project’s Java
>> clients.
>>>
>>> Thanks,
>>> Andrew




Re: [VOTE] KIP-714: Client metrics and observability

2023-09-08 Thread Philip Nee
Hey Andrew -

+1 but I don't have a binding vote!

It took me a while to go through the KIP. Here are some of my notes during
the reading:

*Metrics*
- Should we care about the client's leader epoch? There is a case where the
user recreates the topic, but the consumer thinks it is still the same
topic and therefore, attempts to start from an offset that doesn't exist.
KIP-848 addresses this issue, but I can still see some potential benefits
from knowing the client's epoch information.
- I assume poll idle is similar to poll interval: I needed to read the
description a few times.
- I don't have a clear use case in mind for the commit latency, but I do
think sometimes people lack clarity about how much progress was tracked by
the auto-commit.  Would tracking auto-commit-related metrics be useful? I
was thinking: the last offset committed or the actual cadence in ms.
- Are there cases when we need to increase the cadence of telemetry data
push? i.e. variable interval.
- Thanks for implementing the randomized initial metric push; I think it is
really important.
- Is there a potential use case for tracking the number of active
partitions? The consumer can pause partitions via API, during revocation,
or during offset reset for the stream.
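
For readers who have not looked at the randomized initial push, a minimal
sketch of the idea follows. The 0.5x to 1.5x window around the push interval
is an assumption used for illustration, and the function name is hypothetical;
the KIP text is the authority on the actual behavior.

```python
import random


def first_push_delay_ms(push_interval_ms: int, rng: random.Random) -> float:
    """Delay before the first push, spread over an assumed 0.5x to 1.5x window."""
    return push_interval_ms * rng.uniform(0.5, 1.5)


rng = random.Random(714)  # fixed seed so the sketch is repeatable
delays = [first_push_delay_ms(60_000, rng) for _ in range(5)]
print([round(d) for d in delays])
```

Spreading the first push like this keeps a fleet of clients that started at
the same moment from all pushing to the brokers at the same instant.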

*Connections*:
- The KIP stated that it will keep the same connection until the connection
is disconnected. I wonder if that could potentially cause congestion if it
is already a busy channel, which leads to connection timeout and
subsequently disconnection.

Thanks,
P

On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Bumping the voting thread for KIP-714.
>
> So far, we have:
> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
>
> Thanks,
> Andrew
>
> > On 4 Aug 2023, at 09:45, Andrew Schofield 
> wrote:
> >
> > Hi,
> > After almost 2 1/2 years in the making, I would like to call a vote for
> KIP-714 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> ).
> >
> > This KIP aims to improve monitoring and troubleshooting of client
> performance by enabling clients to push metrics to brokers.
> >
> > I’d like to thank everyone that participated in the discussion,
> especially the librdkafka team since one of the aims of the KIP is to
> enable any client to participate, not just the Apache Kafka project’s Java
> clients.
> >
> > Thanks,
> > Andrew
>
>
>


Re: [VOTE] KIP-714: Client metrics and observability

2023-09-08 Thread Andrew Schofield
Bumping the voting thread for KIP-714.

So far, we have:
Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)

Thanks,
Andrew

> On 4 Aug 2023, at 09:45, Andrew Schofield  wrote:
> 
> Hi,
> After almost 2 1/2 years in the making, I would like to call a vote for 
> KIP-714 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).
> 
> This KIP aims to improve monitoring and troubleshooting of client performance 
> by enabling clients to push metrics to brokers.
> 
> I’d like to thank everyone that participated in the discussion, especially 
> the librdkafka team since one of the aims of the KIP is to enable any client 
> to participate, not just the Apache Kafka project’s Java clients.
> 
> Thanks,
> Andrew




Re: [VOTE] KIP-714: Client metrics and observability

2023-08-09 Thread Ryanne Dolan
-1, non-binding, for reasons previously stated.

Ryanne

On Fri, Aug 4, 2023, 3:46 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi,
> After almost 2 1/2 years in the making, I would like to call a vote for
> KIP-714 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> ).
>
> This KIP aims to improve monitoring and troubleshooting of client
> performance by enabling clients to push metrics to brokers.
>
> I’d like to thank everyone that participated in the discussion, especially
> the librdkafka team since one of the aims of the KIP is to enable any
> client to participate, not just the Apache Kafka project’s Java clients.
>
> Thanks,
> Andrew


Re: [VOTE] KIP-714: Client metrics and observability

2023-08-09 Thread Kirk True
Hi Andrew,

+1 (non-binding)

This is a huge step in enabling end-to-end observability for users and will
hopefully even help us get a better idea of where we can improve the client
behavior.

And +100 re: librdkafka team involvement. 

Thanks!

> On Aug 8, 2023, at 4:00 AM, Milind Luthra  
> wrote:
> 
> Hi Andrew, thanks for working on the KIP.
> 
> +1 (non binding)
> 
> Thanks,
> Milind
> 
> On Fri, Aug 4, 2023 at 2:16 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
> 
>> Hi,
>> After almost 2 1/2 years in the making, I would like to call a vote for
>> KIP-714 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>> ).
>> 
>> This KIP aims to improve monitoring and troubleshooting of client
>> performance by enabling clients to push metrics to brokers.
>> 
>> I’d like to thank everyone that participated in the discussion, especially
>> the librdkafka team since one of the aims of the KIP is to enable any
>> client to participate, not just the Apache Kafka project’s Java clients.
>> 
>> Thanks,
>> Andrew



Re: [VOTE] KIP-714: Client metrics and observability

2023-08-08 Thread Milind Luthra
Hi Andrew, thanks for working on the KIP.

+1 (non binding)

Thanks,
Milind

On Fri, Aug 4, 2023 at 2:16 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi,
> After almost 2 1/2 years in the making, I would like to call a vote for
> KIP-714 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> ).
>
> This KIP aims to improve monitoring and troubleshooting of client
> performance by enabling clients to push metrics to brokers.
>
> I’d like to thank everyone that participated in the discussion, especially
> the librdkafka team since one of the aims of the KIP is to enable any
> client to participate, not just the Apache Kafka project’s Java clients.
>
> Thanks,
> Andrew


Re: [VOTE] KIP-714: Client Metrics and Observability

2022-05-24 Thread Jason Gustafson
+1 Thanks Magnus!

On Tue, May 17, 2022 at 5:43 AM Magnus Edenhill  wrote:

> Hey all,
>
> It's that time of year again where we re-restart this vote thread after
> some additional
> discussions on the disco thread and minor adjustments to the
> KIP.
>
> We're currently at +5 (non-binding) and -1 (non-binding) votes.
>
> Please cast your votes, people.
>
>
> Thanks,
> Magnus
>
>
> Den tors 3 mars 2022 kl 15:39 skrev Julien Chanaud <
> chanaud.jul...@gmail.com
> >:
>
> > +1
> > As a member of a team which operates several Kafka clusters, I am
> > unequipped when it comes to troubleshooting issues with project teams
> > that did not understand the importance of configuring client-side
> > monitoring.
> > Kafka represents a fraction of their work and they don't have enough
> > experience, time or interest in trying to understand the meaning behind
> > every metric.
> >
> > I stand 100% behind what Colin stated back in June in the Discuss thread
> :
> >
> > > Magnus and I explained a few times the reasons why it does matter.
> Within
> > > most organizations, there are usually several teams using clients,
> which
> > > are separate from the team which maintains the Kafka cluster. The Kafka
> > > team has the Kafka experts, which makes it the best place to centralize
> > > collecting and analyzing Kafka metrics.
> >
> >
> > Thanks for this KIP.
> >
> > Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com <
> > rifer...@riferrei.com>
> > a écrit :
> >
> > > +1
> > >
> > > I think this KIP solves a problem that has been around for some time
> with
> > > Kafka deployments, which is the ability to assess the current state of
> a
> > > Kafka architecture by looking at the whole picture. I also share other
> > > folks' concerns regarding adding runtime dependencies to the clients;
> > this
> > > may be problematic for large deployments. Still, I think it is worth
> > > refactoring.
> > >
> > > IMHO, it is a fair trade-off.
> > >
> > > — Ricardo
> > >
> > > > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill 
> > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > it's been a while and there's been some more discussions of the KIP
> > which
> > > > have been
> > > > addressed on the KIP page.
> > > >
> > > > I think it's a good time to revive this vote thread and get things
> > > moving.
> > > >
> > > > We're currently at +3 (non-binding) and -1 (non-binding) votes.
> > > >
> > > > Regards,
> > > > Magnus
> > > >
> > > >
> > > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers :
> > > >
> > > >> +1
> > > >>
> > > >> Thank you for the KIP!
> > > >>
> > > >> Our organization runs kafka at large scale in a multi-tenant
> > > configuration.
> > > >> We actually have many other enterprises connecting up to our system
> to
> > > >> retrieve stream data. These feeds vary greatly in volume and
> velocity.
> > > The
> > > >> peak rates are a multiplicative factor of the nominal.  There is
> > extreme
> > > >> skew in our datasets in a number of ways.
> > > >>
> > > >> We don't have time to work with every new internal/external client
> to
> > > tune
> > > >> their feeds. They need to be able to take one of the many kafka
> > clients
> > > and
> > > >> go off to the races.
> > > >>
> > > >> Being able to retrieve client metrics would be invaluable here as
> it's
> > > hard
> > > >> and time consuming to communicate out of the enterprise walls.
> > > >>
> > > >> This KIP is important to us to expand the use of our datasets
> > internally
> > > >> and outside the borders of the enterprise. Our clients like the
> > > performance
> > > >> and data safeties related to the kafka connection. The observability
> > has
> > > >> been a problem...
> > > >>
> > > >> Jonathan Rivers
> > > >> jrivers...@gmail.com
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan <
> ryannedo...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> -1
> > > >>>
> > > >>> Ryanne
> > > >>>
> > > >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill 
> > > >> wrote:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> I'd like to start a vote on KIP-714.
> > > >>>> https://cwiki.apache.org/confluence/x/2xRRCg
> > > >>>>
> > > >>>> Discussion thread:
> > > >>>> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Magnus
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2022-05-17 Thread Magnus Edenhill
Hey all,

It's that time of year again where we re-restart this vote thread after
some additional
discussions on the disco thread and minor adjustments to the
KIP.

We're currently at +5 (non-binding) and -1 (non-binding) votes.

Please cast your votes, people.


Thanks,
Magnus


Den tors 3 mars 2022 kl 15:39 skrev Julien Chanaud :

> +1
> As a member of a team which operates several Kafka clusters, I am
> unequipped when it comes to troubleshooting issues with project teams
> that did not understand the importance of configuring client-side
> monitoring.
> Kafka represents a fraction of their work and they don't have enough
> experience, time or interest in trying to understand the meaning behind
> every metric.
>
> I stand 100% behind what Colin stated back in June in the Discuss thread :
>
> > Magnus and I explained a few times the reasons why it does matter. Within
> > most organizations, there are usually several teams using clients, which
> > are separate from the team which maintains the Kafka cluster. The Kafka
> > team has the Kafka experts, which makes it the best place to centralize
> > collecting and analyzing Kafka metrics.
>
>
> Thanks for this KIP.
>
> Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com <
> rifer...@riferrei.com>
> a écrit :
>
> > +1
> >
> > I think this KIP solves a problem that has been around for some time with
> > Kafka deployments, which is the ability to assess the current state of a
> > Kafka architecture by looking at the whole picture. I also share other
> > folks' concerns regarding adding runtime dependencies to the clients;
> this
> > may be problematic for large deployments. Still, I think it is worth
> > refactoring.
> >
> > IMHO, it is a fair trade-off.
> >
> > — Ricardo
> >
> > > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill 
> wrote:
> > >
> > > Hi all,
> > >
> > > it's been a while and there's been some more discussions of the KIP
> which
> > > have been
> > > addressed on the KIP page.
> > >
> > > I think it's a good time to revive this vote thread and get things
> > moving.
> > >
> > > We're currently at +3 (non-binding) and -1 (non-binding) votes.
> > >
> > > Regards,
> > > Magnus
> > >
> > >
> > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers :
> > >
> > >> +1
> > >>
> > >> Thank you for the KIP!
> > >>
> > >> Our organization runs kafka at large scale in a multi-tenant
> > configuration.
> > >> We actually have many other enterprises connecting up to our system to
> > >> retrieve stream data. These feeds vary greatly in volume and velocity.
> > The
> > >> peak rates are a multiplicative factor of the nominal.  There is
> extreme
> > >> skew in our datasets in a number of ways.
> > >>
> > >> We don't have time to work with every new internal/external client to
> > tune
> > >> their feeds. They need to be able to take one of the many kafka
> clients
> > and
> > >> go off to the races.
> > >>
> > >> Being able to retrieve client metrics would be invaluable here as it's
> > hard
> > >> and time consuming to communicate out of the enterprise walls.
> > >>
> > >> This KIP is important to us to expand the use of our datasets
> internally
> > >> and outside the borders of the enterprise. Our clients like the
> > performance
> > >> and data safeties related to the kafka connection. The observability
> has
> > >> been a problem...
> > >>
> > >> Jonathan Rivers
> > >> jrivers...@gmail.com
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan 
> > >> wrote:
> > >>
> > >>> -1
> > >>>
> > >>> Ryanne
> > >>>
> > >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill 
> > >> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> I'd like to start a vote on KIP-714.
> > >>>> https://cwiki.apache.org/confluence/x/2xRRCg
> > >>>>
> > >>>> Discussion thread:
> > >>>> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> > >>>>
> > >>>> Thanks,
> > >>>> Magnus
> > >>>>
> > >>>
> > >>
> >
> >
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2022-03-03 Thread Julien Chanaud
+1
As a member of a team which operates several Kafka clusters, I am
unequipped when it comes to troubleshooting issues with project teams
that did not understand the importance of configuring client-side
monitoring.
Kafka represents a fraction of their work and they don't have enough
experience, time or interest in trying to understand the meaning behind
every metric.

I stand 100% behind what Colin stated back in June in the Discuss thread :

> Magnus and I explained a few times the reasons why it does matter. Within
> most organizations, there are usually several teams using clients, which
> are separate from the team which maintains the Kafka cluster. The Kafka
> team has the Kafka experts, which makes it the best place to centralize
> collecting and analyzing Kafka metrics.


Thanks for this KIP.

Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com 
a écrit :

> +1
>
> I think this KIP solves a problem that has been around for some time with
> Kafka deployments, which is the ability to assess the current state of a
> Kafka architecture by looking at the whole picture. I also share other
> folks' concerns regarding adding runtime dependencies to the clients; this
> may be problematic for large deployments. Still, I think it is worth
> refactoring.
>
> IMHO, it is a fair trade-off.
>
> — Ricardo
>
> > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill  wrote:
> >
> > Hi all,
> >
> > it's been a while and there's been some more discussions of the KIP which
> > have been
> > addressed on the KIP page.
> >
> > I think it's a good time to revive this vote thread and get things
> moving.
> >
> > We're currently at +3 (non-binding) and -1 (non-binding) votes.
> >
> > Regards,
> > Magnus
> >
> >
> > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers :
> >
> >> +1
> >>
> >> Thank you for the KIP!
> >>
> >> Our organization runs kafka at large scale in a multi-tenant
> configuration.
> >> We actually have many other enterprises connecting up to our system to
> >> retrieve stream data. These feeds vary greatly in volume and velocity.
> The
> >> peak rates are a multiplicative factor of the nominal.  There is extreme
> >> skew in our datasets in a number of ways.
> >>
> >> We don't have time to work with every new internal/external client to
> tune
> >> their feeds. They need to be able to take one of the many kafka clients
> and
> >> go off to the races.
> >>
> >> Being able to retrieve client metrics would be invaluable here as it's
> hard
> >> and time consuming to communicate out of the enterprise walls.
> >>
> >> This KIP is important to us to expand the use of our datasets internally
> >> and outside the borders of the enterprise. Our clients like the
> performance
> >> and data safeties related to the kafka connection. The observability has
> >> been a problem...
> >>
> >> Jonathan Rivers
> >> jrivers...@gmail.com
> >>
> >>
> >>
> >>
> >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan 
> >> wrote:
> >>
> >>> -1
> >>>
> >>> Ryanne
> >>>
> >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill 
> >> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I'd like to start a vote on KIP-714.
> >>>> https://cwiki.apache.org/confluence/x/2xRRCg
> >>>>
> >>>> Discussion thread:
> >>>> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> >>>>
> >>>> Thanks,
> >>>> Magnus
> >>>>
> >>>
> >>
>
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2022-01-26 Thread rifer...@riferrei.com
+1

I think this KIP solves a problem that has been around for some time with Kafka
deployments: assessing the current state of a Kafka architecture by looking at
the whole picture. I also share other folks' concerns regarding adding runtime
dependencies to the clients; this may be problematic for large deployments.
Still, I think it is worth refactoring.

IMHO, it is a fair trade-off.

— Ricardo

> On Jan 26, 2022, at 9:34 AM, Magnus Edenhill  wrote:
> 
> Hi all,
> 
> it's been a while and there's been some more discussions of the KIP which
> have been
> addressed on the KIP page.
> 
> I think it's a good time to revive this vote thread and get things moving.
> 
> We're currently at +3 (non-binding) and -1 (non-binding) votes.
> 
> Regards,
> Magnus
> 
> 
> Den mån 1 nov. 2021 kl 21:19 skrev J Rivers :
> 
>> +1
>> 
>> Thank you for the KIP!
>> 
>> Our organization runs kafka at large scale in a multi-tenant configuration.
>> We actually have many other enterprises connecting up to our system to
>> retrieve stream data. These feeds vary greatly in volume and velocity. The
>> peak rates are a multiplicative factor of the nominal.  There is extreme
>> skew in our datasets in a number of ways.
>> 
>> We don't have time to work with every new internal/external client to tune
>> their feeds. They need to be able to take one of the many kafka clients and
>> go off to the races.
>> 
>> Being able to retrieve client metrics would be invaluable here as it's hard
>> and time consuming to communicate out of the enterprise walls.
>> 
>> This KIP is important to us to expand the use of our datasets internally
>> and outside the borders of the enterprise. Our clients like the performance
>> and data safeties related to the kafka connection. The observability has
>> been a problem...
>> 
>> Jonathan Rivers
>> jrivers...@gmail.com
>> 
>> 
>> 
>> 
>> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan 
>> wrote:
>> 
>>> -1
>>> 
>>> Ryanne
>>> 
>>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill 
>> wrote:
>>> 
>>>> Hi all,
>>>>
>>>> I'd like to start a vote on KIP-714.
>>>> https://cwiki.apache.org/confluence/x/2xRRCg
>>>>
>>>> Discussion thread:
>>>> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
>>>>
>>>> Thanks,
>>>> Magnus
>>>>
>>> 
>> 



Re: [VOTE] KIP-714: Client Metrics and Observability

2022-01-26 Thread Magnus Edenhill
Hi all,

it's been a while and there's been some more discussions of the KIP which
have been
addressed on the KIP page.

I think it's a good time to revive this vote thread and get things moving.

We're currently at +3 (non-binding) and -1 (non-binding) votes.

Regards,
Magnus


Den mån 1 nov. 2021 kl 21:19 skrev J Rivers :

> +1
>
> Thank you for the KIP!
>
> Our organization runs kafka at large scale in a multi-tenant configuration.
> We actually have many other enterprises connecting up to our system to
> retrieve stream data. These feeds vary greatly in volume and velocity. The
> peak rates are a multiplicative factor of the nominal.  There is extreme
> skew in our datasets in a number of ways.
>
> We don't have time to work with every new internal/external client to tune
> their feeds. They need to be able to take one of the many kafka clients and
> go off to the races.
>
> Being able to retrieve client metrics would be invaluable here as it's hard
> and time consuming to communicate out of the enterprise walls.
>
> This KIP is important to us to expand the use of our datasets internally
> and outside the borders of the enterprise. Our clients like the performance
> and data safeties related to the kafka connection. The observability has
> been a problem...
>
> Jonathan Rivers
> jrivers...@gmail.com
>
>
>
>
> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan 
> wrote:
>
> > -1
> >
> > Ryanne
> >
> > On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill 
> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a vote on KIP-714.
> > > https://cwiki.apache.org/confluence/x/2xRRCg
> > >
> > > Discussion thread:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> > >
> > > Thanks,
> > > Magnus
> > >
> >
>


RE: Re: [VOTE] KIP-714: Client Metrics and Observability

2021-11-05 Thread Igor Buzatovic
+1

We also have a lot of clients using our central Kafka cluster, and it would
be great to have client metrics so we can provide end-to-end monitoring.

Igor Buzatović
Porsche Digital

On 2021/11/01 20:19:20 J Rivers wrote:
> +1
>
> Thank you for the KIP!
>
> Our organization runs kafka at large scale in a multi-tenant
configuration.
> We actually have many other enterprises connecting up to our system to
> retrieve stream data. These feeds vary greatly in volume and velocity. The
> peak rates are a multiplicative factor of the nominal.  There is extreme
> skew in our datasets in a number of ways.
>
> We don't have time to work with every new internal/external client to tune
> their feeds. They need to be able to take one of the many kafka clients
and
> go off to the races.
>
> Being able to retrieve client metrics would be invaluable here as it's
hard
> and time consuming to communicate out of the enterprise walls.
>
> This KIP is important to us to expand the use of our datasets internally
> and outside the borders of the enterprise. Our clients like the
performance
> and data safeties related to the kafka connection. The observability has
> been a problem...
>
> Jonathan Rivers
> jrivers...@gmail.com
>
>
>
>
> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan  wrote:
>
> > -1
> >
> > Ryanne
> >
> > On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a vote on KIP-714.
> > > https://cwiki.apache.org/confluence/x/2xRRCg
> > >
> > > Discussion thread:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> > >
> > > Thanks,
> > > Magnus
> > >
> >
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2021-11-01 Thread J Rivers
+1

Thank you for the KIP!

Our organization runs kafka at large scale in a multi-tenant configuration.
We actually have many other enterprises connecting up to our system to
retrieve stream data. These feeds vary greatly in volume and velocity. The
peak rates are a multiplicative factor of the nominal.  There is extreme
skew in our datasets in a number of ways.

We don't have time to work with every new internal/external client to tune
their feeds. They need to be able to take one of the many kafka clients and
go off to the races.

Being able to retrieve client metrics would be invaluable here as it's hard
and time consuming to communicate out of the enterprise walls.

This KIP is important to us to expand the use of our datasets internally
and outside the borders of the enterprise. Our clients like the performance
and data safeties related to the kafka connection. The observability has
been a problem...

Jonathan Rivers
jrivers...@gmail.com




On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan  wrote:

> -1
>
> Ryanne
>
> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill  wrote:
>
> > Hi all,
> >
> > I'd like to start a vote on KIP-714.
> > https://cwiki.apache.org/confluence/x/2xRRCg
> >
> > Discussion thread:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
> >
> > Thanks,
> > Magnus
> >
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2021-10-18 Thread Ryanne Dolan
-1

Ryanne

On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill  wrote:

> Hi all,
>
> I'd like to start a vote on KIP-714.
> https://cwiki.apache.org/confluence/x/2xRRCg
>
> Discussion thread:
> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
>
> Thanks,
> Magnus
>


Re: [VOTE] KIP-714: Client Metrics and Observability

2021-10-18 Thread Anna McDonald
Hi Magnus,

Thanks for the KIP.
+1 (non-binding)

Cheers,
Anna

On Mon, Oct 18, 2021, 5:30 AM Magnus Edenhill  wrote:

> Hi all,
>
> I'd like to start a vote on KIP-714.
> https://cwiki.apache.org/confluence/x/2xRRCg
>
> Discussion thread:
> https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html
>
> Thanks,
> Magnus
>