Kafka Broker monitoring

2024-05-16 Thread Vinay Bagare

Hi Team,

We have some Splunk dashboards, along with custom UI elements, to report Kafka 
health status. We forward all Kafka health-check statuses into Splunk. 
However, we are running into capacity issues on Splunk, as we service 
multiple Kafka clusters across our data center as well as three PCs (with 
weekly growth in deployments).

On top of that, the data arrives with an hour or two of latency.

While I have some ideas, I am trying to understand what you all use for 
monitoring broker status. The goal is real-time status for all our Kafka 
installations (we will be upgrading to 3.6 soon).

We prefer open-source solutions over vendor-locked options.

Can you please share any best-known methods (BKMs) for achieving real-time 
cluster updates?

Best,
Vinay Bagare


GetLogsDir API in Kafka 3.3.1 returns all topics even when topic name specified in args

2024-05-16 Thread Maxim Senin
Hello.

I’m having a problem with the Kafka protocol API.

Requests:
DescribeLogDirs Request (Version: 0) => [topics]
  topics => topic [partitions]
    topic => STRING
    partitions => INT32

My request contains `[{topic: "blah", partitions: [0,1,2,3,4,5,6,7,8,9]}]`, but 
the result contains entries for *all* topics.

Responses:
DescribeLogDirs Response (Version: 0) => throttle_time_ms [results]
  throttle_time_ms => INT32
  results => error_code log_dir [topics]
    error_code => INT16
    log_dir => STRING
    topics => name [partitions]
      name => STRING
      partitions => partition_index partition_size offset_lag is_future_key
        partition_index => INT32
        partition_size => INT64
        offset_lag => INT64
        is_future_key => BOOLEAN

My workaround has been to filter the returned list by topic name to find the 
one I was requesting the data for, but I don't understand why it's not 
limiting the results to just the topic I requested in the first place.
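For reference, the client-side workaround can be sketched like this. The decoded-response shape below is hypothetical and only mirrors the v0 schema; the topic and log-dir values are made up for illustration:

```python
def filter_log_dirs_by_topic(response, wanted_topic):
    """Workaround sketch: keep only the entries for the requested topic,
    since the v0 response can come back with every topic included."""
    filtered = []
    for result in response.get("results", []):
        topics = [t for t in result["topics"] if t["name"] == wanted_topic]
        if topics:
            filtered.append({**result, "topics": topics})
    return filtered

# Decoded response with an unrequested topic mixed in (hypothetical data).
response = {
    "throttle_time_ms": 0,
    "results": [
        {
            "error_code": 0,
            "log_dir": "/var/kafka-logs",
            "topics": [
                {"name": "blah", "partitions": [{"partition_index": 0}]},
                {"name": "other", "partitions": [{"partition_index": 0}]},
            ],
        }
    ],
}

only_blah = filter_log_dirs_by_topic(response, "blah")
```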

Also, I think there should be an option to specify ALL_PARTITIONS, because 
that would save me from having to retrieve topic metadata from the broker just 
to count the partitions. The Kafka server could probably do that more 
efficiently.

Is this a bug or am I doing something wrong?

Thanks,
Maxim






Re: [ Questions on log4j file & version ]

2024-05-16 Thread Greg Harris
Hi Ashok,

Kafka 2.7.1 was built from the 2.7.1 tag [1], and looking at the
dependencies in that version [2], it should have shipped with log4j 1.2.17.
You can verify this by looking for the log4j jar in your installation.
Because of the security vulnerabilities you mention, Kafka switched to
reload4j [3] around 3.2.0, and last upgraded reload4j in 3.6.0 [4].

You should either upgrade to a more recent version of Kafka
(recommended, as 2.7 is well out of support) or swap out the log4j jar
for a recent version of reload4j (not recommended).
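To illustrate the jar check: a small sketch that infers the version from the jar file name in a Kafka `libs/` directory (the naming convention is assumed here, not guaranteed):

```python
import re
from pathlib import Path

def find_log4j_versions(libs_dir):
    """Scan a directory and map each log4j/reload4j jar to the version
    embedded in its file name, e.g. log4j-1.2.17.jar -> 1.2.17."""
    pattern = re.compile(r"(log4j|reload4j).*-(\d+(?:\.\d+)+)\.jar$")
    return {
        jar.name: m.group(2)
        for jar in Path(libs_dir).glob("*.jar")
        if (m := pattern.search(jar.name))
    }
```

Pointing this at the `libs/` directory of a kafka_2.13-2.7.1 install should surface any bundled log4j jar and its version.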

[1] https://github.com/apache/kafka/tree/2.7.1
[2] 
https://github.com/apache/kafka/blob/61dbce85d0d41457d81a4096ecaea049f3a4b3ae/gradle/dependencies.gradle#L76
[3] https://issues.apache.org/jira/browse/KAFKA-13660
[4] https://github.com/apache/kafka/pull/13673

Thanks,
Greg

On Thu, May 16, 2024 at 5:50 AM Ashok Kumar Ragupathi wrote:
>
> Hello Kafka Team,
>
> Request your help...
>
> We are using Apache Kafka kafka_2.13-2.7.1, installed on a server.
>
> I understand it uses log4j for logging.
>
> But we don't know which log4j version it is using.
>
> We recently learned that log4j 1.2.17 has some security issues. How do we
> upgrade to log4j 2? And how can we find out which version it uses
> internally?
>
> Thanks & Regards
> Ashok Kumar
> Denovo Systems


Re: Kafka streams stores key in multiple state store instances

2024-05-16 Thread Matthias J. Sax

Hello Kay,

What you describe is "by design" -- unfortunately.

The problem is that when we build the `Topology` we don't know the 
partition count of the input topics, and thus `StreamsBuilder` cannot 
insert a repartition topic for this case (we always assume that the 
partition count is the same for all input topics).


To work around this, you would need to rewrite the program to use either 
`groupBy((k,v) -> k)` instead of `groupByKey()`, or do a 
`.repartition().groupByKey()`.
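The underlying issue can be illustrated with the modulo step at the end of key-based partitioning (the hash below is a stand-in for illustration, not Kafka's murmur2):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative partitioner: hash the key, then take it modulo the
    topic's partition count. Kafka uses murmur2 rather than crc32, but
    the modulo step is what breaks co-partitioning when counts differ."""
    return zlib.crc32(key) % num_partitions

# With, say, a 4-partition and a 6-partition topic, many keys land on
# different partition numbers, so records with the same key end up being
# processed (and stored) by different application instances.
mismatched = [
    key for key in (f"key-{i}".encode() for i in range(100))
    if partition_for(key, 4) != partition_for(key, 6)
]
```

A repartition step forces all grouped records back through topics with a single, common partition count, which restores co-partitioning.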


Does this make sense?


-Matthias


On 5/16/24 2:11 AM, Kay Hannay wrote:

Hi,

we have a Kafka Streams application which merges (merge, groupByKey, 
aggregate) a few topics into one topic. The application is stateful, of 
course. There are currently six instances of the application running in 
parallel.

We had an issue where one new topic used for aggregation had a different 
partition count than all the other topics. This caused data corruption in our 
application. We expected that a repartition topic would be created 
automatically by Kafka Streams, or that we would get an error, but neither 
happened. Instead, some of the keys (all merged topics share the same key 
schema) found their way into at least two different instances of the 
application: one key is available in more than one local state store. Can you 
explain why this happened? As said, we would have expected an error or an 
automatic repartition topic in this case.

Cheers
Kay



Re: Request to be added to kafka contributors list

2024-05-16 Thread Matthias J. Sax

Thanks for reaching out Yang. You should be all set.

-Matthias

On 5/16/24 7:40 AM, Yang Fan wrote:

Dear Apache Kafka Team,

I hope this email finds you well. My name is Fan Yang and my JIRA ID is 
fanyan. I kindly request to be added to the contributors list for Apache 
Kafka, so that I can be assigned JIRA tickets and work on them.
Thank you for considering my request.
Best regards,
Fan


Request to be added to kafka contributors list

2024-05-16 Thread Yang Fan
Dear Apache Kafka Team,

I hope this email finds you well. My name is Fan Yang and my JIRA ID is 
fanyan. I kindly request to be added to the contributors list for Apache 
Kafka, so that I can be assigned JIRA tickets and work on them.
Thank you for considering my request.
Best regards,
Fan


Large messages

2024-05-16 Thread Marks Gmail
Hi,

I am having a bit of a problem here. It is a bit of an unusual use case. I am 
running a single-node Kafka server and noticed that some messages I send are 
not passed through Kafka; they are silently lost. The topic is created, but no 
data arrives. Other topics on the same server work just fine. Further 
investigation showed that the message size I use on the afflicted topic is 
somewhat larger than the default 1 MB. I found a description of how to work 
with larger messages at:
https://www.conduktor.io/kafka/how-to-send-large-messages-in-apache-kafka/
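For reference, the broker settings that guide describes would look roughly like this in server.properties (the 2 MiB values here are illustrative, not recommendations):

```properties
# server.properties -- illustrative values for ~2 MB messages
message.max.bytes=2097152
replica.fetch.max.bytes=2097152
```

Note that the producer also has to allow the larger request (max.request.size), and a producer that never checks the result of send() will drop oversized records without any visible error, which may explain the silently lost messages.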

But I am stuck: the description asks for configuring replica.fetch.max.bytes 
(or message.max.bytes) in server.properties. I did this and restarted Kafka, 
but

kafka-configs.sh --bootstrap-server localhost:9092 --broker 0 --describe --all

still shows the old default values, as if my entries in server.properties were 
ignored.

How should I proceed? What is the right way to work with larger messages in 
Kafka? In my use case, 2 MB would be enough.


Best Regards,

Mark Koennecke



[ Questions on log4j file & version ]

2024-05-16 Thread Ashok Kumar Ragupathi
Hello Kafka Team,

Request your help...

We are using Apache Kafka kafka_2.13-2.7.1, installed on a server.

I understand it uses log4j for logging.

But we don't know which log4j version it is using.

We recently learned that log4j 1.2.17 has some security issues. How do we
upgrade to log4j 2? And how can we find out which version it uses internally?

Thanks & Regards
Ashok Kumar
Denovo Systems


Kafka streams stores key in multiple state store instances

2024-05-16 Thread Kay Hannay
Hi,

we have a Kafka Streams application which merges (merge, groupByKey, 
aggregate) a few topics into one topic. The application is stateful, of 
course. There are currently six instances of the application running in 
parallel.

We had an issue where one new topic used for aggregation had a different 
partition count than all the other topics. This caused data corruption in our 
application. We expected that a repartition topic would be created 
automatically by Kafka Streams, or that we would get an error, but neither 
happened. Instead, some of the keys (all merged topics share the same key 
schema) found their way into at least two different instances of the 
application: one key is available in more than one local state store. Can you 
explain why this happened? As said, we would have expected an error or an 
automatic repartition topic in this case.

Cheers
Kay