Kafka Broker monitoring
Hi Team,

We have some Splunk dashboards, along with custom UI elements, to report Kafka health status. We forward all Kafka health-check statuses to Splunk. However, we are running into capacity issues on Splunk, since we service multiple Kafka clusters across our data center as well as three PCs (with weekly growth in deployments). The main problem is an hour or two of latency.

While I have some ideas, I am trying to understand what you all use for monitoring broker status. The goal is to provide real-time updates for all our Kafka installations (we will be upgrading to 3.6 soon). We prefer open-source solutions over vendor-locked options. Can you please share any best-known methods (BKMs) for achieving real-time cluster updates?

Best,
Vinay Bagare
GetLogsDir API in Kafka 3.3.1 returns all topics even when topic name specified in args
Hello. I'm having a problem with the Kafka protocol API.

Request schema:

    DescribeLogDirs Request (Version: 0) => [topics]
      topics => topic [partitions]
        topic => STRING
        partitions => INT32

My request contains `[{topic: "blah", partitions: [0,1,2,3,4,5,6,7,8,9]}]`, but the response:

    DescribeLogDirs Response (Version: 0) => throttle_time_ms [results]
      throttle_time_ms => INT32
      results => error_code log_dir [topics]
        error_code => INT16
        log_dir => STRING
        topics => name [partitions]
          name => STRING
          partitions => partition_index partition_size offset_lag is_future_key
            partition_index => INT32
            partition_size => INT64
            offset_lag => INT64
            is_future_key => BOOLEAN

contains entries for *all* topics. My workaround has been to filter the returned list by topic name to find the one I requested the data for, but I don't understand why the broker is not limiting the results to just the topic I requested in the first place. Also, I think there should be an option to specify ALL_PARTITIONS, because that would save me from having to retrieve topic metadata from the broker just to count the number of partitions; the Kafka server would probably have a more efficient means to do that.

Is this a bug, or am I doing something wrong?

Thanks,
Maxim
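The filtering workaround described above can be sketched as follows, assuming the response has already been decoded into plain dicts mirroring the schema (the function name and dict layout are illustrative, not a real client-library API):

```python
# Filter a decoded DescribeLogDirs response down to one requested topic.
# Layout mirrors the response schema: results -> topics -> partitions.
# All names here are illustrative; this is not an actual Kafka client API.

def filter_log_dirs(response, wanted_topic):
    """Return per-log-dir entries containing only the requested topic."""
    filtered = []
    for result in response["results"]:
        topics = [t for t in result["topics"] if t["name"] == wanted_topic]
        if topics:
            filtered.append({
                "log_dir": result["log_dir"],
                "error_code": result["error_code"],
                "topics": topics,
            })
    return filtered

# Example decoded response containing the requested topic plus an extra one.
response = {
    "throttle_time_ms": 0,
    "results": [
        {
            "error_code": 0,
            "log_dir": "/var/kafka-logs",
            "topics": [
                {"name": "blah",
                 "partitions": [{"partition_index": 0, "partition_size": 1024,
                                 "offset_lag": 0, "is_future_key": False}]},
                {"name": "other", "partitions": []},
            ],
        }
    ],
}

print(filter_log_dirs(response, "blah"))
```

This is exactly the client-side filtering Maxim describes; it keeps the log_dir/error_code context while dropping the unrequested topics.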
Re: [ Questions on log4j file & version ]
Hi Ashok, Kafka 2.7.1 was built from the 2.7.1 tag [1] and looking at the dependencies in that version [2], it should have shipped with 1.2.17. You can verify this by looking for the log4j jar in your installation. Because of the security vulnerabilities you mention, Kafka switched to reload4j in [3] around 3.2.0, and last upgraded reload4j in 3.6.0 [4]. You should consider upgrading to a more recent version of Kafka (recommended, as 2.7 is well out-of-support) or consider swapping out the log4j jar with a recent version of reload4j (not recommended). [1] https://github.com/apache/kafka/tree/2.7.1 [2] https://github.com/apache/kafka/blob/61dbce85d0d41457d81a4096ecaea049f3a4b3ae/gradle/dependencies.gradle#L76 [3] https://issues.apache.org/jira/browse/KAFKA-13660 [4] https://github.com/apache/kafka/pull/13673 Thanks, Greg On Thu, May 16, 2024 at 5:50 AM Ashok Kumar Ragupathi wrote: > > Hello Kafka Team, > > Request your help... > > We are using Apache Kafka kafka_2.13-2.7.1 & installed on a server. > > I understand it uses log4j java for logger purposes. > > But we don't know, what is the log4j version it is using? > > Recently we came to know that log4j_1.2.17 has some security issues, how to > upgrade the log4j_v2 version? how to find what version internally it uses > or refers ? > > Thanks & Regards > Ashok Kumar > Denovo Systems
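The jar check Greg mentions can be done directly against the installation (a sketch; the path assumes a standard binary distribution unpacked at $KAFKA_HOME):

```shell
# List the logging jars shipped with this Kafka installation.
# A 2.7.1 install should show log4j-1.2.17.jar; 3.2.0+ ships reload4j instead.
ls "$KAFKA_HOME/libs" | grep -Ei 'log4j|reload4j'
```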
Re: Kafka streams stores key in multiple state store instances
Hello Kay,

What you describe is "by design" -- unfortunately. The problem is that when we build the `Topology` we don't know the partition count of the input topics, and thus `StreamsBuilder` cannot insert a repartition topic for this case (we always assume that the partition count is the same for all input topics). To work around this, you would need to rewrite the program to use either `groupBy((k,v) -> k)` instead of `groupByKey()`, or do a `.repartition().groupByKey()`.

Does this make sense?

-Matthias

On 5/16/24 2:11 AM, Kay Hannay wrote:
Hi,

we have a Kafka Streams application which merges (merge, groupByKey, aggregate) a few topics into one topic. The application is stateful, of course. There are currently six instances of the application running in parallel. We had an issue where one new topic for aggregation had a different partition count than all the other topics. This caused data corruption in our application. We expected that a repartition topic would be created automatically by Kafka Streams, or that we would get an error, but this did not happen. Instead, some of the keys (all merged topics share the same key schema) found their way into at least two different instances of the application: one key is available in more than one local state store. Can you explain why this happened? As already said, we would have expected to get an error or a repartition topic in this case.

Cheers
Kay
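The root cause can be illustrated with a few lines: partition assignment is a hash of the key modulo the partition count, so the same key lands on different partition numbers when topics have different partition counts, and the corresponding records are then handled by different Streams tasks with separate state stores. (Kafka's default partitioner uses murmur2; the sketch below uses CRC32 purely to show the effect.)

```python
# Why mismatched partition counts break co-partitioning: the same key maps to
# different partition numbers under different partition counts, so records for
# one key get processed by different tasks (and thus different state stores).
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key) % num_partitions

key = b"customer-42"
p_in_topic_a = partition_for(key, 6)   # topic with 6 partitions
p_in_topic_b = partition_for(key, 8)   # topic with 8 partitions
print(p_in_topic_a, p_in_topic_b)      # generally NOT the same partition number
```

Repartitioning via `groupBy(...)` or `.repartition()` fixes this by writing all streams through an internal topic whose partition count is uniform, restoring the co-partitioning guarantee.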
Re: Request to be added to kafka contributors list
Thanks for reaching out Yang. You should be all set. -Matthias On 5/16/24 7:40 AM, Yang Fan wrote: Dear Apache Kafka Team, I hope this email finds you well. My name is Fan Yang, JIRA ID is fanyan, I kindly request to be added to the contributors list for Apache Kafka. Being part of this list would allow me to be assigned to JIRA tickets and work on them. Thank you for considering my request. Best regards, Fan
Request to be added to kafka contributors list
Dear Apache Kafka Team, I hope this email finds you well. My name is Fan Yang, JIRA ID is fanyan, I kindly request to be added to the contributors list for Apache Kafka. Being part of this list would allow me to be assigned to JIRA tickets and work on them. Thank you for considering my request. Best regards, Fan
Large messages
Hi,

I am having a bit of a problem here; it is a bit of an unusual use case. I am running a single-node Kafka server and noticed that some messages I send are not passed through Kafka. They are silently lost: the topic is created but holds no data. Other topics on the same server work just fine. Further investigation showed that the message size I use on the afflicted topic is somewhat larger than the default 1 MB.

I found a description of how to work with larger messages at:
https://www.conduktor.io/kafka/how-to-send-large-messages-in-apache-kafka/

But I am stuck: the description asks for configuring replica.fetch.max.bytes (or message.max.bytes) in server.properties. I did this and restarted Kafka, but

kafka-configs.sh --bootstrap-server localhost:9092 --broker 0 --describe --all

still shows the old default values, as if my entries in server.properties were ignored. How do I proceed? What is the right way to work with larger messages in Kafka? In my use case, 2 MB would be enough.

Best Regards,
Mark Koennecke
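A common approach for this case is a per-topic override rather than a broker-wide change; a minimal sketch, assuming a broker on localhost:9092 and a topic named big-topic (both placeholders):

```shell
# Raise the per-topic message size limit to 2 MB. max.message.bytes overrides
# the broker-level message.max.bytes default for this topic only.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name big-topic \
  --add-config max.message.bytes=2097152

# Verify the override took effect.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name big-topic

# The producer must also be allowed to send requests that large:
#   max.request.size=2097152   (producer config)
```

If a static server.properties change does not show up in --describe output, it is worth double-checking which server.properties file the running broker was actually started with.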
[ Questions on log4j file & version ]
Hello Kafka Team,

Requesting your help...

We are using Apache Kafka kafka_2.13-2.7.1, installed on a server. I understand it uses log4j for logging purposes, but we don't know which log4j version it is using. Recently we came to know that log4j 1.2.17 has some security issues. How do we upgrade to a log4j v2 version, and how do we find which version it uses or refers to internally?

Thanks & Regards
Ashok Kumar
Denovo Systems
Kafka streams stores key in multiple state store instances
Hi,

we have a Kafka Streams application which merges (merge, groupByKey, aggregate) a few topics into one topic. The application is stateful, of course. There are currently six instances of the application running in parallel. We had an issue where one new topic for aggregation had a different partition count than all the other topics. This caused data corruption in our application. We expected that a repartition topic would be created automatically by Kafka Streams, or that we would get an error, but this did not happen. Instead, some of the keys (all merged topics share the same key schema) found their way into at least two different instances of the application: one key is available in more than one local state store. Can you explain why this happened? As already said, we would have expected to get an error or a repartition topic in this case.

Cheers
Kay