Hi, Looking at the screenshot, most of the CPU utilization is in the pollKafkaBroker() method under nextTuple(), whereas the JIRA ticket refers to KafkaConsumer.committed(), which is invoked from emitIfWaitingNotEmitted() under nextTuple().
Meanwhile, I am working on pulling in the fix to see if it changes anything.

Thanks,
Nithin

From: [email protected] At: 08/07/18 12:14:12
To: [email protected]
Subject: Re: Kafka Spout Performance Tuning

Building on what Stig said, the best way for you to see whether the patch on STORM-3102 addresses the issue you are facing is to get the code that includes this patch and run your topology to see what the impact on performance is.

On Tue, Aug 7, 2018 at 8:37 AM Stig Rohde Døssing <[email protected]> wrote:

Before you start turning other knobs, you should be aware of https://issues.apache.org/jira/browse/STORM-3102, which describes a performance penalty when using Kafka 0.11 or higher. It should be fixed in the latest code, but the fix hasn't been released yet.

2018-08-07 16:53 GMT+02:00 Nithin Uppalapati (BLOOMBERG/ 731 LEX) <[email protected]>:

Hi,

I am using Storm version 1.2.1 and Kafka version 1.1.0. I have a live profile of the application. Please find attached the screenshots. Most of the utilization is in the following methods:

KafkaSpout.nextTuple();
KafkaSpout.emitIfWaitingNotEmitted();
KafkaSpout.pollKafkaBroker();

Thanks,
Nithin

From: [email protected] At: 08/07/18 10:30:07
To: [email protected]
Subject: Re: Kafka Spout Performance Tuning

Which Storm and Kafka versions are you using? How many Kafka partitions do you have? Is there a way for you to do a live profile of the application to see what is happening?

You can control the number of records fetched on each poll using properties such as:

max.poll.records
fetch.max.bytes
max.partition.fetch.bytes

You can check the Kafka new consumer properties documentation for details.

Hugo

On Aug 7, 2018, at 6:48 AM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) <[email protected]> wrote:

Hi,

The CPU utilization climbs to around 400% with our topology.
To analyze this more deeply and isolate the areas of high CPU utilization, I commented out the entire topology except the KafkaSpout. The topology now contains only the KafkaSpout, and CPU utilization is around 150% on a 20-core machine. The topology runs in a single worker process, with the Kafka spout parallelism set equal to the number of Kafka partitions. The data load during this phase is a total of 50k records, at a rate of 1600 to 2200 records/sec.

Question: how can I tune the performance of the KafkaSpout to reduce CPU utilization, which is around 150% with just the KafkaSpout? The definitions of the parameters below do not give me an idea of how to do this. Also, is there a way to control how the spout reads data from Kafka?

The values of some of the parameters are as follows:

* poll.timeout.ms is set to 200.
* offset.commit.period.ms is set to 30000 (30 seconds).
* max.uncommitted.offsets is set to 10000000 (ten million).
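
For reference, the three consumer properties Hugo names in his reply (max.poll.records, fetch.max.bytes, max.partition.fetch.bytes) could be collected as in the minimal sketch below. It uses only java.util.Properties so that it runs without Storm or Kafka on the classpath. The class name, the helper method, and the numeric values are hypothetical and chosen for illustration; nobody in the thread recommends them.

```java
import java.util.Properties;

public class SpoutConsumerProps {

    // Hypothetical helper: collects the fetch-related Kafka consumer
    // properties mentioned in the thread. The caller supplies the values;
    // nothing here is a recommendation from the thread.
    static Properties fetchLimits(int maxPollRecords,
                                  int fetchMaxBytes,
                                  int maxPartitionFetchBytes) {
        Properties props = new Properties();
        // Upper bound on the number of records returned by a single poll().
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        // Upper bound on the data returned for one fetch request.
        props.setProperty("fetch.max.bytes", Integer.toString(fetchMaxBytes));
        // Upper bound on the data returned per partition.
        props.setProperty("max.partition.fetch.bytes",
                Integer.toString(maxPartitionFetchBytes));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumption): 500 records, 50 MiB, 1 MiB.
        Properties props = fetchLimits(500, 50 * 1024 * 1024, 1024 * 1024);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

How these properties reach the consumer depends on the storm-kafka-client version in use. The KafkaSpoutConfig builder is expected to accept consumer properties, but the exact method should be checked against the API of the release being run.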
