Before you start turning other knobs, you should be aware of https://issues.apache.org/jira/browse/STORM-3102, which describes a performance penalty when using Kafka 0.11 or higher. It should be fixed in the latest code, but the fix hasn't been released yet.
2018-08-07 16:53 GMT+02:00 Nithin Uppalapati (BLOOMBERG/ 731 LEX) <[email protected]>:

> Hi,
> Using storm version 1.2.1 and kafka version 1.1.0.
> I have a live profile of the application. Please, find attached the screen
> shots.
>
> Major utilization is in following methods:
> KafkaSpout.nextTuple();
> KafkaSpout.emitIfWaitingNotEmitted();
> KafkaSpout.pollKafkaBroker();
>
> Thanks,
> Nithin
>
> From: [email protected] At: 08/07/18 10:30:07
> To: [email protected]
> Subject: Re: Kafka Spout Performance Tuning
>
> Which Storm and Kafka versions are you using ? How many Kafka partitions
> do you have ? Is there a way for you to do a live profile of the
> application to see what is happening ?
>
> You can control the number of records fetched on each poll using
> properties such as
>
> max.poll.records
> fetch.max.bytes
> max.partition.fetch.bytes
>
> You can check the Kafka new consumer properties documentation for details.
>
> Hugo
>
> On Aug 7, 2018, at 6:48 AM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) <
> [email protected]> wrote:
>
> Hi,
>
> The CPU utilization is going high to around 400% with our topology. So to
> analyze more deeply and segregate areas of high CPU utilization I commented
> out the entire topology except the KafkaSpout, so basically my topology
> only has KafkaSpout and CPU utilization is around 150% on a 20 core
> machine. Topology is running using a single worker process with Kafka
> Parallelism set equal to the number of partitions in the kafka.
>
> The data load during this phase is a total of 50k records, at a rate of
> 1600/sec - 2200/sec.
>
> Question: how to tune the performance of KafkaSpout, to reduce CPU
> utilization which is around 150% with just kafkaspout? The below parameters
> definitions does not give an idea. Also, is there a way to control the
> reading of data from the kafka in a spout?
>
> Following are the values of some of the parameters:
>
> - poll.timeout.ms to 200.
> - offset.commit.period.ms to 30000 (30 seconds).
> - max.uncommitted.offsets to 10000000 (ten million)
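
For reference, here is a rough sketch of the consumer properties named in the quoted thread (max.poll.records, fetch.max.bytes, max.partition.fetch.bytes) collected into a plain java.util.Properties object, plus a quick check of how many records pile up between offset commits at the reported rates. The class name, the helper names and the three property values are made up for illustration and are not recommendations. How the properties get handed to the spout depends on the storm-kafka-client version, so check the KafkaSpoutConfig builder for the release you run.

```java
import java.util.Properties;

final class SpoutTuningSketch {

    // Kafka consumer properties that cap how much data a single poll can return.
    // The keys are the ones named in the thread; the values are illustrative assumptions.
    static Properties consumerFetchLimits() {
        Properties props = new Properties();
        props.setProperty("max.poll.records", "200");
        props.setProperty("fetch.max.bytes", String.valueOf(8 * 1024 * 1024));
        props.setProperty("max.partition.fetch.bytes", String.valueOf(512 * 1024));
        return props;
    }

    // Number of records that accumulate between two offset commits at a steady rate.
    static long recordsPerCommitPeriod(long recordsPerSecond, long commitPeriodMs) {
        return recordsPerSecond * commitPeriodMs / 1000;
    }

    public static void main(String[] args) {
        consumerFetchLimits().forEach((k, v) -> System.out.println(k + "=" + v));

        long commitPeriodMs = 30_000;      // offset.commit.period.ms from the thread
        long maxUncommitted = 10_000_000;  // max.uncommitted.offsets from the thread

        // Reported load: 1600 to 2200 records per second.
        long low = recordsPerCommitPeriod(1600, commitPeriodMs);
        long high = recordsPerCommitPeriod(2200, commitPeriodMs);

        System.out.println("records per commit period: " + low + " to " + high);
        System.out.println("below max.uncommitted.offsets: " + (high < maxUncommitted));
    }
}
```

With the numbers from the thread, the spout sees somewhere between 48,000 and 66,000 records per 30 second commit period, far below the ten million limit, so max.uncommitted.offsets is unlikely to be what throttles the spout at this load.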
