Hi,

Looking at the screenshot, most of the utilization is in the pollKafkaBroker() 
method under nextTuple(), whereas the JIRA ticket refers to 
KafkaConsumer.committed(), which is invoked from emitIfWaitingNotEmitted() under 
nextTuple().

Meanwhile, I am working on pulling the fix to see if it changes anything.

Thanks, Nithin

From: [email protected] At: 08/07/18 12:14:12 To: [email protected]
Subject: Re: Kafka Spout Performance Tuning

Building on what Stig said, the best way for you to see if the patch on 
STORM-3102 addresses the issue you are facing is to get the code that 
includes this patch and run your topology to see what the impact on 
performance is.

On Tue, Aug 7, 2018 at 8:37 AM Stig Rohde Døssing <[email protected]> wrote:

Before you start turning other knobs, you should be aware of 
https://issues.apache.org/jira/browse/STORM-3102, which is a performance 
penalty when using Kafka 0.11 or higher. It should be fixed in the latest code, 
but hasn't been released yet.

2018-08-07 16:53 GMT+02:00 Nithin Uppalapati (BLOOMBERG/ 731 LEX) 
<[email protected]>:

Hi,
I am using Storm version 1.2.1 and Kafka version 1.1.0.
I have a live profile of the application. Please find the screenshots 
attached.

Most of the utilization is in the following methods:
KafkaSpout.nextTuple();
KafkaSpout.emitIfWaitingNotEmitted();
KafkaSpout.pollKafkaBroker();

Thanks,
Nithin

From: [email protected] At: 08/07/18 10:30:07 To: [email protected]
Subject: Re: Kafka Spout Performance Tuning


Which Storm and Kafka versions are you using? How many Kafka partitions do you 
have? Is there a way for you to do a live profile of the application to see 
what is happening?

You can control the number of records fetched on each poll using properties 
such as:

max.poll.records
fetch.max.bytes
max.partition.fetch.bytes

You can check the Kafka new consumer properties documentation for details.
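
A minimal sketch of how these three properties might be collected in Java, 
using only java.util.Properties. The class and method names are hypothetical, 
and the values shown are what I believe to be the Kafka consumer defaults; 
treat them as placeholders rather than recommendations. How the properties 
are handed to the spout depends on the storm-kafka-client version, so check 
the KafkaSpoutConfig builder in your release.

```java
import java.util.Properties;

class ConsumerFetchLimits {
    // Builds consumer properties that bound how much data one poll() can return.
    // The numeric values are illustrative assumptions, not tuning advice.
    static Properties fetchLimits() {
        Properties props = new Properties();
        // Maximum number of records returned by a single poll() call.
        props.setProperty("max.poll.records", "500");
        // Maximum amount of data the broker returns for one fetch request.
        props.setProperty("fetch.max.bytes", "52428800");
        // Maximum amount of data returned per partition in one fetch.
        props.setProperty("max.partition.fetch.bytes", "1048576");
        return props;
    }

    public static void main(String[] args) {
        // Print the properties so they can be compared with the running config.
        fetchLimits().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Lowering max.poll.records makes each poll return a smaller batch; whether 
that reduces CPU usage in this topology would have to be measured.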

Hugo

On Aug 7, 2018, at 6:48 AM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) 
<[email protected]> wrote:


Hi,

CPU utilization climbs to around 400% with our topology. To analyze this more 
deeply and isolate the areas of high CPU utilization, I commented out the 
entire topology except the KafkaSpout. With only the KafkaSpout running, CPU 
utilization is around 150% on a 20-core machine. The topology runs in a single 
worker process, with the Kafka spout parallelism set equal to the number of 
partitions in Kafka.

The data load during this phase is a total of 50k records, at a rate of 
1600/sec - 2200/sec.

Question: how can I tune the performance of the KafkaSpout to reduce CPU 
utilization, which is around 150% with just the KafkaSpout? The definitions of 
the parameters below do not give me an idea of how to do this. Also, is there a 
way to control how data is read from Kafka in a spout?

The following are the values of some of the parameters:

* poll.timeout.ms is set to 200.
* offset.commit.period.ms is set to 30000 (30 seconds).
* max.uncommitted.offsets is set to 10000000 (ten million).
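
As a rough sanity check on these settings, the sketch below estimates how many 
records accumulate between offset commits at the rates quoted above and 
compares that with max.uncommitted.offsets. It assumes tuples are acked 
promptly, so that each commit clears the backlog; the class and method names 
are hypothetical.

```java
class CommitWindowEstimate {
    // Records emitted during one commit period at a steady rate (records/sec).
    static long recordsPerCommitPeriod(long ratePerSec, long commitPeriodMs) {
        return ratePerSec * commitPeriodMs / 1000;
    }

    public static void main(String[] args) {
        long commitPeriodMs = 30_000;      // offset.commit.period.ms from the message
        long maxUncommitted = 10_000_000;  // max.uncommitted.offsets from the message
        for (long rate : new long[] {1600, 2200}) {  // rates quoted in the message
            long pending = recordsPerCommitPeriod(rate, commitPeriodMs);
            System.out.println(rate + "/sec: about " + pending
                    + " records per commit period, limit reached: "
                    + (pending >= maxUncommitted));
        }
    }
}
```

At these rates only about 48,000 to 66,000 records would be outstanding per 
commit period, far below ten million, so under this assumption 
max.uncommitted.offsets is unlikely to be what limits polling.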

