Before you start turning other knobs, you should be aware of
https://issues.apache.org/jira/browse/STORM-3102, which describes a
performance penalty when using Kafka 0.11 or higher. It should be fixed in
the latest code, but the fix hasn't been released yet.
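
For reference, here is a minimal sketch of how the three consumer properties
named in the quoted message below (max.poll.records, fetch.max.bytes,
max.partition.fetch.bytes) could be collected before handing them to the
spout configuration. The class name, method name and the values are
hypothetical and purely illustrative, not recommendations. I believe the
storm-kafka-client KafkaSpoutConfig builder accepts such properties through
setProp, but please check that against the version you are running.

```java
import java.util.Properties;

public class ConsumerFetchProps {

    // Builds the Kafka consumer properties that bound how much data a single
    // poll can return. The property names come from the Kafka consumer
    // configuration; this helper itself is only an illustration.
    static Properties fetchLimits(int maxPollRecords,
                                  int fetchMaxBytes,
                                  int maxPartitionFetchBytes) {
        Properties props = new Properties();
        // Upper bound on the number of records returned by one poll() call.
        props.put("max.poll.records", maxPollRecords);
        // Upper bound on the data returned by the broker for one fetch request.
        props.put("fetch.max.bytes", fetchMaxBytes);
        // Upper bound on the data returned per partition in one fetch request.
        props.put("max.partition.fetch.bytes", maxPartitionFetchBytes);
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumption): smaller than the Kafka
        // defaults of 500 records, 50 MiB and 1 MiB respectively.
        Properties props = fetchLimits(200, 1024 * 1024, 256 * 1024);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Lower limits mean each poll returns less data, so the spout polls more
often with smaller batches. Whether that helps CPU usage here is something
only profiling with your workload can tell.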

2018-08-07 16:53 GMT+02:00 Nithin Uppalapati (BLOOMBERG/ 731 LEX) <
[email protected]>:

> Hi,
> I am using Storm version 1.2.1 and Kafka version 1.1.0.
> I have a live profile of the application. Please find the screenshots
> attached.
>
> Most of the CPU utilization is in the following methods:
> KafkaSpout.nextTuple();
> KafkaSpout.emitIfWaitingNotEmitted();
> KafkaSpout.pollKafkaBroker();
>
> Thanks,
> Nithin
>
> From: [email protected] At: 08/07/18 10:30:07
> To: [email protected]
> Subject: Re: Kafka Spout Performance Tuning
>
>
>
> Which Storm and Kafka versions are you using? How many Kafka partitions
> do you have? Is there a way for you to do a live profile of the
> application to see what is happening?
>
> You can control the number of records fetched on each poll using
> properties such as
>
> max.poll.records
> fetch.max.bytes
> max.partition.fetch.bytes
>
> You can check the Kafka new consumer properties documentation for details.
>
> Hugo
>
> On Aug 7, 2018, at 6:48 AM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) <
> [email protected]> wrote:
>
> Hi,
>
> CPU utilization climbs to around 400% with our topology. To analyze this
> more deeply and isolate the areas of high CPU utilization, I commented out
> the entire topology except the KafkaSpout. With only the KafkaSpout
> running, CPU utilization is around 150% on a 20-core machine. The topology
> runs in a single worker process, with the Kafka spout parallelism set
> equal to the number of Kafka partitions.
>
> The data load during this phase is a total of 50k records, at a rate of
> 1600/sec - 2200/sec.
>
> Question: how can I tune the KafkaSpout to reduce its CPU utilization,
> which is around 150% with just the KafkaSpout running? The definitions of
> the parameters below do not make this clear. Also, is there a way to
> control how data is read from Kafka in a spout?
>
> Following are the values of some of the parameters:
>
>    - poll.timeout.ms to 200.
>    - offset.commit.period.ms to 30000 (30 seconds).
>    - max.uncommitted.offsets to 10000000 (ten million)
>
>
>
