Building on what Stig said, the best way to see whether the patch for
STORM-3102 addresses the issue you are facing is to get the code that
includes this patch, run your topology, and measure the impact on
performance.

On Tue, Aug 7, 2018 at 8:37 AM Stig Rohde Døssing <[email protected]> wrote:

> Before you start turning other knobs, you should be aware of
> https://issues.apache.org/jira/browse/STORM-3102, which is a performance
> penalty when using Kafka 0.11 or higher. It should be fixed in the latest
> code, but hasn't been released yet.
>
> 2018-08-07 16:53 GMT+02:00 Nithin Uppalapati (BLOOMBERG/ 731 LEX) <
> [email protected]>:
>
>> Hi,
>> I am using Storm version 1.2.1 and Kafka version 1.1.0.
>> I have a live profile of the application. Please find the screenshots
>> attached.
>>
>> Most of the CPU time is spent in the following methods:
>> KafkaSpout.nextTuple();
>> KafkaSpout.emitIfWaitingNotEmitted();
>> KafkaSpout.pollKafkaBroker();
>>
>> Thanks,
>> Nithin
>>
>> From: [email protected] At: 08/07/18 10:30:07
>> To: [email protected]
>> Subject: Re: Kafka Spout Performance Tuning
>>
>>
>>
>> Which Storm and Kafka versions are you using? How many Kafka partitions
>> do you have? Is there a way for you to do a live profile of the
>> application to see what is happening?
>>
>> You can control the number of records fetched on each poll using
>> properties such as
>>
>> max.poll.records
>> fetch.max.bytes
>> max.partition.fetch.bytes
>>
>> You can check the Kafka new consumer properties documentation for details.
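
To make the three consumer properties above concrete, here is a minimal
sketch that only builds the property set with plain java.util.Properties.
The class name, method name, and all values are hypothetical placeholders
for illustration, not recommendations from this thread; the defaults noted
in the comments are the Kafka consumer defaults as I understand them.

```java
import java.util.Properties;

public class ConsumerFetchTuningSketch {

    // Builds the fetch-related consumer properties mentioned above.
    // All values are illustrative assumptions; tune them against your own profile.
    public static Properties consumerFetchProps() {
        Properties props = new Properties();
        // Caps the number of records returned by a single poll() call
        // (Kafka default: 500).
        props.setProperty("max.poll.records", "200");
        // Limits the amount of data the broker returns per fetch request
        // (Kafka default: 52428800 bytes, i.e. 50 MB).
        props.setProperty("fetch.max.bytes", "10485760");
        // Limits the amount of data returned per partition per fetch
        // (Kafka default: 1048576 bytes, i.e. 1 MB).
        props.setProperty("max.partition.fetch.bytes", "262144");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerFetchProps();
        // Print the resulting settings so they can be checked before use.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```

In storm-kafka-client these would normally be handed to the spout through
the KafkaSpoutConfig builder (I believe via setProp, but please check the
API of the version you are running).
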
>>
>> Hugo
>>
>> On Aug 7, 2018, at 6:48 AM, Nithin Uppalapati (BLOOMBERG/ 731 LEX) <
>> [email protected]> wrote:
>>
>> Hi,
>>
>> CPU utilization climbs to around 400% with our topology. To narrow down
>> where the CPU is being spent, I commented out the entire topology except
>> the KafkaSpout. With only the KafkaSpout running, CPU utilization is
>> around 150% on a 20-core machine. The topology runs in a single worker
>> process, with the KafkaSpout parallelism set equal to the number of
>> Kafka partitions.
>>
>> The data load during this phase is a total of 50k records, at a rate of
>> 1600 to 2200 records/sec.
>>
>> Question: how can I tune the KafkaSpout to reduce CPU utilization, which
>> is around 150% with just the KafkaSpout? The definitions of the parameters
>> below do not make it clear how to do that. Also, is there a way to control
>> how the spout reads data from Kafka?
>>
>> Following are the values of some of the parameters:
>>
>>    - poll.timeout.ms: 200
>>    - offset.commit.period.ms: 30000 (30 seconds)
>>    - max.uncommitted.offsets: 10000000 (ten million)
>>
>>
>>
>
