Hi,

CPU utilization climbs to around 400% with our topology. To narrow down where 
the CPU is going, I commented out everything in the topology except the 
KafkaSpout. With only the KafkaSpout running, CPU utilization is still around 
150% on a 20-core machine. The topology runs in a single worker process, with 
the KafkaSpout parallelism set equal to the number of partitions in the Kafka 
topic.

The data load during this test is 50k records in total, arriving at a rate of 
1600-2200 records/sec.

Question: how can I tune the KafkaSpout to reduce its CPU utilization, which is 
around 150% with the KafkaSpout alone? The definitions of the parameters below 
do not make it clear how they affect CPU usage. Also, is there a way to control 
how data is read from Kafka in the spout?

The current values of some of the parameters are:

* poll.timeout.ms: 200
* offset.commit.period.ms: 30000 (30 seconds)
* max.uncommitted.offsets: 10000000 (ten million)
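
To put these values in context against the load described above, here is a 
rough back-of-envelope check. It is only a sketch: the class and method names 
are made up for illustration and are not part of Storm or Kafka, and it assumes 
(as I understand the setting) that max.uncommitted.offsets caps how many 
offsets may be pending commit before the spout stops polling.

```java
// Back-of-envelope check of the spout settings against the test load.
// All names here are hypothetical; nothing below calls Storm or Kafka.
public class SpoutSettingsCheck {
    // Values quoted in this message.
    static final long MAX_UNCOMMITTED_OFFSETS = 10_000_000L;
    static final long OFFSET_COMMIT_PERIOD_MS = 30_000L;
    static final long TOTAL_RECORDS = 50_000L;
    static final long PEAK_RATE_PER_SEC = 2_200L;

    // Records that arrive during one offset commit period at a given rate.
    static long recordsPerCommitPeriod(long ratePerSec, long commitPeriodMs) {
        return ratePerSec * commitPeriodMs / 1000L;
    }

    // Assumption: the cap can only limit polling if the number of records
    // in flight can exceed it; the total load is an upper bound on that.
    static boolean capCanThrottle(long cap, long totalRecords) {
        return totalRecords > cap;
    }

    public static void main(String[] args) {
        long perPeriod =
            recordsPerCommitPeriod(PEAK_RATE_PER_SEC, OFFSET_COMMIT_PERIOD_MS);
        System.out.println("records per commit period: " + perPeriod);
        System.out.println("cap can throttle: "
            + capCanThrottle(MAX_UNCOMMITTED_OFFSETS, TOTAL_RECORDS));
    }
}
```

With the numbers above this prints 66000 and false: the whole 50k-record test 
fits within a single 30-second commit period at the peak rate, and the 
ten-million cap can never be reached. So, under that assumption, 
max.uncommitted.offsets is not limiting reads at all in this test.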
