Hi all,

I am using Storm 0.9.4 to build a Trident topology that uses
OpaqueTridentKafkaSpout to read data from Kafka, process it, and save
the results to a database. The topology works, but it uses a lot of
CPU. When the topology is idle (no new data in the spout), each worker
process uses 30-60% of a CPU, and once it starts consuming data the
usage climbs even higher (160-250% each).
I tried tuning the GC but it did not help; CPU usage is still high. I
don't know whether I am missing something or have misconfigured it.
This is the configuration I am using:

>worker.childopts: "-verbose:gc -Xmx4096m -Xms4096m -Xss256k
-XX:NewSize=3200m -XX:MaxNewSize=3200m -XX:MaxPermSize=128m
-XX:PermSize=96m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+AggressiveOpts -XX:+UseCompressedOops -XX:+CMSParallelRemarkEnabled
-XX:-CMSConcurrentMTEnabled -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:MaxTenuringThreshold=4
-XX:SurvivorRatio=9 -Djava.net.preferIPv4Stack=true
-Xloggc:/var/log/storm/gc-worker-%ID%.log -XX:GCLogFileSize=1m
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:+PrintGCDateStamps
-XX:+PrintGCDetails"
>topology.receiver.buffer.size: 8
>topology.transfer.buffer.size: 1024
>topology.executor.receive.buffer.size: 1024
>topology.executor.send.buffer.size: 2048
>topology.sleep.spout.wait.strategy.time.ms: 100
>storm.messaging.netty.server_worker_threads: 4
>storm.messaging.netty.client_worker_threads: 4
>storm.messaging.netty.buffer_size: 10485760
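In case it is relevant, my understanding is that the idle spin is governed by the spout and disruptor wait strategies. The key names below are what I believe the 0.9.x defaults.yaml uses (please correct me if they differ in 0.9.4):

```yaml
# Assumed key names from Storm 0.9.x defaults.yaml -- please verify.
# Wait strategy used when nextTuple() emits nothing; the sleep
# strategy parks the spout thread instead of busy-spinning.
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
topology.sleep.spout.wait.strategy.time.ms: 100

# Disruptor wait strategy for the executor queues; a blocking
# strategy trades a little latency for much lower idle CPU.
topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
```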

The box I am running on has 24 cores, and the topology has 11 workers
and 212 executors. The spout has 5 tasks reading from a topic with 10
partitions, and the batch's max size is set to 1 MB. With these settings
the topology can pull 7-8 MB/sec from Kafka, and process latency is
about 1200-1500 ms.

I can't figure out why the CPU usage is so high. Can anyone please help?

Thank you
-Binh
