Hi,

have a look at Config.TOPOLOGY_MAX_SPOUT_PENDING. Set to a prudent value, it 
should take care of the OOM, since it determines "the maximum number of tuples 
that can be pending on a spout task at any given time" 
(https://nathanmarz.github.io/storm/doc/backtype/storm/Config.html). Note that 
it only takes effect when the spout emits reliable (anchored/acked) tuples.
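For reference, here is a minimal sketch of setting it at submission time. Topology name, component wiring, and the pending value are illustrative, assuming the pre-1.0 backtype.storm API from the Javadoc linked above:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class KafkaToDbTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... wire up your Kafka spout and DB bolt here ...

        Config conf = new Config();
        // Cap the number of un-acked tuples per spout task. Once this many
        // tuples are in flight, the spout stops emitting, so a slow DB bolt
        // throttles the Kafka spout instead of letting tuples pile up in
        // memory and cause an OOM.
        conf.setMaxSpoutPending(5000); // illustrative value; tune for your load
        conf.setNumWorkers(5);

        StormSubmitter.submitTopology("kafka-to-db", conf,
                builder.createTopology());
    }
}
```

This only throttles the spout if tuples are emitted with message IDs, so that the spout can track which ones are still pending acknowledgement.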

Regards,
Tom

From: Jakes John [mailto:[email protected]]
Sent: Wednesday, 2 September 2015 07:57
To: [email protected]
Subject: Kafka Spout rate

Hey,
       I have a 5-node Storm cluster. I have deployed a topology with a Kafka 
spout that reads from a Kafka cluster and a bolt that writes to a database. 
When I tested the Java Kafka consumer independently, I got a throughput of 
around 1M messages per second. When I tested my database independently, I got 
a maximum throughput of around 100k messages per second. Since my database is 
much slower at consuming messages, I need to reduce the Kafka spout's intake 
of messages. Adding more parallelism to the DB bolt doesn't help, as I have 
already reached the database's maximum throughput. Periodically I see an "Out 
of memory" exception in the Kafka spout and processing stops.

1. How can I reduce the rate at which the Kafka spout takes in messages? I 
assume the reason for the OOM exceptions is that the Kafka spout reads 
messages from Kafka faster than the DB bolt can flush them to the database. 
Is that the reason? I tried reducing the fetch size, but it didn't help.
2. Suppose my DB bolt is somehow able to flush messages to the database at 
the same rate as the Kafka spout emits them, but the database slows down in 
the future. Will the message intake rate be reduced dynamically to ensure 
that OOM exceptions don't happen? How can I proactively take measures?
3. What is the best way to tune my system parameters? Also, how do I test 
the performance (throughput) of my Storm topology?

I would like to see how the Storm community deals with this problem.
Thanks for your time.
