Hi,

Have a look at Config.TOPOLOGY_MAX_SPOUT_PENDING. Set to a prudent value, it should take care of the OOM, since it determines "the maximum number of tuples that can be pending on a spout task at any given time" (https://nathanmarz.github.io/storm/doc/backtype/storm/Config.html).
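For illustration, a minimal sketch of how that cap is set when building a topology. Storm's Config class extends HashMap&lt;String, Object&gt;, and the constant Config.TOPOLOGY_MAX_SPOUT_PENDING resolves to the key "topology.max.spout.pending"; the value 500 here is just an example starting point, not a recommendation for your cluster:

```java
import java.util.HashMap;
import java.util.Map;

public class SpoutPendingConfig {
    public static void main(String[] args) {
        // With the Storm dependency on the classpath you would write:
        //   Config conf = new Config();
        //   conf.setMaxSpoutPending(500);
        // which is equivalent to putting the entry below, since Config
        // is a HashMap<String, Object> under the hood.
        Map<String, Object> conf = new HashMap<>();
        // Cap the number of un-acked tuples per spout task; the spout
        // stops emitting once this many tuples are in flight.
        conf.put("topology.max.spout.pending", 500);
        System.out.println(conf.get("topology.max.spout.pending"));
    }
}
```

Note that this cap only throttles spouts that emit tuples with message IDs (i.e. with acking enabled); for unreliable spouts it has no effect, so make sure your Kafka Spout anchors and acks tuples.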
Regards,
Tom

From: Jakes John [mailto:[email protected]]
Sent: Wednesday, 2 September 2015 07:57
To: [email protected]
Subject: Kafka Spout rate

Hey,

I have a 5-node Storm cluster. I have deployed a topology with a Kafka Spout that reads from a Kafka cluster and a bolt that writes to a database. When I tested a Java Kafka consumer independently, I got a throughput of around 1M messages per second. When I tested my database independently, I got a maximum throughput of around 100k messages per second. Since my database is much slower at consuming messages, I need to reduce the intake of messages by the Kafka Spout. Adding more parallelism to the DB bolt doesn't help, as I have already reached the maximum throughput of the database. Periodically I see an "Out of memory exception" in the Kafka Spout and processing stops.

1. How can I reduce the rate at which the Kafka Spout takes in messages? I assume the OOM exceptions happen because the Kafka Spout reads messages from Kafka faster than the DB bolt can flush them to the database. Is that the reason? I tried playing around with a reduced fetch size, but it didn't help.

2. Suppose my DB bolt is somehow able to flush all messages to the database at the same rate as the Kafka Spout emits them, but the database becomes slow in the future. Will the message intake rate be reduced dynamically to ensure that the OOM exception doesn't happen? How can I proactively take measures?

3. What is the best way to tune my system parameters? Also, how do I test the performance (throughput) of my Storm topology?

I would like to see how the current Storm community deals with my problem.

Thanks for your time
