Hi,

Find a value that suits your needs, but don't hardcode it: put it in a property file so you can change it easily. The setting does not automatically adjust to changes downstream; it merely limits the number of tuples currently "in transit". Without it you are practically guaranteed to hit an OOM, since Kafka will always be faster than your database.
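A minimal sketch of the property-file approach, assuming a hypothetical property name `topology.max.spout.pending` and file `topology.properties` (both names are illustrative, not Storm defaults):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SpoutPendingConfig {
    static final int DEFAULT_MAX_SPOUT_PENDING = 1000; // fallback if the property is absent

    // Reads the hypothetical property "topology.max.spout.pending" from
    // already-loaded Properties, falling back to a default value.
    static int maxSpoutPending(Properties props) {
        return Integer.parseInt(
                props.getProperty("topology.max.spout.pending",
                                  String.valueOf(DEFAULT_MAX_SPOUT_PENDING)));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In a real deployment you would load from a file instead:
        //   props.load(new FileInputStream("topology.properties"));
        props.load(new StringReader("topology.max.spout.pending=2000\n"));

        int pending = maxSpoutPending(props);
        System.out.println(pending); // prints 2000

        // With storm-core on the classpath you would then apply it:
        //   backtype.storm.Config conf = new backtype.storm.Config();
        //   conf.setMaxSpoutPending(pending);
        // which is equivalent to conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, pending).
    }
}
```

The Storm-specific calls are kept in comments so the sketch stays self-contained; only the property-loading part is shown live.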
"Without controlling our ingest rate, it's possible to swamp our topology so that it collapses under the weight of incoming data. Max spout pending lets us erect a dam in front of our topology, apply back pressure, and avoid being overwhelmed. We recommend that, despite the optional nature of max spout pending, you always set it." (see "Storm Applied: Strategies for real-time event processing", chapter 6)

Regards,
Tom

From: Jakes John [mailto:[email protected]]
Sent: Wednesday, 2 September 2015 17:22
To: [email protected]
Subject: Re: Kafka Spout rate

So, you are asking me to hardcode a value of, say, 1000. That is still a fixed value, right? How does it automatically adjust to the database flush rate? If the backend systems get slow, how does the topology automatically adjust and throttle the rate? I once saw a case where the actual error was in the database writes, but the Storm topology stalled because of an OOM exception. I feel this should be a common problem for everyone. Also, how can I view the number of pending spout messages at any given time?

Thanks,
Johnu

On Wed, Sep 2, 2015 at 12:05 AM, Ziemer, Tom <[email protected]> wrote:

Hi,

have a look at Config.TOPOLOGY_MAX_SPOUT_PENDING. This should take care of the OOM if set to a prudent value, since it determines "The maximum number of tuples that can be pending on a spout task at any given time". (https://nathanmarz.github.io/storm/doc/backtype/storm/Config.html)

Regards,
Tom

From: Jakes John [mailto:[email protected]]
Sent: Wednesday, 2 September 2015 07:57
To: [email protected]
Subject: Kafka Spout rate

Hey,
I have a 5-node Storm cluster. I have deployed a topology with a Kafka spout that reads from a Kafka cluster and a bolt that writes to a database. When I tested the Java Kafka consumer independently, I got a throughput of around 1M messages per second. When I tested my database independently, I got a maximum throughput of around 100k messages per second.
Since my database is much slower at consuming messages, I need to reduce the intake of messages by the Kafka spout. Adding more parallelism to the DB bolt doesn't help, as I have already reached the database's maximum throughput. Periodically I see an "Out of memory" exception in the Kafka spout and processing stops.

1. How can I reduce the rate at which the Kafka spout reads messages? I assume the reason for the OOM exceptions is that the Kafka spout reads messages from Kafka faster than the DB bolt can flush them to the database. Is that the reason? I tried playing around with a reduced fetch size, but it didn't help.
2. Suppose my DB bolt is somehow able to flush all messages to the database at the same rate as the Kafka spout reads them, but the database gets slow in the future. Will the message intake rate be reduced dynamically to ensure that the OOM exception doesn't happen? How can I proactively take measures?
3. What is the best way to tune my system parameters? Also, how do I test the performance (throughput) of my Storm topology?

I would like to see how the current Storm community deals with my problem.

Thanks for your time
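The advice in the replies above comes down to bounding the number of in-flight tuples. The failure mode in the question (fast producer, slow consumer, unbounded buffering) can be illustrated with a plain-Java sketch that stands in for max spout pending: a bounded queue makes the fast producer block instead of accumulating messages in memory. This is an analogy, not Storm's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BackpressureDemo {
    // Pushes `total` messages through a queue bounded at `maxPending`
    // and returns how many the consumer processed. Memory use stays
    // proportional to maxPending no matter how fast the producer is.
    static int run(int maxPending, int total) throws InterruptedException {
        BlockingQueue<Integer> inFlight = new ArrayBlockingQueue<>(maxPending);
        AtomicInteger consumed = new AtomicInteger();

        Thread producer = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                try {
                    inFlight.put(i); // blocks when maxPending tuples are in flight
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        Thread consumer = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                try {
                    inFlight.take(); // stand-in for a slow database write
                    consumed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return consumed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every message still gets through; the producer is simply throttled.
        System.out.println(run(1000, 100_000)); // prints 100000
    }
}
```

With an unbounded queue the producer would never block and pending messages would grow until the heap is exhausted, which is the OOM described in the question; the bound plays the role of TOPOLOGY_MAX_SPOUT_PENDING.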
