OK, I see now. So every time Storm asks your spout for another tuple, your spout doesn't necessarily emit one. That means your topology is not necessarily being "maxed out". Or, better said, you are not seeing the behavior a topology exhibits once MAX_SPOUT_PENDING has been reached and is actually limiting the number of records being processed within the topology.
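For reference, here is a minimal sketch of where that cap is set. This assumes the backtype.storm API that ships with 0.8.x, and the spout/bolt class names (MyKestrelSpout, MyProcessBolt) are placeholders for your own components:

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;

    public class PendingCapSketch {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            // Placeholder spout and bolt; substitute your own implementations.
            builder.setSpout("kestrel-spout", new MyKestrelSpout());
            builder.setBolt("process-bolt", new MyProcessBolt(), 4)
                   .shuffleGrouping("kestrel-spout");

            Config conf = new Config();
            // Storm stops calling nextTuple() on a spout task once this many
            // emitted tuples are still un-acked; this is the in-flight cap
            // you are tuning with MAX_SPOUT_PENDING.
            conf.setMaxSpoutPending(32);

            new LocalCluster().submitTopology("pending-cap-sketch", conf,
                                              builder.createTopology());
        }
    }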
When you are seeing large numbers of tuples sitting in Kestrel MQ, your spout more likely is being limited by MAX_SPOUT_PENDING. When you look at your bolts and spouts in the Storm UI, what number do you see for capacity? The number varies from 0 to 1; the closer it is to 1, the less headroom that component has for additional in-process tuples.

Note that there are three latency figures in the Storm UI:

* per spout - complete latency (ms)
* per bolt - process latency (ms)
* per bolt - execute latency (ms)

Complete latency - how long it takes a tuple to flow all the way through the topology and back to the spout.
Process latency - how long it takes a tuple to flow through the worker.
Execute latency - how long it takes a tuple to flow through a bolt's execute method.

Complete latency, therefore, is made up of the process and execute latency of every bolt in the topology, plus latency due to something else. I think of that extra part as the missing, or system, latency.

I've noticed that as you increase the number of in-process tuples (via MAX_SPOUT_PENDING), the complete latency grows much faster than the execute and process latency of individual bolts. In fact, what I have seen is that past a certain number of in-process tuples, the records processed per millisecond begin to drop, and this appears to be related solely to the missing, aka system, latency.

It sounds to me like that is exactly what you are experiencing. I think the solution is to add bolt instances, which may then lead you to adding CPUs. (A short sketch of what that looks like in code is at the bottom of this mail, below the quoted thread.)

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <[email protected]>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++

On Tue, May 12, 2015 at 9:08 AM, Kutlu Araslı <[email protected]> wrote:

> I meant our tuple queues in Kestrel MQ, which the spout consumes.
>
> On Tue, 12 May 2015 at 17:00, Jeffery Maass <[email protected]> wrote:
>
>> To what number / metric are you referring when you say, "When number of
>> tuples increases in queue"? What you are describing sounds like the
>> beginning of queue explosion. If so, increasing max spout pending will
>> make the situation worse.
>>
>> Thank you for your time!
>>
>> +++++++++++++++++++++
>> Jeff Maass <[email protected]>
>> linkedin.com/in/jeffmaass
>> stackoverflow.com/users/373418/maassql
>> +++++++++++++++++++++
>>
>> On Tue, May 12, 2015 at 6:22 AM, Kutlu Araslı <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> Our topology consumes tuples from a Kestrel MQ and runs a series of
>>> bolts to process items, including some DB connections. The Storm version
>>> is 0.8.3 and the supervisors run on VMs.
>>> When the number of tuples in the queue increases, we observe that the
>>> execution time of a single tuple also rises dramatically in parallel,
>>> which ends up as throttling behavior.
>>> In the meantime, CPU and memory usage look comfortable. From the
>>> database point of view, we have not observed a problem so far under
>>> stress.
>>> Is there any configuration trick or advice for handling such a load?
>>> There is already a limit of 32 on MAX_SPOUT_PENDING.
>>>
>>> Thanks,
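P.S. Here is a rough sketch of what "adding bolt instances" looks like in code. The component ids and classes are the same placeholders as in the sketch above, and the numbers are only examples of which knobs to turn, not recommendations:

    import backtype.storm.Config;
    import backtype.storm.topology.TopologyBuilder;

    // Same placeholder component names as in the earlier sketch.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kestrel-spout", new MyKestrelSpout());
    // Raise the parallelism hint so more executors run the slow bolt.
    builder.setBolt("process-bolt", new MyProcessBolt(), 8)
           .shuffleGrouping("kestrel-spout");

    Config conf = new Config();
    conf.setMaxSpoutPending(32);
    // More executors may also call for more worker JVMs (and, eventually, more CPUs).
    conf.setNumWorkers(4);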
