Ok, I see now.

So, every time Storm asks your spout for another tuple, your spout doesn't
necessarily emit one.  Which means that your topology is not necessarily
being "maxed out".  Or, better said, you are not seeing the topology's
behavior once MAX_SPOUT_PENDING has been reached and is therefore actually
limiting the number of records being processed within the topology.
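
For illustration, here is a minimal sketch of that pattern, assuming a
hypothetical QueueSpout class and pollQueue() method in place of your real
Kestrel spout.  The point is simply that nextTuple() is allowed to return
without emitting anything:

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    // Hypothetical spout: pollQueue() stands in for the real Kestrel read.
    public class QueueSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            String item = pollQueue();              // non-blocking fetch
            if (item == null) {                     // queue empty: emit nothing this call
                Utils.sleep(1);
                return;
            }
            collector.emit(new Values(item), item); // message id so the ack is tracked
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("body"));
        }

        private String pollQueue() { return null; } // placeholder
    }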

When you are seeing large numbers of tuples backed up in Kestrel MQ, your
spout is most likely being limited by MAX_SPOUT_PENDING.

When you look at your bolts and spouts within the Storm UI, what number do
you see for capacity?  The number will vary from 0 to 1.  The closer the
number is to 1, the less room there is to add in-flight tuples to the
topology and still expect timely results.
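
As I understand it (this is my rough mental model, not an official
formula), capacity is approximately the fraction of the measurement window
an executor spends inside execute():

    // Sketch of the capacity arithmetic; the names and the 10 minute window are my assumptions.
    static double capacity(long executedCount, double avgExecuteLatencyMs, long windowMs) {
        return (executedCount * avgExecuteLatencyMs) / windowMs;
    }
    // e.g. 500000 executes at ~1.0 ms over a 600000 ms (10 minute) window => ~0.83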

Note that there are 3 latency metrics:
* per spout - complete latency, in milliseconds
* per bolt - process latency, in milliseconds
* per bolt - execution latency, in milliseconds

Complete latency - how long it takes a tuple to flow all the way through
the topology and back to the spout (i.e. until it is fully acked)
Process latency - how long it takes a bolt to ack a tuple after receiving it
Execution latency - how long a tuple spends inside a bolt's execute method

Complete latency, therefore, is made up of the process and execution
latency of every bolt in the topology, plus latency due to something else.
I think of that remainder as the missing latency, or system latency.
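
As a made-up illustration: if a tuple passes through 3 bolts that each
report roughly 2 ms of process latency, but the spout reports a complete
latency of 50 ms, then roughly 44 ms of that is the missing / system
latency (queueing, transfer between executors, ack overhead).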

I've noticed that as you increase the number of in-flight tuples (via
MAX_SPOUT_PENDING), the complete latency increases much more quickly than
the execution and process latency of the individual bolts.  In fact, what I
have seen is that beyond a certain number of in-flight tuples, the records
processed per millisecond actually begins to drop.  And this appears to be
related solely to the missing, aka system, latency.

It sounds to me like what you are experiencing is this very thing.  I think
the solution is to add bolt instances (higher parallelism), which may then
lead you to adding CPUs.
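
A minimal sketch of what I mean, assuming a hypothetical KestrelQueueSpout
and ProcessBolt (the component names and parallelism numbers are
placeholders; the real knobs are the parallelism hint on setBolt and
setMaxSpoutPending):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class TuningExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // Hypothetical components; substitute your own spout and bolts.
            builder.setSpout("queue-spout", new KestrelQueueSpout(), 1);
            builder.setBolt("process-bolt", new ProcessBolt(), 8)   // more bolt executors
                   .shuffleGrouping("queue-spout");

            Config conf = new Config();
            conf.setMaxSpoutPending(32); // cap on unacked tuples per spout task
            conf.setNumWorkers(2);       // spread executors over more JVMs / CPUs

            StormSubmitter.submitTopology("kestrel-topology", conf, builder.createTopology());
        }
    }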


Thank you for your time!

+++++++++++++++++++++
Jeff Maass <[email protected]>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++


On Tue, May 12, 2015 at 9:08 AM, Kutlu Araslı <[email protected]> wrote:

> I meant our tuple queues in Kestrel MQ which spout consumes.
>
>
> On Tue, 12 May 2015 at 17:00, Jeffery Maass <[email protected]> wrote:
>
> To what number / metric are you referring when you say, "When number of
>> tuples increases in queue"?  What you are describing sounds like the
>> beginning of queue explosion.  If so, increasing max spout pending will
>> make the situation worse.
>>
>> Thank you for your time!
>>
>> +++++++++++++++++++++
>> Jeff Maass <[email protected]>
>> linkedin.com/in/jeffmaass
>> stackoverflow.com/users/373418/maassql
>> +++++++++++++++++++++
>>
>>
>> On Tue, May 12, 2015 at 6:22 AM, Kutlu Araslı <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> Our topology consumes tuples from a Kestrel MQ and runs a series of
>>> bolts to process items including some db connections. Storm version is
>>> 0.8.3 and supervisors are run on VMs.
>>> When the number of tuples in the queue increases, we observe that single
>>> tuple execution times also rise dramatically, which ends up with a
>>> throttling behaviour.
>>> In the meantime, CPU and memory usage look comfortable. From the database
>>> point of view, we have not observed a problem so far under stress.
>>> Is there any configuration trick or any advice for handling such a load?
>>> MAX_SPOUT_PENDING is already limited to 32.
>>>
>>> Thanks,
>>>
>>>
>>>
>>
