Thanks Nathan.
Will go with the assumption that if the complete latency at the spout is much
higher than the combined latencies of the rest of the bolts, then the
additional latency at the spout is due to messages piling up in the outbound
queue.
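To make the arithmetic behind the thread concrete, here is a minimal sketch of how max spout pending is applied per spout task and why bolt fan-out is not counted against it. The numbers and class name are illustrative only, not Storm APIs:

```java
// Illustrative arithmetic only: topology.max.spout.pending is enforced
// independently by each spout task, and bolt fan-out tuples are anchored
// to the spout tuple's tree rather than counted against the cap.
public class MaxPendingMath {
    public static void main(String[] args) {
        int maxSpoutPending = 100;   // topology.max.spout.pending
        int spoutTasks = 8;          // spout parallelism
        int boltFanOut = 10;         // child tuples emitted per input tuple

        // Cap is per spout task, so spout tuples in flight can reach:
        int spoutTuplesInFlight = maxSpoutPending * spoutTasks;

        // Upper bound on total tuples in flight: each spout tuple plus
        // its fan-out children (fan-out is NOT limited by the cap):
        int totalTuplesInFlight = spoutTuplesInFlight * (1 + boltFanOut);

        System.out.println(spoutTuplesInFlight);   // 800
        System.out.println(totalTuplesInFlight);   // 8800
    }
}
```

This matches the answers below: 100 pending x 8 spout tasks allows 800 spout tuples in flight, and a bolt emitting 10 tuples each grows the tuple trees without being throttled itself.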

Regards
Kashyap
On Jul 29, 2015 1:49 PM, "Nathan Leung" <[email protected]> wrote:

> Roughly speaking (disregarding any network latencies and any benefits from
> having multiple threads servicing the output queues vs having 1), having 1
> spout with high max pending should be similar to many spouts with low max
> pending.  Your total number of tuples pending in the topology is the same
> either way.
>
> Spout latency includes processing time for the entire tuple tree.
> https://storm.apache.org/documentation/Guaranteeing-message-processing.html
>
> Also I would review
> https://storm.apache.org/documentation/Acking-framework-implementation.html
> with regards to your timeout question.
>
> On Wed, Jul 29, 2015 at 2:16 PM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> Nathan,
>> So the following is true? -
>> Spout Latency = (time spent in output queues *[A]*) + (time between the
>> emit from nextTuple and the ack *[B]*)
>>
>> So does it mean that if the complete latency at the spout level is high but
>> the bolts have very low latencies, then instead of increasing max spout
>> pending, we can keep max spout pending at a low number but increase the
>> parallelism of the spout, so that the overall number of messages in the
>> topology is higher while the load on each individual spout instance stays
>> low?
>>
>> I was using a RedisSpout that receives messages from a Redis publish and
>> populates an in-memory queue; nextTuple() feeds off this queue. I am
>> constrained to a single spout instance, since with multiple instances each
>> one would receive the same published message and emit duplicates into the
>> topology.
>>
>> Thanks
>> Kashyap
>>
>> On Wed, Jul 29, 2015 at 11:59 AM, Nathan Leung <[email protected]> wrote:
>>
>>> 1 second is too short.  Spout latency includes time spent in the output
>>> queue from the spout (increasing max spout pending potentially increases
>>> your end-to-end latency, depending on whether you have anything buffered in
>>> the spout output queues).
>>>
>>> On Wed, Jul 29, 2015 at 12:40 PM, Kashyap Mhaisekar <[email protected]
>>> > wrote:
>>>
>>>> Thanks Nathan. But in this case how should the spout latency be
>>>> interpreted? In the same example you quoted above -
>>>> spout a -> bolt b (emits 10 tuples per msg) -> bolt c
>>>>
>>>> I see process and execute latencies under 5 ms for both B and C, while
>>>> the spout is at 1500 ms. The bolts don't do anything much other than
>>>> appending to an existing string. From what I understand, the complete
>>>> latency at the spout level is the time from nextTuple() to the time
>>>> ack() is called (if successful) and does not include the time a message
>>>> spends waiting because of the max spout pending property. To add to the
>>>> mystery, I set the message timeout to 1 sec. I don't see any failures
>>>> (fail() not called) but the spout latency is at 1.5 seconds.
>>>>
>>>> Regards,
>>>> Kashyap
>>>>
>>>> On Wed, Jul 29, 2015 at 10:35 AM, Nathan Leung <[email protected]>
>>>> wrote:
>>>>
>>>>> No.  You need to consider your system more carefully.  As a trivial
>>>>> example, imagine you have spout a -> bolt b -> bolt c, with bolt b
>>>>> splitting each tuple into 10 tuples.  Each component has 1 task.  If
>>>>> each component takes 1 ms, your latency will not be the sum of the
>>>>> per-bolt latencies, because of the fan-out.
>>>>> On Jul 29, 2015 11:25 AM, "Kashyap Mhaisekar" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Nathan.
>>>>>>
>>>>>> If I see the complete latency at spout is greater than the process
>>>>>> latencies of all bolts put together, does it mean that the ACKERS are a
>>>>>> problem and need to be increased?
>>>>>>
>>>>>> thanks
>>>>>> kashyap
>>>>>>
>>>>>> On Tue, Jul 28, 2015 at 7:30 PM, Nathan Leung <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The count is tracked for each spout task and does not include bolt
>>>>>>> fan-out. If the setting is 100 and you have 8 spout tasks, you can
>>>>>>> have 800 tuples from the spouts in your system.
>>>>>>> On Jul 28, 2015 6:25 PM, "Kashyap Mhaisekar" <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Does the max spout pending limit apply to tuples emitted by bolts
>>>>>>>> too?
>>>>>>>> For e.g.:
>>>>>>>> 1. MAX_SPOUT_PENDING value is 1000
>>>>>>>> 2. My spout emits to a bolt which emits 1000 tuples per message
>>>>>>>>
>>>>>>>> Does this mean there can be 1000x1000 tuples in the topology? Or
>>>>>>>> does it mean that only one tuple is emitted from the spout because
>>>>>>>> each bolt emits 1000 tuples?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> kashyap
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
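A back-of-the-envelope check for the 1.5 s spout latency discussed in the thread, using Little's law (latency ~= tuples in flight / throughput) under the assumption that the topology is saturated at its max spout pending cap. The numbers below are illustrative, not measured from the topology in question:

```java
// Little's law sketch: when a topology is saturated, complete latency at
// the spout is roughly (tuples in flight) / (acked tuples per second),
// regardless of how small the per-bolt execute latency is.
public class SpoutLatencyEstimate {
    public static void main(String[] args) {
        int tuplesInFlight = 1500;     // illustrative: pending cap reached
        double ackedPerSecond = 1000.0; // illustrative throughput

        // Most of this latency is time spent queued, not bolt work.
        double completeLatencySec = tuplesInFlight / ackedPerSecond;
        System.out.println(completeLatencySec);  // 1.5
    }
}
```

This is why 5 ms bolts can coexist with a 1.5 s spout complete latency: the difference is queueing, which is exactly the output-queue time Nathan points to above.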
