Nathan,

Thank you very much for your explanation and time! I've got it.
On Wed, May 20, 2015 at 4:19 PM, Nathan Leung <[email protected]> wrote:

> I'll use a somewhat canned example for illustrative purposes. Let's say
> the spout has emitted 1000 tuples and they are sitting in the output
> queue, and your bolt is the only bolt in the system.
>
> If you have 2 executors, each tuple takes 5.6 ms. This means that each
> bolt can process 178.5 tuples/s, and combined they can process 357
> tuples/s. This means it will take 2.8 seconds to process all of the
> tuples. The average tuple complete latency in this case will be 1.4 s
> (divide the total time by 2 for the average end time, since the 1000
> tuples finish spread evenly over the 2.8 seconds).
>
> If you have 64 executors, each tuple takes 28.5 ms. This means each bolt
> can process 35 tuples/s, and combined they can process 2245 tuples/s (the
> values are rounded to keep things legible). This means it takes 0.445
> seconds to process your tuples. Average complete latency is 0.222 s.
>
> On Wed, May 20, 2015 at 8:12 AM, Dima Dragan <[email protected]> wrote:
>
>> Nathan,
>>
>> Process and execute latency are growing; does that mean we spend more
>> time processing each tuple because it spends more time in the bolt's
>> queue?
>>
>> I thought that "Complete latency" and "Process latency" should be
>> correlated. Am I right?
>>
>> On Wed, May 20, 2015 at 2:10 PM, Nathan Leung <[email protected]> wrote:
>>
>>> My point about increased throughput was that if you have items queued
>>> from the spout waiting to be processed, that wait counts towards the
>>> complete latency for the spout. If your bolts go through the tuples
>>> faster (and as you add more, they do; you have a 6x speedup from more
>>> bolts), then you will see the complete latency drop.
>>> On May 20, 2015 4:01 AM, "Dima Dragan" <[email protected]> wrote:
>>>
>>>> Thank you, Jeffrey and Devang, for your answers.
>>>>
>>>> Jeffrey, since I use shuffle grouping, I think the network
>>>> serialization will remain, but there will be no network delays (to
>>>> remove it there is the localOrShuffle grouping). For all experiments
>>>> I use only one worker, so that does not explain why complete latency
>>>> could decrease.
>>>>
>>>> But I think you are right about the definitions. :)
>>>>
>>>> Devang, no, I set up 1 worker and 1 acker for all tests.
>>>>
>>>> Best regards,
>>>> Dmytro Dragan
>>>> On May 20, 2015 05:03, "Devang Shah" <[email protected]> wrote:
>>>>
>>>>> Was the number of workers or number of ackers changed across your
>>>>> experiments? What are the numbers you used?
>>>>>
>>>>> When you have many executors, increasing the ackers reduces the
>>>>> complete latency.
>>>>>
>>>>> Thanks and Regards,
>>>>> Devang
>>>>> On 20 May 2015 03:15, "Jeffery Maass" <[email protected]> wrote:
>>>>>
>>>>>> Maybe the difference has to do with where the executors were
>>>>>> running. If your entire topology is running within the same worker,
>>>>>> that would mean the serialization for the worker-to-worker
>>>>>> networking layer is left out of the picture. I suppose that would
>>>>>> mean the complete latency could decrease. At the same time, process
>>>>>> latency could very well increase, since all the work is being done
>>>>>> within the same worker. My understanding is that process latency is
>>>>>> measured from the time the tuple enters the executor until it
>>>>>> leaves the executor. Or was it from the time the tuple enters the
>>>>>> worker until it leaves the worker? I don't recall.
>>>>>>
>>>>>> I bet a firm definition of the latency terms would shed some light.
>>>>>>
>>>>>> Thank you for your time!
>>>>>>
>>>>>> +++++++++++++++++++++
>>>>>> Jeff Maass <[email protected]>
>>>>>> linkedin.com/in/jeffmaass
>>>>>> stackoverflow.com/users/373418/maassql
>>>>>> +++++++++++++++++++++
>>>>>>
>>>>>> On Tue, May 19, 2015 at 9:47 AM, Dima Dragan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks, Nathan, for your answer.
>>>>>>>
>>>>>>> But I'm afraid you have understood me wrong: with executors
>>>>>>> increased by 32x, each executor's throughput *increased* by 5x,
>>>>>>> but complete latency dropped.
>>>>>>>
>>>>>>> On Tue, May 19, 2015 at 5:16 PM, Nathan Leung <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It depends on your application and the characteristics of the
>>>>>>>> I/O. You increased executors by 32x and each executor's
>>>>>>>> throughput dropped by 5x, so it makes sense that latency will
>>>>>>>> drop.
>>>>>>>> On May 19, 2015 9:54 AM, "Dima Dragan" <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I have found some strange behavior in topology metrics.
>>>>>>>>>
>>>>>>>>> Let's say we have a 1-node, 2-core machine and a simple Storm
>>>>>>>>> topology: Spout A -> Bolt B -> Bolt C.
>>>>>>>>>
>>>>>>>>> Bolt B splits each message into 320 parts and emits each part
>>>>>>>>> (shuffle grouping) to Bolt C. Bolts B and C also make some
>>>>>>>>> read/write operations to a DB.
>>>>>>>>>
>>>>>>>>> The input flow is continuous and steady.
>>>>>>>>>
>>>>>>>>> Logically, setting a higher number of executors for Bolt C than
>>>>>>>>> the number of cores should be useless (most of the threads will
>>>>>>>>> be sleeping). That is confirmed by the increasing execute and
>>>>>>>>> process latency.
>>>>>>>>>
>>>>>>>>> But I noticed that the complete latency started to decrease,
>>>>>>>>> and I do not understand why.
>>>>>>>>>
>>>>>>>>> For example, stats for Bolt C:
>>>>>>>>>
>>>>>>>>> Executors | Process latency (ms) | Complete latency (ms)
>>>>>>>>> 2         | 5.599                | 897.276
>>>>>>>>> 4         | 6.352                | 6.364
>>>>>>>>> 64        | 28.432               | 345.454
>>>>>>>>>
>>>>>>>>> Is it a side effect of I/O-bound tasks?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Dmytro Dragan

--
Best regards,
Dmytro Dragan
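The arithmetic in Nathan's canned example can be sketched in a few lines of Python. This is a toy model, not Storm code; the function name and structure are illustrative, and it only restates the numbers from his message (1000 queued tuples, 5.6 ms vs 28.5 ms per tuple, 2 vs 64 executors):

```python
# Toy model of Nathan's drain-time arithmetic: N tuples sit in the
# spout's output queue, and `executors` parallel bolt executors each
# take `per_tuple_ms` to process one tuple.

def drain_stats(n_tuples, per_tuple_ms, executors):
    per_executor_tps = 1000.0 / per_tuple_ms      # tuples/s per executor
    combined_tps = per_executor_tps * executors   # aggregate throughput
    total_s = n_tuples / combined_tps             # time to drain the queue
    avg_complete_s = total_s / 2                  # completions spread evenly over [0, total_s]
    return combined_tps, total_s, avg_complete_s

# 2 executors at 5.6 ms/tuple vs 64 executors at 28.5 ms/tuple:
print(drain_stats(1000, 5.6, 2))    # ~ (357 tuples/s, 2.8 s, 1.4 s)
print(drain_stats(1000, 28.5, 64))  # ~ (2246 tuples/s, 0.445 s, 0.223 s)
```

Per-executor throughput falls 5x, yet the 32x wider pool drains the queue six times faster, which is exactly why the average complete latency drops.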

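Nathan's point that queue wait dominates complete latency while process latency only measures time inside the bolt can also be seen in a toy FIFO simulation. This is plain Python, not Storm; the round-robin assignment is a simplistic stand-in for shuffle grouping, and all tuples are assumed queued at t=0 as in the canned example:

```python
# Toy FIFO simulation: n_tuples are queued at t=0; k executors each
# process their share sequentially at `service_ms` per tuple. A tuple's
# complete latency is its queue wait plus its service time, i.e. its
# completion timestamp.

def avg_complete_latency_ms(n_tuples, service_ms, k):
    free_at = [0.0] * k                # next time each executor is free
    total = 0.0
    for i in range(n_tuples):
        e = i % k                      # round-robin stand-in for shuffle grouping
        free_at[e] += service_ms       # tuple occupies executor e for service_ms
        total += free_at[e]            # completion time == complete latency here
    return total / n_tuples

# Per-tuple service time (the "process latency") grows ~5x, but the
# average complete latency still falls because queue wait shrinks:
print(avg_complete_latency_ms(1000, 5.6, 2))    # ~1402.8 ms
print(avg_complete_latency_ms(1000, 28.5, 64))  # ~237.1 ms
```

The simulation reproduces the qualitative pattern in the thread: process latency up, complete latency down, because most of a tuple's lifetime with few executors is spent waiting in the queue, not being processed.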