Nathan,

Thank you very much for your explanation and time! I've got it.
On Wed, May 20, 2015 at 4:19 PM, Nathan Leung <[email protected]> wrote:

> I'll use a somewhat canned example for illustrative purposes. Let's say
> the spout has emitted 1000 tuples and they are sitting in the output
> queue, and your bolt is the only bolt in the system.
>
> If you have 2 executors, each tuple takes 5.6 ms. This means that each
> bolt can process 178.5 tuples/s, and combined they can process 357
> tuples/s. This means it will take 2.8 seconds to process all of the
> tuples. The average tuple complete latency in this case will be 1.4 s
> (divide the total time by 2 for the average end time, since the 1000
> tuples finish spread evenly over the 2.8 seconds).
>
> If you have 64 executors, each tuple takes 28.5 ms. This means each bolt
> can process 35 tuples/s, and combined they can process 2245 tuples/s (the
> values are rounded to keep things legible). This means it takes 0.445
> seconds to process your tuples. Average complete latency is 0.222 s.
>
> On Wed, May 20, 2015 at 8:12 AM, Dima Dragan <[email protected]> wrote:
>
>> Nathan,
>>
>> Process and execute latency are growing; does that mean we spend more
>> time processing each tuple because it spends more time in the bolt's
>> queue?
>>
>> I thought that "Complete latency" and "Process latency" should be
>> correlated. Am I right?
>>
>> On Wed, May 20, 2015 at 2:10 PM, Nathan Leung <[email protected]> wrote:
>>
>>> My point about increased throughput was that if you have items queued
>>> from the spout waiting to be processed, that wait counts towards the
>>> complete latency for the spout. If your bolts go through the tuples
>>> faster (and as you add more, they do; you have a 6x speedup from more
>>> bolts), then you will see the complete latency drop.
>>> On May 20, 2015 4:01 AM, "Dima Dragan" <[email protected]> wrote:
>>>
>>>> Thank you, Jeffrey and Devang, for your answers.
>>>>
>>>> Jeffrey, since I use shuffle grouping, I think the network
>>>> serialization will remain, but there will be no network delays (to
>>>> remove it there is the localOrShuffle grouping). For all experiments
>>>> I use only one worker, so that does not explain why complete latency
>>>> could decrease.
>>>>
>>>> But I think you are right about the definitions. :)
>>>>
>>>> Devang, no, I set up 1 worker and 1 acker for all tests.
>>>>
>>>> Best regards,
>>>> Dmytro Dragan
>>>> On May 20, 2015 05:03, "Devang Shah" <[email protected]> wrote:
>>>>
>>>>> Was the number of workers or number of ackers changed across your
>>>>> experiments? What are the numbers you used?
>>>>>
>>>>> When you have many executors, increasing the ackers reduces the
>>>>> complete latency.
>>>>>
>>>>> Thanks and Regards,
>>>>> Devang
>>>>> On 20 May 2015 03:15, "Jeffery Maass" <[email protected]> wrote:
>>>>>
>>>>>> Maybe the difference has to do with where the executors were
>>>>>> running. If your entire topology is running within the same worker,
>>>>>> that would mean the serialization for the worker-to-worker
>>>>>> networking layer is left out of the picture. I suppose that would
>>>>>> mean the complete latency could decrease. At the same time, process
>>>>>> latency could very well increase, since all the work is being done
>>>>>> within the same worker. My understanding is that process latency is
>>>>>> measured from the time the tuple enters the executor until it
>>>>>> leaves the executor. Or was it from the time the tuple enters the
>>>>>> worker until it leaves the worker? I don't recall.
>>>>>>
>>>>>> I bet a firm definition of the latency terms would shed some light.
>>>>>>
>>>>>> Thank you for your time!
>>>>>>
>>>>>> +++++++++++++++++++++
>>>>>> Jeff Maass <[email protected]>
>>>>>> linkedin.com/in/jeffmaass
>>>>>> stackoverflow.com/users/373418/maassql
>>>>>> +++++++++++++++++++++
>>>>>>
>>>>>> On Tue, May 19, 2015 at 9:47 AM, Dima Dragan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks, Nathan, for your answer.
>>>>>>>
>>>>>>> But I'm afraid you have understood me wrong: with executors
>>>>>>> increased by 32x, each executor's throughput *increased* by 5x,
>>>>>>> but complete latency dropped.
>>>>>>>
>>>>>>> On Tue, May 19, 2015 at 5:16 PM, Nathan Leung <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It depends on your application and the characteristics of the
>>>>>>>> I/O. You increased executors by 32x and each executor's
>>>>>>>> throughput dropped by 5x, so it makes sense that latency will
>>>>>>>> drop.
>>>>>>>> On May 19, 2015 9:54 AM, "Dima Dragan" <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I have found some strange behavior in topology metrics.
>>>>>>>>>
>>>>>>>>> Let's say we have a 1-node, 2-core machine and a simple Storm
>>>>>>>>> topology: Spout A -> Bolt B -> Bolt C.
>>>>>>>>>
>>>>>>>>> Bolt B splits each message into 320 parts and emits each part
>>>>>>>>> (shuffle grouping) to Bolt C. Bolts B and C also make some
>>>>>>>>> read/write operations to a DB.
>>>>>>>>>
>>>>>>>>> The input flow is continuous and steady.
>>>>>>>>>
>>>>>>>>> Logically, setting a higher number of executors for Bolt C than
>>>>>>>>> the number of cores should be useless (most of the threads will
>>>>>>>>> be sleeping). That is confirmed by the increasing execute and
>>>>>>>>> process latency.
>>>>>>>>>
>>>>>>>>> But I noticed that the complete latency started to decrease,
>>>>>>>>> and I do not understand why.
>>>>>>>>>
>>>>>>>>> For example, stats for Bolt C:
>>>>>>>>>
>>>>>>>>> Executors | Process latency (ms) | Complete latency (ms)
>>>>>>>>> 2         | 5.599                | 897.276
>>>>>>>>> 4         | 6.352                | 6.364
>>>>>>>>> 64        | 28.432               | 345.454
>>>>>>>>>
>>>>>>>>> Is it a side effect of I/O-bound tasks?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Dmytro Dragan

--
Best regards,
Dmytro Dragan
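The arithmetic in Nathan's canned example can be sketched in a few lines of Python. This is a toy model, not Storm code; the function name and structure are illustrative, and it only restates the numbers from his message (1000 queued tuples, 5.6 ms vs 28.5 ms per tuple, 2 vs 64 executors):

```python
# Toy model of Nathan's drain-time arithmetic: N tuples sit in the
# spout's output queue, and `executors` parallel bolt executors each
# take `per_tuple_ms` to process one tuple.

def drain_stats(n_tuples, per_tuple_ms, executors):
    per_executor_tps = 1000.0 / per_tuple_ms      # tuples/s per executor
    combined_tps = per_executor_tps * executors   # aggregate throughput
    total_s = n_tuples / combined_tps             # time to drain the queue
    avg_complete_s = total_s / 2                  # completions spread evenly over [0, total_s]
    return combined_tps, total_s, avg_complete_s

# 2 executors at 5.6 ms/tuple vs 64 executors at 28.5 ms/tuple:
print(drain_stats(1000, 5.6, 2))    # ~ (357 tuples/s, 2.8 s, 1.4 s)
print(drain_stats(1000, 28.5, 64))  # ~ (2246 tuples/s, 0.445 s, 0.223 s)
```

Per-executor throughput falls 5x, yet the 32x wider pool drains the queue six times faster, which is exactly why the average complete latency drops.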

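Nathan's point that queue wait dominates complete latency while process latency only measures time inside the bolt can also be seen in a toy FIFO simulation. This is plain Python, not Storm; the round-robin assignment is a simplistic stand-in for shuffle grouping, and all tuples are assumed queued at t=0 as in the canned example:

```python
# Toy FIFO simulation: n_tuples are queued at t=0; k executors each
# process their share sequentially at `service_ms` per tuple. A tuple's
# complete latency is its queue wait plus its service time, i.e. its
# completion timestamp.

def avg_complete_latency_ms(n_tuples, service_ms, k):
    free_at = [0.0] * k                # next time each executor is free
    total = 0.0
    for i in range(n_tuples):
        e = i % k                      # round-robin stand-in for shuffle grouping
        free_at[e] += service_ms       # tuple occupies executor e for service_ms
        total += free_at[e]            # completion time == complete latency here
    return total / n_tuples

# Per-tuple service time (the "process latency") grows ~5x, but the
# average complete latency still falls because queue wait shrinks:
print(avg_complete_latency_ms(1000, 5.6, 2))    # ~1402.8 ms
print(avg_complete_latency_ms(1000, 28.5, 64))  # ~237.1 ms
```

The simulation reproduces the qualitative pattern in the thread: process latency up, complete latency down, because most of a tuple's lifetime with few executors is spent waiting in the queue, not being processed.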