Well, did you see the capacity of your bolts also increasing? Because if the former is true, then that means that your use-case is compute-bound and you end up having increased latency for your workers.
Nick On Mon, Oct 5, 2015 at 11:32 AM, Kashyap Mhaisekar <[email protected]> wrote: > After dropping to 1 worker per node, my latency was bad.... Probably my > use case was different. > > On Mon, Oct 5, 2015 at 10:29 AM, Nick R. Katsipoulakis < > [email protected]> wrote: > >> Hello guys, >> >> This is a really interesting discussion. I am also trying to fine-tune >> the performance of my cluster and especially my end-to-end-latency which >> ranges from 200-1200 msec for a topology with 2 spouts (each one with 2k >> tuples per second input rate) and 3 bolts. My cluster consists of 3 >> zookeeper nodes (1 shared with nimbus) and 6 supervisor nodes, all of them >> being AWS m4.xlarge instances. >> >> I am pretty sure that the latency I am experiencing is ridiculous and I >> currently have no ideas what to do to improve that. I have 3 workers per >> node, which I will drop it to one worker per node after this discussion and >> see if I have better results. >> >> Thanks, >> Nick >> >> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <[email protected]> >> wrote: >> >>> Anshu, >>> My methodology was as follows. Since the true parallelism of a machine >>> is the the no. of cores, I set the workers equal to no. of cores. (5 in my >>> case). That being said, since we have 32 GB per box, we usually leave 50% >>> off leaving us 16 GB spread across 5 machines. Hence we set the worker heap >>> at 3g. >>> >>> This was before Javiers and Michaels suggestion of keeping one JVM per >>> node... >>> >>> Ours is a single topology running on the boxes and hence I would be >>> changing it to one JVM (worker) per box and rerunning. >>> >>> Thanks >>> Kashyap >>> >>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <[email protected]> >>> wrote: >>> >>>> Sorry for reposting !! Any suggestions Please . >>>> >>>> Just one query How we can map - >>>> *1-no of workers to number of cores * >>>> *2-no of slots on one machine to number of cores over that machine* >>>> >>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <[email protected]> >>>> wrote: >>>> >>>>> Hi Javier, >>>>> >>>>> Gotcha, I am seeing the same thing, and I see a ton of worker restarts >>>>> as well. >>>>> >>>>> Thanks >>>>> >>>>> --John >>>>> >>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <[email protected]> >>>>> wrote: >>>>> >>>>>> I don't have numbers, but I did see a very noticeable degradation of >>>>>> throughput and latency when using multiple workers per node with the same >>>>>> topology. >>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Hi Everyone, >>>>>>> >>>>>>> I am curious--are there any benchmark numbers that demonstrate how >>>>>>> much better one worker per node is? The reason I ask is that I may >>>>>>> need to >>>>>>> double up the workers on my cluster and I was wondering how much of a >>>>>>> throughput hit I may take from having two workers per node. >>>>>>> >>>>>>> Any info would be very much appreciated--thanks! :) >>>>>>> >>>>>>> --John >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> I would suggest sticking with a single worker per machine. It makes >>>>>>>> memory allocation easier and it makes inter-component communication >>>>>>>> much >>>>>>>> more efficient. Configure the executors with your parallelism hints to >>>>>>>> take >>>>>>>> advantage of all your availabe CPU cores. >>>>>>>> >>>>>>>> Regards, >>>>>>>> JG >>>>>>>> >>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> I was trying to come up with an approach to evaluate the >>>>>>>>> parallelism needed for a topology. >>>>>>>>> >>>>>>>>> Assuming I have 5 machines with 8 cores and 32 gb. And my topology >>>>>>>>> has one spout and 5 bolts. >>>>>>>>> >>>>>>>>> 1. Define one worker port per CPU to start off. (= 8 workers per >>>>>>>>> machine ie 40 workers over all) >>>>>>>>> 2. Each worker spawns one executor per component per worker, it >>>>>>>>> translates to 6 executors per worker which is 40x6= 240 executors. >>>>>>>>> 3. Of this, if the bolt logic is CPU intensive, then leave >>>>>>>>> parallelism hint at 40 (total workers), else increase parallelism >>>>>>>>> hint >>>>>>>>> beyond 40 till you hit a number beyond which there is no more visible >>>>>>>>> performance. >>>>>>>>> >>>>>>>>> Does this look right? >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Kashyap >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Javier González Nicolini >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >>>> -- >>>> Thanks & Regards, >>>> Anshu Shukla >>>> >>> >>> >> >> >> -- >> Nikolaos Romanos Katsipoulakis, >> University of Pittsburgh, PhD student >> > > -- Nikolaos Romanos Katsipoulakis, University of Pittsburgh, PhD student
