Hi Nick,

Yeah, I am seeing decreased throughput with 2 workers/node, as well as instability where communication exceptions are thrown and workers restart. I need to confirm whether the communication exceptions are the root exception(s) or a manifestation of something else. Gonna pore through some logs this afternoon. :)

--John

On Mon, Oct 5, 2015 at 11:29 AM, Nick R. Katsipoulakis < [email protected]> wrote:

> Hello guys,
>
> This is a really interesting discussion. I am also trying to fine-tune the
> performance of my cluster and especially my end-to-end-latency which ranges
> from 200-1200 msec for a topology with 2 spouts (each one with 2k tuples
> per second input rate) and 3 bolts. My cluster consists of 3 zookeeper
> nodes (1 shared with nimbus) and 6 supervisor nodes, all of them being AWS
> m4.xlarge instances.
>
> I am pretty sure that the latency I am experiencing is ridiculous and I
> currently have no idea what to do to improve that. I have 3 workers per
> node, which I will drop to one worker per node after this discussion and
> see if I have better results.
>
> Thanks,
> Nick
>
> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> Anshu,
>> My methodology was as follows. Since the true parallelism of a machine is
>> the no. of cores, I set the workers equal to no. of cores. (5 in my
>> case). That being said, since we have 32 GB per box, we usually leave 50%
>> off leaving us 16 GB spread across 5 machines. Hence we set the worker heap
>> at 3g.
>>
>> This was before Javier's and Michael's suggestion of keeping one JVM per
>> node...
>>
>> Ours is a single topology running on the boxes and hence I would be
>> changing it to one JVM (worker) per box and rerunning.
>>
>> Thanks
>> Kashyap
>>
>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <[email protected]>
>> wrote:
>>
>>> Sorry for reposting! Any suggestions, please.
>>>
>>> Just one query: how can we map -
>>> *1-no of workers to number of cores*
>>> *2-no of slots on one machine to number of cores over that machine*
>>>
>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <[email protected]>
>>> wrote:
>>>
>>>> Hi Javier,
>>>>
>>>> Gotcha, I am seeing the same thing, and I see a ton of worker restarts
>>>> as well.
>>>>
>>>> Thanks
>>>>
>>>> --John
>>>>
>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <[email protected]>
>>>> wrote:
>>>>
>>>>> I don't have numbers, but I did see a very noticeable degradation of
>>>>> throughput and latency when using multiple workers per node with the same
>>>>> topology.
>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <[email protected]> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>> much better one worker per node is? The reason I ask is that I may need
>>>>>> to double up the workers on my cluster and I was wondering how much of a
>>>>>> throughput hit I may take from having two workers per node.
>>>>>>
>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>
>>>>>> --John
>>>>>>
>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I would suggest sticking with a single worker per machine. It makes
>>>>>>> memory allocation easier and it makes inter-component communication much
>>>>>>> more efficient. Configure the executors with your parallelism hints to
>>>>>>> take advantage of all your available CPU cores.
>>>>>>>
>>>>>>> Regards,
>>>>>>> JG
>>>>>>>
>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>> parallelism needed for a topology.
>>>>>>>>
>>>>>>>> Assuming I have 5 machines with 8 cores and 32 gb. And my topology
>>>>>>>> has one spout and 5 bolts.
>>>>>>>>
>>>>>>>> 1. Define one worker port per CPU to start off. (= 8 workers per
>>>>>>>> machine ie 40 workers over all)
>>>>>>>> 2. Each worker spawns one executor per component per worker, it
>>>>>>>> translates to 6 executors per worker which is 40x6= 240 executors.
>>>>>>>> 3. Of this, if the bolt logic is CPU intensive, then leave
>>>>>>>> parallelism hint at 40 (total workers), else increase parallelism hint
>>>>>>>> beyond 40 till you hit a number beyond which there is no more visible
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> Does this look right?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Kashyap
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Javier González Nicolini
>>>>>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Anshu Shukla
>>>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD student
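
For reference, the sizing arithmetic quoted in this thread can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the "one executor per component per worker" baseline is taken from Kashyap's description rather than from Storm itself (in Storm the executor count comes from the parallelism hints), and the "5" in the heap calculation is read here as five workers per box, which is an assumption.

```python
# Back-of-the-envelope sizing for the numbers quoted in this thread.
# Assumption: these helper names are hypothetical and the executor baseline
# follows the thread's description, not an actual Storm API.

def layout(machines, workers_per_machine, components):
    """Return (total workers, baseline executors) for a cluster layout."""
    workers = machines * workers_per_machine
    executors = workers * components  # one executor per component per worker
    return workers, executors


def heap_per_worker_gb(ram_gb, reserved_fraction, workers_per_machine):
    """Heap available to each worker JVM after holding back part of the RAM."""
    usable_gb = ram_gb * (1.0 - reserved_fraction)
    return usable_gb / workers_per_machine


if __name__ == "__main__":
    # Kashyap's starting point: 5 machines, 8 workers each, 1 spout + 5 bolts.
    print(layout(5, 8, 6))
    # Javier's suggestion: a single worker per machine.
    print(layout(5, 1, 6))
    # 32 GB per box, half held back, split across 5 workers per box (assumed).
    print(heap_per_worker_gb(32, 0.5, 5))
```

The first call gives the 40 workers and 240 executors from the original proposal, and the heap calculation gives 3.2 GB per worker, which is consistent with the 3g worker heap mentioned above.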
