Hi Everyone, I am curious--are there any benchmark numbers that demonstrate how much better one worker per node is? The reason I ask is that I may need to double up the workers on my cluster and I was wondering how much of a throughput hit I may take from having two workers per node.
Any info would be very much appreciated--thanks! :)

--John

On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <[email protected]> wrote:

> I would suggest sticking with a single worker per machine. It makes memory
> allocation easier and it makes inter-component communication much more
> efficient. Configure the executors with your parallelism hints to take
> advantage of all your available CPU cores.
>
> Regards,
> JG
>
> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> Hi,
>> I was trying to come up with an approach to evaluate the parallelism
>> needed for a topology.
>>
>> Assuming I have 5 machines with 8 cores and 32 GB. And my topology has
>> one spout and 5 bolts.
>>
>> 1. Define one worker port per CPU to start off. (= 8 workers per machine,
>> i.e. 40 workers overall)
>> 2. Each worker spawns one executor per component per worker, it
>> translates to 6 executors per worker which is 40x6 = 240 executors.
>> 3. Of this, if the bolt logic is CPU intensive, then leave parallelism
>> hint at 40 (total workers), else increase parallelism hint beyond 40 till
>> you hit a number beyond which there is no more visible performance.
>>
>> Does this look right?
>>
>> Thanks
>> Kashyap
>>
>
> --
> Javier González Nicolini
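
For anyone following the numbers in the quoted plan, here is a rough back-of-the-envelope sketch of the arithmetic. It is not Storm API code and says nothing about throughput. The helper name `plan_totals` is hypothetical, and the parallelism hint of 40 for the one-worker-per-machine layout is only an assumption carried over from step 3 of the plan; the cluster figures (5 machines, 8 cores each, 1 spout plus 5 bolts) are taken from the thread as stated.

```python
# Figures quoted in the thread.
MACHINES = 5
CORES_PER_MACHINE = 8
COMPONENTS = 6  # 1 spout + 5 bolts


def plan_totals(workers_per_machine, hint_per_component):
    """Summarise a layout. `hint_per_component` is the parallelism hint,
    assumed (for illustration only) to be the same for every component."""
    total_workers = MACHINES * workers_per_machine
    total_executors = COMPONENTS * hint_per_component
    total_cores = MACHINES * CORES_PER_MACHINE
    return {
        "total_workers": total_workers,
        "total_executors": total_executors,
        "executors_per_worker": total_executors / total_workers,
        "executors_per_core": total_executors / total_cores,
    }


# Layout from the quoted plan: one worker per core, hint of 40 per component.
kashyap = plan_totals(workers_per_machine=8, hint_per_component=40)

# Layout suggested in the reply: one worker per machine.
# The hint of 40 here is an assumption, not something stated in the reply.
javier = plan_totals(workers_per_machine=1, hint_per_component=40)

for name, plan in (("8 workers/machine", kashyap), ("1 worker/machine", javier)):
    print(name, plan)
```

With these assumptions both layouts run the same 240 executors over 40 cores; the only difference is whether they are packed into 40 worker processes or 5, which is where the memory-allocation and inter-component communication argument in the reply applies.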
