I don't have numbers, but I did see a very noticeable degradation of throughput and latency when using multiple workers per node with the same topology.

On Oct 5, 2015 7:25 AM, "John Yost" <[email protected]> wrote:
> Hi Everyone,
>
> I am curious--are there any benchmark numbers that demonstrate how much
> better one worker per node is? The reason I ask is that I may need to
> double up the workers on my cluster and I was wondering how much of a
> throughput hit I may take from having two workers per node.
>
> Any info would be very much appreciated--thanks! :)
>
> --John
>
> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <[email protected]>
> wrote:
>
>> I would suggest sticking with a single worker per machine. It makes
>> memory allocation easier and it makes inter-component communication much
>> more efficient. Configure the executors with your parallelism hints to take
>> advantage of all your available CPU cores.
>>
>> Regards,
>> JG
>>
>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I was trying to come up with an approach to evaluate the parallelism
>>> needed for a topology.
>>>
>>> Assume I have 5 machines, each with 8 cores and 32 GB, and my topology
>>> has one spout and 5 bolts.
>>>
>>> 1. Define one worker port per CPU to start off (= 8 workers per machine,
>>> i.e. 40 workers overall).
>>> 2. Each worker spawns one executor per component, which translates to
>>> 6 executors per worker, i.e. 40 x 6 = 240 executors.
>>> 3. Of this, if the bolt logic is CPU intensive, then leave the parallelism
>>> hint at 40 (total workers); else increase the parallelism hint beyond 40
>>> until you hit a number beyond which there is no further visible
>>> performance gain.
>>>
>>> Does this look right?
>>>
>>> Thanks
>>> Kashyap
>>>
>>
>> --
>> Javier González Nicolini
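
For anyone who wants to check the arithmetic in the thread, here is a rough Python sketch. It is only a back-of-the-envelope calculation, not anything from Storm itself: `ClusterPlan` and its fields are hypothetical names made up for this illustration, and it assumes the parallelism hint is the executor count per component and that executors are spread evenly over workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    """Hypothetical helper for the thread's arithmetic; not part of Storm."""
    machines: int
    cores_per_machine: int
    workers_per_machine: int
    components: int           # spouts + bolts in the topology
    hint_per_component: int   # parallelism hint, taken as executors per component

    @property
    def total_cores(self) -> int:
        return self.machines * self.cores_per_machine

    @property
    def total_workers(self) -> int:
        return self.machines * self.workers_per_machine

    @property
    def total_executors(self) -> int:
        return self.components * self.hint_per_component

    @property
    def executors_per_worker(self) -> float:
        # Assumption: executors are distributed evenly across workers.
        return self.total_executors / self.total_workers

    @property
    def executors_per_core(self) -> float:
        return self.total_executors / self.total_cores


# Kashyap's proposal: 8 workers per machine, hint of 40 on each of 6 components.
proposed = ClusterPlan(5, 8, 8, 6, 40)
# Javier's suggestion: 1 worker per machine, same hints.
single = ClusterPlan(5, 8, 1, 6, 40)

for name, plan in (("proposed", proposed), ("single worker", single)):
    print(name, plan.total_workers, plan.total_executors,
          plan.executors_per_worker, plan.executors_per_core)
# proposed 40 240 6.0 6.0
# single worker 5 240 48.0 6.0
```

Under these assumptions both layouts run the same 240 executors over the same 40 cores. The only difference is how many worker processes they are packed into: 6 executors in each of 40 workers versus 48 in each of 5, the latter being the single-worker-per-machine setup recommended above.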
