Michael, you referred to a link on Gumbo on wassermelon. Did you use it? Are there docs/APIs/examples that I can use?
Thanks
Kashyap

On Mon, Oct 5, 2015 at 11:05 AM, Kashyap Mhaisekar <[email protected]> wrote:

> I changed it back before I could look at capacity. Will check again. My
> use case is compute-bound for sure, because one tuple to my spout creates
> 20 more tuples downstream, with each one iterating 500 times. And I get
> tuples at high TPS rates.
>
> On Mon, Oct 5, 2015 at 10:34 AM, Nick R. Katsipoulakis <
> [email protected]> wrote:
>
>> Well, did you see the capacity of your bolts also increasing? If so,
>> that means your use case is compute-bound and you end up having
>> increased latency for your workers.
>>
>> Nick
>>
>> On Mon, Oct 5, 2015 at 11:32 AM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> After dropping to 1 worker per node, my latency was bad... Probably my
>>> use case was different.
>>>
>>> On Mon, Oct 5, 2015 at 10:29 AM, Nick R. Katsipoulakis <
>>> [email protected]> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> This is a really interesting discussion. I am also trying to fine-tune
>>>> the performance of my cluster, especially my end-to-end latency, which
>>>> ranges from 200-1200 msec for a topology with 2 spouts (each with a 2k
>>>> tuples-per-second input rate) and 3 bolts. My cluster consists of 3
>>>> zookeeper nodes (1 shared with nimbus) and 6 supervisor nodes, all of
>>>> them AWS m4.xlarge instances.
>>>>
>>>> I am pretty sure the latency I am experiencing is ridiculous, and I
>>>> currently have no idea what to do to improve it. I have 3 workers per
>>>> node, which I will drop to one worker per node after this discussion to
>>>> see if I get better results.
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <[email protected]>
>>>> wrote:
>>>>
>>>>> Anshu,
>>>>> My methodology was as follows. Since the true parallelism of a
>>>>> machine is the no. of cores, I set the workers equal to the no. of
>>>>> cores (5 in my case). That said, since we have 32 GB per box, we
>>>>> usually leave 50% off, leaving us 16 GB spread across the 5 workers.
>>>>> Hence we set the worker heap at 3g.
>>>>>
>>>>> This was before Javier's and Michael's suggestion of keeping one JVM
>>>>> per node...
>>>>>
>>>>> Ours is a single topology running on the boxes, and hence I will be
>>>>> changing it to one JVM (worker) per box and rerunning.
>>>>>
>>>>> Thanks
>>>>> Kashyap
>>>>>
>>>>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Sorry for reposting! Any suggestions, please.
>>>>>>
>>>>>> Just one query: how can we map -
>>>>>> 1. the no. of workers to the number of cores
>>>>>> 2. the no. of slots on one machine to the number of cores on that
>>>>>> machine
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Javier,
>>>>>>>
>>>>>>> Gotcha, I am seeing the same thing, and I see a ton of worker
>>>>>>> restarts as well.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> --John
>>>>>>>
>>>>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't have numbers, but I did see a very noticeable degradation
>>>>>>>> of throughput and latency when using multiple workers per node with
>>>>>>>> the same topology.
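In code, the worker-heap sizing Kashyap describes above might look like the following. This is a minimal sketch assuming the Storm 0.9.x-era backtype.storm Java API; the class name is illustrative and not taken from the thread:

    import backtype.storm.Config;

    public class WorkerSizingSketch {
        public static void main(String[] args) {
            // Heap sizing from the message above: 32 GB per box, leave 50%
            // off, and split the remaining 16 GB across the box's 5 workers,
            // i.e. roughly 3 GB of heap per worker JVM.
            Config conf = new Config();
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx3g");
            System.out.println(conf);
            // The per-box worker count itself is a supervisor-side setting:
            // supervisor.slots.ports in storm.yaml, with one port per worker
            // slot (one slot per core in the methodology described above).
        }
    }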
>>>>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Everyone,
>>>>>>>>>
>>>>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>>>>> much better one worker per node is? The reason I ask is that I may
>>>>>>>>> need to double up the workers on my cluster, and I was wondering
>>>>>>>>> how much of a throughput hit I may take from having two workers per
>>>>>>>>> node.
>>>>>>>>>
>>>>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>>>>
>>>>>>>>> --John
>>>>>>>>>
>>>>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I would suggest sticking with a single worker per machine. It
>>>>>>>>>> makes memory allocation easier, and it makes inter-component
>>>>>>>>>> communication much more efficient. Configure the executors with
>>>>>>>>>> your parallelism hints to take advantage of all your available
>>>>>>>>>> CPU cores.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> JG
>>>>>>>>>>
>>>>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>>>>> parallelism needed for a topology.
>>>>>>>>>>>
>>>>>>>>>>> Assume I have 5 machines with 8 cores and 32 GB each, and my
>>>>>>>>>>> topology has one spout and 5 bolts.
>>>>>>>>>>>
>>>>>>>>>>> 1. Define one worker port per CPU core to start off (= 8 workers
>>>>>>>>>>> per machine, i.e., 40 workers overall).
>>>>>>>>>>> 2. Each worker spawns one executor per component, which
>>>>>>>>>>> translates to 6 executors per worker, i.e., 40 x 6 = 240
>>>>>>>>>>> executors.
>>>>>>>>>>> 3. Of these, if the bolt logic is CPU-intensive, leave the
>>>>>>>>>>> parallelism hint at 40 (the total worker count); otherwise,
>>>>>>>>>>> increase the parallelism hint beyond 40 until you hit a number
>>>>>>>>>>> beyond which there is no further visible performance gain.
>>>>>>>>>>>
>>>>>>>>>>> Does this look right?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Kashyap
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Javier González Nicolini
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anshu Shukla
>>>>>>
>>>>
>>>> --
>>>> Nikolaos Romanos Katsipoulakis,
>>>> University of Pittsburgh, PhD student
>>>>
>>
>> --
>> Nikolaos Romanos Katsipoulakis,
>> University of Pittsburgh, PhD student
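Putting the thread's advice together, a topology wired along Kashyap's three steps plus Javier's one-worker-per-machine recommendation might look roughly like this. A minimal sketch, again assuming the Storm 0.9.x backtype.storm Java API; TestWordSpout comes from Storm's bundled testing helpers, and PassThroughBolt is a hypothetical stand-in for the five real bolts:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    public class ParallelismSketch {

        // Hypothetical stand-in for the real bolts in the example.
        public static class PassThroughBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                // CPU-bound bolt logic would go here.
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // No output fields declared in this sketch.
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // The parallelism hint (third argument) is the executor count
            // for a component; the worker count is set on the Config below.
            builder.setSpout("spout", new TestWordSpout(), 5);

            // Steps 2-3 above: start CPU-intensive bolts at the total worker
            // count (40 in the 8-core x 5-machine example) and raise the
            // hint only while throughput keeps improving.
            builder.setBolt("bolt-1", new PassThroughBolt(), 40)
                   .shuffleGrouping("spout");
            // ...bolts 2-5 would be declared the same way...

            Config conf = new Config();
            // Javier's suggestion: one worker (JVM) per machine, so
            // inter-component traffic stays in-process; with 5 machines
            // that means 5 workers in total.
            conf.setNumWorkers(5);

            StormSubmitter.submitTopology("parallelism-sketch", conf,
                    builder.createTopology());
        }
    }

With one worker per machine, the 40 bolt executors land 8 per worker, matching the 8 cores per box in the original example, so the per-core parallelism comes from executors rather than extra JVMs.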
