Michael, you referred to a link on Gumbo on wassermelon. Did you use it? Are there docs/APIs/examples that I can use?
Thanks
Kashyap

On Mon, Oct 5, 2015 at 11:05 AM, Kashyap Mhaisekar <[email protected]> wrote:

> I changed it back before I could look at capacity. Will check again. My
> use case is compute-bound for sure, because one tuple to my spout creates
> 20 more tuples downstream, with each one iterating 500 times. And I get
> tuples at high TPS rates.
>
> On Mon, Oct 5, 2015 at 10:34 AM, Nick R. Katsipoulakis <
> [email protected]> wrote:
>
>> Well, did you see the capacity of your bolts also increasing? If so,
>> that means your use case is compute-bound and you end up having
>> increased latency for your workers.
>>
>> Nick
>>
>> On Mon, Oct 5, 2015 at 11:32 AM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> After dropping to 1 worker per node, my latency was bad... Probably my
>>> use case was different.
>>>
>>> On Mon, Oct 5, 2015 at 10:29 AM, Nick R. Katsipoulakis <
>>> [email protected]> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> This is a really interesting discussion. I am also trying to fine-tune
>>>> the performance of my cluster, especially my end-to-end latency, which
>>>> ranges from 200-1200 msec for a topology with 2 spouts (each with a 2k
>>>> tuples-per-second input rate) and 3 bolts. My cluster consists of 3
>>>> zookeeper nodes (1 shared with nimbus) and 6 supervisor nodes, all of
>>>> them AWS m4.xlarge instances.
>>>>
>>>> I am pretty sure the latency I am experiencing is ridiculous, and I
>>>> currently have no idea what to do to improve it. I have 3 workers per
>>>> node, which I will drop to one worker per node after this discussion to
>>>> see if I get better results.
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <[email protected]>
>>>> wrote:
>>>>
>>>>> Anshu,
>>>>> My methodology was as follows. Since the true parallelism of a
>>>>> machine is the no. of cores, I set the workers equal to the no. of
>>>>> cores (5 in my case). That said, since we have 32 GB per box, we
>>>>> usually leave 50% off, leaving us 16 GB spread across the 5 workers.
>>>>> Hence we set the worker heap at 3g.
>>>>>
>>>>> This was before Javier's and Michael's suggestion of keeping one JVM
>>>>> per node...
>>>>>
>>>>> Ours is a single topology running on the boxes, and hence I will be
>>>>> changing it to one JVM (worker) per box and rerunning.
>>>>>
>>>>> Thanks
>>>>> Kashyap
>>>>>
>>>>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Sorry for reposting! Any suggestions, please.
>>>>>>
>>>>>> Just one query: how can we map -
>>>>>> 1. the no. of workers to the number of cores
>>>>>> 2. the no. of slots on one machine to the number of cores on that
>>>>>> machine
>>>>>>
>>>>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Javier,
>>>>>>>
>>>>>>> Gotcha, I am seeing the same thing, and I see a ton of worker
>>>>>>> restarts as well.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> --John
>>>>>>>
>>>>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't have numbers, but I did see a very noticeable degradation
>>>>>>>> of throughput and latency when using multiple workers per node with
>>>>>>>> the same topology.
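In code, the worker-heap sizing Kashyap describes above might look like the following. This is a minimal sketch assuming the Storm 0.9.x-era backtype.storm Java API; the class name is illustrative and not taken from the thread:

    import backtype.storm.Config;

    public class WorkerSizingSketch {
        public static void main(String[] args) {
            // Heap sizing from the message above: 32 GB per box, leave 50%
            // off, and split the remaining 16 GB across the box's 5 workers,
            // i.e. roughly 3 GB of heap per worker JVM.
            Config conf = new Config();
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx3g");
            System.out.println(conf);
            // The per-box worker count itself is a supervisor-side setting:
            // supervisor.slots.ports in storm.yaml, with one port per worker
            // slot (one slot per core in the methodology described above).
        }
    }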
>>>>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Everyone,
>>>>>>>>>
>>>>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>>>>> much better one worker per node is? The reason I ask is that I may
>>>>>>>>> need to double up the workers on my cluster, and I was wondering
>>>>>>>>> how much of a throughput hit I may take from having two workers per
>>>>>>>>> node.
>>>>>>>>>
>>>>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>>>>
>>>>>>>>> --John
>>>>>>>>>
>>>>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I would suggest sticking with a single worker per machine. It
>>>>>>>>>> makes memory allocation easier, and it makes inter-component
>>>>>>>>>> communication much more efficient. Configure the executors with
>>>>>>>>>> your parallelism hints to take advantage of all your available
>>>>>>>>>> CPU cores.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> JG
>>>>>>>>>>
>>>>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>>>>> parallelism needed for a topology.
>>>>>>>>>>>
>>>>>>>>>>> Assume I have 5 machines with 8 cores and 32 GB each, and my
>>>>>>>>>>> topology has one spout and 5 bolts.
>>>>>>>>>>>
>>>>>>>>>>> 1. Define one worker port per CPU core to start off (= 8 workers
>>>>>>>>>>> per machine, i.e., 40 workers overall).
>>>>>>>>>>> 2. Each worker spawns one executor per component, which
>>>>>>>>>>> translates to 6 executors per worker, i.e., 40 x 6 = 240
>>>>>>>>>>> executors.
>>>>>>>>>>> 3. Of these, if the bolt logic is CPU-intensive, leave the
>>>>>>>>>>> parallelism hint at 40 (the total worker count); otherwise,
>>>>>>>>>>> increase the parallelism hint beyond 40 until you hit a number
>>>>>>>>>>> beyond which there is no further visible performance gain.
>>>>>>>>>>>
>>>>>>>>>>> Does this look right?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Kashyap
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Javier González Nicolini
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anshu Shukla
>>>>>>
>>>>
>>>> --
>>>> Nikolaos Romanos Katsipoulakis,
>>>> University of Pittsburgh, PhD student
>>>>
>>
>> --
>> Nikolaos Romanos Katsipoulakis,
>> University of Pittsburgh, PhD student
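Putting the thread's advice together, a topology wired along Kashyap's three steps plus Javier's one-worker-per-machine recommendation might look roughly like this. A minimal sketch, again assuming the Storm 0.9.x backtype.storm Java API; TestWordSpout comes from Storm's bundled testing helpers, and PassThroughBolt is a hypothetical stand-in for the five real bolts:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    public class ParallelismSketch {

        // Hypothetical stand-in for the real bolts in the example.
        public static class PassThroughBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                // CPU-bound bolt logic would go here.
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // No output fields declared in this sketch.
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // The parallelism hint (third argument) is the executor count
            // for a component; the worker count is set on the Config below.
            builder.setSpout("spout", new TestWordSpout(), 5);

            // Steps 2-3 above: start CPU-intensive bolts at the total worker
            // count (40 in the 8-core x 5-machine example) and raise the
            // hint only while throughput keeps improving.
            builder.setBolt("bolt-1", new PassThroughBolt(), 40)
                   .shuffleGrouping("spout");
            // ...bolts 2-5 would be declared the same way...

            Config conf = new Config();
            // Javier's suggestion: one worker (JVM) per machine, so
            // inter-component traffic stays in-process; with 5 machines
            // that means 5 workers in total.
            conf.setNumWorkers(5);

            StormSubmitter.submitTopology("parallelism-sketch", conf,
                    builder.createTopology());
        }
    }

With one worker per machine, the 40 bolt executors land 8 per worker, matching the 8 cores per box in the original example, so the per-core parallelism comes from executors rather than extra JVMs.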
