Re: Approach to parallelism

Abe Oppenheim Mon, 05 Oct 2015 06:08:26 -0700

Hi All,

Any tips for determining the heap size for a node's single JVM?


> On Oct 5, 2015, at 5:25 AM, anshu shukla <[email protected]> wrote:
> 
> I was also facing the same issue of balancing the latency trade-off.
> Found a nice discussion here.
> 
> Just one question: how can we map
> 1. the number of workers to the number of cores
> 2. the number of slots on one machine to the number of cores on that machine?
> 
>> On Sun, Oct 4, 2015 at 2:00 AM, Kashyap Mhaisekar <[email protected]> 
>> wrote:
>> Thanks guys.
>> So when you say one JVM per node, it means one port, say 6700, on 
>> each machine, and we assign a large heap to it?
>> So in this case, that translates into 5 workers (one per machine) with at 
>> least 4 GB of heap each, and all bolts spread across these 5 workers?
>> 
>> Is there a guideline on how I should arrive at the parallelism hints of the 
>> bolts themselves? I mean, when complete latency at the spout is high but 
>> execute latencies at the bolts are very small...
>> 
>> Will jump into the links right away.
>> 
>> Thanks
>> Kashyap
>> 
>>> On Oct 3, 2015 12:00 PM, "Michael Vogiatzis" <[email protected]> 
>>> wrote:
>>> I agree with Javier: one JVM per node should reduce the number of 
>>> messages that need to be serialized.
>>> 
>>> For tuning Storm topologies you may find the following links useful:
>>> 
>>> https://gist.github.com/mrflip/5958028
>>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
>>> Talk:
>>> http://demo.ooyala.com/player.html?width=640&height=360&embedCode=Q1eXg5NzpKqUUzBm5WTIb6bXuiWHrRMi&videoPcode=9waHc6zKpbJKt9byfS7l4O4sn7Qn
>>> 
>>> Cheers,
>>> Michael
>>> @mvogiatzis
>>> 
>>> 
>>>> On Sat, 3 Oct 2015 at 14:04 Javier Gonzalez <[email protected]> wrote:
>>>> I would suggest sticking with a single worker per machine. It makes memory 
>>>> allocation easier and it makes inter-component communication much more 
>>>> efficient. Configure the executors with your parallelism hints to take 
>>>> advantage of all your available CPU cores.
>>>> 
>>>> Regards,
>>>> JG
>>>> 
>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <[email protected]> 
>>>>> wrote:
>>>>> Hi,
>>>>> I was trying to come up with an approach to evaluate the parallelism 
>>>>> needed for a topology.
>>>>> 
>>>>> Assume I have 5 machines, each with 8 cores and 32 GB of RAM, and my 
>>>>> topology has one spout and 5 bolts.
>>>>> 
>>>>> 1. Define one worker port per CPU core to start off (= 8 workers per 
>>>>> machine, i.e. 40 workers overall).
>>>>> 2. Each worker spawns one executor per component, which translates to 
>>>>> 6 executors per worker, i.e. 40 x 6 = 240 executors.
>>>>> 3. Of these, if the bolt logic is CPU intensive, leave the parallelism 
>>>>> hint at 40 (total workers); otherwise increase the parallelism hint 
>>>>> beyond 40 until you reach a number beyond which there is no further 
>>>>> visible performance gain.
>>>>> 
>>>>> Does this look right?
>>>>> 
>>>>> Thanks
>>>>> Kashyap
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Javier González Nicolini
>>> 
>>> -- 
>>> Michael Vogiatzis
>>> Twitter: @mvogiatzis
>>> http://micvog.com/
> 
> 
> 
> -- 
> Thanks & Regards,
> Anshu Shukla
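
For anyone following the numbers in the quoted thread, here is a rough sketch of the arithmetic behind the two layouts discussed: one worker per core (the original proposal) versus one worker per machine (the suggestion in the replies). The function names are hypothetical, and treating "one executor per core" as the starting executor budget for the single-worker layout is an assumption, not something stated in the thread.

```python
# Cluster figures taken from the thread: 5 machines, 8 cores each,
# and a topology with one spout plus five bolts (6 components).
MACHINES = 5
CORES_PER_MACHINE = 8
COMPONENTS = 6


def worker_per_core_plan(machines, cores, components):
    """Original proposal: one worker per core, and one executor
    per component inside every worker."""
    workers = machines * cores
    executors = workers * components
    return workers, executors


def single_worker_plan(machines, cores):
    """Suggested alternative: one worker (JVM) per machine.

    Assumption: the total executor budget starts at one executor
    per core, to be split across components via parallelism hints.
    """
    workers = machines
    executor_budget = machines * cores
    return workers, executor_budget


if __name__ == "__main__":
    print(worker_per_core_plan(MACHINES, CORES_PER_MACHINE, COMPONENTS))
    print(single_worker_plan(MACHINES, CORES_PER_MACHINE))
```

With the thread's figures, the first layout gives 40 workers and 240 executors, while the second gives 5 workers. How the executor budget is divided among the spout and bolts would still need to be tuned by measurement.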
