Re: Approach to parallelism

Kashyap Mhaisekar Sat, 03 Oct 2015 13:31:41 -0700

Thanks guys.
So when you say one JVM per node, does that mean a single worker port, say
6700, on each machine, with a large heap assigned to that worker?
In that case, does it translate into 5 workers (one per machine), each with
at least 4g of heap, and all bolts spread across these 5 workers?
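
To put numbers on the two layouts, here is a rough sketch of the arithmetic.
It is plain Java with no Storm dependency; the class and method names are
made up for illustration, and the "one executor per component per worker"
rule is just the starting assumption from my first mail, not something Storm
enforces.

```java
// Hypothetical helper comparing the two worker layouts discussed in this thread.
public class WorkerLayout {

    // Total worker JVMs = machines x worker ports (slots) used per machine.
    static int totalWorkers(int machines, int workersPerMachine) {
        return machines * workersPerMachine;
    }

    // Executor count if every component gets exactly one executor in every worker.
    static int executorsOnePerComponent(int workers, int components) {
        return workers * components;
    }

    public static void main(String[] args) {
        int machines = 5;
        int components = 6; // 1 spout + 5 bolts

        // Original plan: one worker per core, 8 cores per machine.
        int manyWorkers = totalWorkers(machines, 8);
        // Suggested plan: a single worker (one port, e.g. 6700) per machine.
        int fewWorkers = totalWorkers(machines, 1);

        System.out.printf("8 per machine: %d workers, %d executors%n",
                manyWorkers, executorsOnePerComponent(manyWorkers, components));
        System.out.printf("1 per machine: %d workers, %d executors%n",
                fewWorkers, executorsOnePerComponent(fewWorkers, components));
    }
}
```

With a single worker per machine the executor count is no longer tied to the
worker count, so the parallelism hints would have to be raised separately to
keep all 40 cores busy.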


Is there a guideline on how I should arrive at the parallelism hints for the
bolts themselves? I mean, for the case where complete latency at the spout is
high but execute latencies at the bolts are very small...
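
One starting point I am aware of is the "capacity" figure the Storm UI shows
per bolt, which as far as I understand is (tuples executed x average execute
latency) / measurement window. A minimal sketch of that calculation follows.
The class, the method names, the sample numbers, and the 0.5 target are all
assumptions for illustration, not anything taken from Storm itself.

```java
// Hypothetical sizing helper based on the Storm UI "capacity" idea.
public class BoltCapacity {

    // Fraction of the window one executor spent inside execute().
    // A value near 1.0 suggests the bolt is saturated.
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    // Executors needed so each one stays at or below targetCapacity.
    static int executorsFor(double tuplesPerSecond, double executeLatencyMs,
                            double targetCapacity) {
        // Executor-seconds of work arriving per second of wall-clock time.
        double busyExecutors = tuplesPerSecond * executeLatencyMs / 1000.0;
        return Math.max(1, (int) Math.ceil(busyExecutors / targetCapacity));
    }

    public static void main(String[] args) {
        // Sample numbers only: 600,000 tuples at 0.5 ms each in a 10 minute window.
        System.out.println(capacity(600_000, 0.5, 600_000));
        // Sample numbers only: 20,000 tuples/s at 0.25 ms each, target capacity 0.5.
        System.out.println(executorsFor(20_000, 0.25, 0.5));
    }
}
```

If execute latency is tiny and this number comes out well below 1.0, adding
executors to that bolt probably will not bring complete latency down, and the
time is more likely being spent waiting in queues between components.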

Will jump into the links right away.

Thanks
Kashyap
On Oct 3, 2015 12:00 PM, "Michael Vogiatzis" <[email protected]>
wrote:

> I agree with Javier: one JVM per node should minimize the number of
> messages that need to be serialized.
>
> For tuning Storm topologies you may find the following links useful:
>
> https://gist.github.com/mrflip/5958028
> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
> Talk:
>
> http://demo.ooyala.com/player.html?width=640&height=360&embedCode=Q1eXg5NzpKqUUzBm5WTIb6bXuiWHrRMi&videoPcode=9waHc6zKpbJKt9byfS7l4O4sn7Qn
>
> Cheers,
> Michael
> @mvogiatzis <https://twitter.com/mvogiatzis>
>
>
> On Sat, 3 Oct 2015 at 14:04 Javier Gonzalez <[email protected]> wrote:
>
>> I would suggest sticking with a single worker per machine. It makes
>> memory allocation easier and it makes inter-component communication much
>> more efficient. Configure the executors with your parallelism hints to take
>> advantage of all your available CPU cores.
>>
>> Regards,
>> JG
>>
>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I was trying to come up with an approach to evaluate the parallelism
>>> needed for a topology.
>>>
>>> Assume I have 5 machines, each with 8 cores and 32 GB of RAM, and my
>>> topology has one spout and 5 bolts.
>>>
>>> 1. Define one worker port per CPU core to start off (= 8 workers per
>>> machine, i.e. 40 workers overall).
>>> 2. If each worker runs one executor per component, that translates to
>>> 6 executors per worker, which is 40 x 6 = 240 executors.
>>> 3. Of these, if the bolt logic is CPU intensive, leave the parallelism
>>> hint at 40 (total workers); otherwise increase the parallelism hint beyond
>>> 40 until you hit a number beyond which there is no more visible
>>> performance gain.
>>>
>>> Does this look right?
>>>
>>> Thanks
>>> Kashyap
>>>
>>
>>
>>
>> --
>> Javier González Nicolini
>>
> --
> Michael Vogiatzis
> Twitter: @mvogiatzis
> http://micvog.com/
>
