Thank you guys for the detailed answers. Akhil, yes, I would like to give your tool a try. Is it open source?
2014-05-17 17:55 GMT+02:00 Mayur Rustagi <mayur.rust...@gmail.com>:

> A better way would be to use Mesos (and quite possibly YARN in 1.0.0).
> That will allow you to add nodes on the fly and leverage them for Spark.
> Frankly, standalone mode is not meant to handle these issues. That said,
> we use our deployment tool, as stopping the cluster to add nodes is not
> really an issue at the moment.
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Sat, May 17, 2014 at 9:05 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Thanks for the info about adding/removing nodes dynamically. That's
>> valuable.
>>
>> On Friday, 16 May 2014, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Hi Han :)
>>>
>>> 1. Is there a way to automatically re-spawn Spark workers? We have
>>> situations where an executor OOM causes the worker process to die, and
>>> it does not come back automatically.
>>>
>>> => Yes. You can either add an OOM killer
>>> exception <http://backdrift.org/how-to-create-oom-killer-exceptions> on
>>> all of your Spark processes, or you can have a cron job that keeps
>>> monitoring your worker processes and brings them back up if they go
>>> down.
>>>
>>> 2. How can we dynamically add (or remove) worker machines to (from)
>>> the cluster? We'd like to leverage an EC2 auto-scaling group, for
>>> example.
>>>
>>> => You can add worker nodes on the fly by spawning a new machine,
>>> adding that machine's IP address on the master node, and then rsyncing
>>> the Spark directory to all worker machines, including the one you
>>> added. Then simply run the *start-all.sh* script on the master node to
>>> bring the new worker into action.
>>> Removing a worker machine from the master can be done in a similar
>>> way: remove the worker's IP address from the master's *slaves* file,
>>> then restart your slaves, and that will get the worker removed.
>>>
>>>
>>> FYI, we have a deployment tool (a web-based UI) that we use for
>>> internal purposes. It is built on top of the spark-ec2 script (with
>>> some changes) and has a module for adding/removing worker nodes on the
>>> fly. It looks like the attached screenshot. If you want, I can give you
>>> access.
>>>
>>> Thanks
>>> Best Regards
>>>
>>>
>>> On Wed, May 14, 2014 at 9:52 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just two questions:
>>>>
>>>> 1. Is there a way to automatically re-spawn Spark workers? We have
>>>> situations where an executor OOM causes the worker process to die, and
>>>> it does not come back automatically.
>>>>
>>>> 2. How can we dynamically add (or remove) worker machines to (from)
>>>> the cluster? We'd like to leverage an EC2 auto-scaling group, for
>>>> example.
>>>>
>>>> We're using Spark standalone.
>>>>
>>>> Thanks a lot.
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> Data Engineer @ Botify.com
>>>>
>>>> +33 0619608888
>>>
>>>
>
-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
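PS: for anyone finding this thread later, Akhil's cron-job watchdog suggestion could be sketched roughly like this. This is a minimal sketch, not a definitive implementation: the process pattern and the start command are assumptions you would adjust to your own Spark install and master URL.

```python
import subprocess

# Assumptions -- adjust to your deployment:
# the JVM class name of the standalone worker, and the command that
# (re)starts a worker pointed at your master.
WORKER_PATTERN = "org.apache.spark.deploy.worker.Worker"
START_COMMAND = ["/opt/spark/sbin/start-slave.sh", "spark://master:7077"]

def is_running(pattern):
    """Return True if any process command line matches `pattern`."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

def ensure_worker(pattern=WORKER_PATTERN, start_command=START_COMMAND):
    """Restart the worker process if it is down; report what was done."""
    if is_running(pattern):
        return "already running"
    subprocess.run(start_command, check=True)
    return "restarted"
```

Scheduled from cron (e.g. `* * * * * python /path/to/watchdog.py` on each worker machine), this brings a dead worker back within a minute.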
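The add/remove procedure Akhil describes (edit the master's *slaves* file, rsync the Spark directory, then run *start-all.sh*) could also be scripted roughly as below. Again a sketch under assumptions: the SPARK_HOME path is made up, and the rsync/restart steps are left as comments since they depend on your setup.

```python
from pathlib import Path

SPARK_HOME = Path("/opt/spark")  # assumption: adjust to your install

def _slaves_path(slaves_file=None):
    return Path(slaves_file) if slaves_file else SPARK_HOME / "conf" / "slaves"

def add_worker(ip, slaves_file=None):
    """Append a worker's IP to the master's conf/slaves file (idempotent)."""
    path = _slaves_path(slaves_file)
    entries = path.read_text().splitlines() if path.exists() else []
    if ip not in entries:
        with path.open("a") as f:
            f.write(ip + "\n")
    # Next steps (not automated here): rsync the Spark directory to the
    # new machine, then run sbin/start-all.sh on the master.

def remove_worker(ip, slaves_file=None):
    """Drop a worker's IP from conf/slaves; restart slaves afterwards."""
    path = _slaves_path(slaves_file)
    entries = [e for e in path.read_text().splitlines() if e.strip() != ip]
    path.write_text("\n".join(entries) + "\n" if entries else "")
```

Wiring `add_worker` to an EC2 auto-scaling lifecycle hook would give the dynamic scaling Han asked about, at the cost of the restart Mayur mentions.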