A better way would be to use Mesos (and quite possibly YARN in 1.0.0). That will allow you to add nodes on the fly and use them for Spark. Frankly, standalone mode is not meant to handle those issues. That said, we use our deployment tool, since stopping the cluster to add nodes is not really an issue for us at the moment.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Sat, May 17, 2014 at 9:05 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for the info about adding/removing nodes dynamically. That's
> valuable.
>
> On Friday, May 16, 2014, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Hi Han :)
>>
>> 1. Is there a way to automatically re-spawn Spark workers? We have
>> situations where an executor OOM causes the worker process to end up DEAD,
>> and it does not come back automatically.
>>
>> => Yes. You can either add an OOM killer exception
>> <http://backdrift.org/how-to-create-oom-killer-exceptions> for all of
>> your Spark processes, or you can have a cron job that keeps monitoring
>> your worker processes and brings them back if they go down.
>>
>> 2. How can we dynamically add (or remove) worker machines to (from) the
>> cluster? We'd like to leverage the auto-scaling group in EC2, for
>> example.
>>
>> => You can add/remove worker nodes on the fly: spawn a new machine, add
>> that machine's IP address to the *slaves* file on the master node, then
>> rsync the Spark directory to all worker machines, including the one you
>> added. Then you can simply run the *start-all.sh* script on the master
>> node to bring the new worker into action. Removing a worker machine works
>> the same way: remove the worker's IP address from the master's *slaves*
>> file, then restart your slaves, and the worker will be removed.
>>
>> FYI, we have a deployment tool (a web-based UI) that we use for internal
>> purposes. It is built on top of the spark-ec2 script (with some changes)
>> and has a module for adding/removing worker nodes on the fly. It looks
>> like the attached screenshot. If you want, I can give you access.
>>
>> Thanks
>> Best Regards
>>
>>
>> On Wed, May 14, 2014 at 9:52 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Just 2 questions:
>>>
>>> 1. Is there a way to automatically re-spawn Spark workers? We have
>>> situations where an executor OOM causes the worker process to end up
>>> DEAD, and it does not come back automatically.
>>>
>>> 2. How can we dynamically add (or remove) worker machines to (from) the
>>> cluster? We'd like to leverage the auto-scaling group in EC2, for
>>> example.
>>>
>>> We're using Spark standalone.
>>>
>>> Thanks a lot.
>>>
>>> --
>>> *JU Han*
>>>
>>> Data Engineer @ Botify.com
>>>
>>> +33 0619608888
>>>
>>
>>
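Akhil's second suggestion for question 1, a cron job that watches the worker process and brings it back, could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a tool from this thread: the script name, crontab schedule and log file name are hypothetical, it assumes a Linux host (it scans `/proc/<pid>/cmdline`), and it relaunches the Worker with `bin/spark-class org.apache.spark.deploy.worker.Worker <master-url>`, the manual start command from the Spark standalone docs.

```python
"""Cron-driven watchdog that restarts a Spark standalone Worker if none is running.

Hypothetical crontab entry (runs every 5 minutes):
    */5 * * * * python3 /opt/scripts/spark_worker_watchdog.py /opt/spark spark://master:7077
Linux only: it scans /proc/<pid>/cmdline for the Worker's main class.
"""
import os
import subprocess
import sys
from typing import Iterable, List

# Main class of the standalone Worker; it appears on the Worker JVM's command line.
WORKER_CLASS = "org.apache.spark.deploy.worker.Worker"


def list_cmdlines(proc_root: str = "/proc") -> List[str]:
    """Full command line of every visible process, with NUL separators turned into spaces."""
    cmdlines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are process directories
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:  # process exited between listdir() and open(), or is unreadable
            continue
        cmdlines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return cmdlines


def worker_running(cmdlines: Iterable[str]) -> bool:
    """True if any command line mentions the Worker main class."""
    return any(WORKER_CLASS in line for line in cmdlines)


def restart_command(spark_home: str, master_url: str) -> List[str]:
    """Launch a Worker in the foreground via bin/spark-class, pointed at the master."""
    return [os.path.join(spark_home, "bin", "spark-class"), WORKER_CLASS, master_url]


def main(spark_home: str, master_url: str) -> None:
    if worker_running(list_cmdlines()):
        return  # a Worker is alive; nothing to do
    log_dir = os.path.join(spark_home, "logs")
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, "watchdog-worker.out"), "a") as log:
        # New session and redirected stdio so the Worker outlives this cron invocation.
        subprocess.Popen(
            restart_command(spark_home, master_url),
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} SPARK_HOME MASTER_URL", file=sys.stderr)
```

Matching on the class name rather than on a pid file keeps the check independent of how the Worker was originally started. One caveat, as far as I can tell from the 1.x scripts: a Worker launched this way has no pid file from `sbin/spark-daemon.sh`, so `sbin/stop-slaves.sh` may not stop it.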
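The add/remove procedure Akhil outlines for question 2 comes down to editing the master's `slaves` file and syncing the Spark directory to every worker. A minimal Python sketch of those two steps follows. The function names are hypothetical helpers, not part of Spark, and it assumes the Spark 1.x layout where the standalone launch scripts read one worker host per line from `conf/slaves`, that `SPARK_HOME` is the same path on every machine, and that the master can ssh/rsync to each worker without a password.

```python
"""Sketch: maintain Spark standalone's conf/slaves file (one worker host per line)
and build the rsync commands that push SPARK_HOME to every listed worker.

Hypothetical helpers; assumes identical SPARK_HOME paths and passwordless ssh.
"""
import os
from typing import List


def parse_hosts(lines: List[str]) -> List[str]:
    """Worker hosts from slaves-file lines; blank lines and '#' comments are skipped."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def _read_lines(path: str) -> List[str]:
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()


def _write_lines(path: str, lines: List[str]) -> None:
    with open(path, "w") as f:
        f.write("".join(line + "\n" for line in lines))


def add_worker(slaves_path: str, host: str) -> List[str]:
    """Append host unless it is already listed; returns the resulting host list."""
    lines = _read_lines(slaves_path)
    if host not in parse_hosts(lines):
        lines.append(host)
        _write_lines(slaves_path, lines)
    return parse_hosts(lines)


def remove_worker(slaves_path: str, host: str) -> List[str]:
    """Drop every line naming host (comments are kept); returns the resulting host list."""
    lines = [line for line in _read_lines(slaves_path) if line.strip() != host]
    _write_lines(slaves_path, lines)
    return parse_hosts(lines)


def rsync_commands(spark_home: str, hosts: List[str]) -> List[List[str]]:
    """One rsync per host; trailing slashes copy the directory contents to the same path."""
    src = spark_home.rstrip("/") + "/"
    return [["rsync", "-az", src, f"{host}:{src}"] for host in hosts]
```

For example, after `add_worker(os.path.join(spark_home, "conf", "slaves"), "10.0.0.5")` (the address is made up) you would run each command from `rsync_commands(...)`, e.g. with `subprocess.run(cmd, check=True)`, and then `sbin/start-all.sh` on the master. For removal, note that `sbin/stop-slaves.sh` only contacts hosts still listed in the file, so it seems safer to stop the departing worker before calling `remove_worker`.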