Thank you guys for the detailed answers. Akhil, yes, I would like to give your tool a try. Is it open source?
2014-05-17 17:55 GMT+02:00 Mayur Rustagi <mayur.rust...@gmail.com>:

> A better way would be to use Mesos (and quite possibly YARN in 1.0.0).
> That will allow you to add nodes on the fly and leverage them for Spark.
> Frankly, standalone mode is not meant to handle these issues. That said,
> we use our deployment tool, as stopping the cluster to add nodes is not
> really an issue at the moment.
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Sat, May 17, 2014 at 9:05 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Thanks for the info about adding/removing nodes dynamically. That's
>> valuable.
>>
>> On Friday, 16 May 2014, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Hi Han :)
>>>
>>> 1. Is there a way to automatically re-spawn Spark workers? We have
>>> situations where an executor OOM causes the worker process to die, and
>>> it does not come back automatically.
>>>
>>> => Yes. You can either add an OOM killer
>>> exception <http://backdrift.org/how-to-create-oom-killer-exceptions> on
>>> all of your Spark processes, or you can have a cron job that keeps
>>> monitoring your worker processes and brings them back up if they go
>>> down.
>>>
>>> 2. How can we dynamically add (or remove) worker machines to (from)
>>> the cluster? We'd like to leverage an EC2 auto-scaling group, for
>>> example.
>>>
>>> => You can add worker nodes on the fly by spawning a new machine,
>>> adding that machine's IP address on the master node, and then rsyncing
>>> the Spark directory to all worker machines, including the one you
>>> added. Then simply run the *start-all.sh* script on the master node to
>>> bring the new worker into action.
>>> Removing a worker machine from the master can be done in a similar
>>> way: remove the worker's IP address from the master's *slaves* file,
>>> then restart your slaves, and that will get the worker removed.
>>>
>>>
>>> FYI, we have a deployment tool (a web-based UI) that we use for
>>> internal purposes. It is built on top of the spark-ec2 script (with
>>> some changes) and has a module for adding/removing worker nodes on the
>>> fly. It looks like the attached screenshot. If you want, I can give you
>>> access.
>>>
>>> Thanks
>>> Best Regards
>>>
>>>
>>> On Wed, May 14, 2014 at 9:52 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just two questions:
>>>>
>>>> 1. Is there a way to automatically re-spawn Spark workers? We have
>>>> situations where an executor OOM causes the worker process to die, and
>>>> it does not come back automatically.
>>>>
>>>> 2. How can we dynamically add (or remove) worker machines to (from)
>>>> the cluster? We'd like to leverage an EC2 auto-scaling group, for
>>>> example.
>>>>
>>>> We're using Spark standalone.
>>>>
>>>> Thanks a lot.
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> Data Engineer @ Botify.com
>>>>
>>>> +33 0619608888
>>>
>>>
>
-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
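PS: for anyone finding this thread later, Akhil's cron-job watchdog suggestion could be sketched roughly like this. This is a minimal sketch, not a definitive implementation: the process pattern and the start command are assumptions you would adjust to your own Spark install and master URL.

```python
import subprocess

# Assumptions -- adjust to your deployment:
# the JVM class name of the standalone worker, and the command that
# (re)starts a worker pointed at your master.
WORKER_PATTERN = "org.apache.spark.deploy.worker.Worker"
START_COMMAND = ["/opt/spark/sbin/start-slave.sh", "spark://master:7077"]

def is_running(pattern):
    """Return True if any process command line matches `pattern`."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

def ensure_worker(pattern=WORKER_PATTERN, start_command=START_COMMAND):
    """Restart the worker process if it is down; report what was done."""
    if is_running(pattern):
        return "already running"
    subprocess.run(start_command, check=True)
    return "restarted"
```

Scheduled from cron (e.g. `* * * * * python /path/to/watchdog.py` on each worker machine), this brings a dead worker back within a minute.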
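The add/remove procedure Akhil describes (edit the master's *slaves* file, rsync the Spark directory, then run *start-all.sh*) could also be scripted roughly as below. Again a sketch under assumptions: the SPARK_HOME path is made up, and the rsync/restart steps are left as comments since they depend on your setup.

```python
from pathlib import Path

SPARK_HOME = Path("/opt/spark")  # assumption: adjust to your install

def _slaves_path(slaves_file=None):
    return Path(slaves_file) if slaves_file else SPARK_HOME / "conf" / "slaves"

def add_worker(ip, slaves_file=None):
    """Append a worker's IP to the master's conf/slaves file (idempotent)."""
    path = _slaves_path(slaves_file)
    entries = path.read_text().splitlines() if path.exists() else []
    if ip not in entries:
        with path.open("a") as f:
            f.write(ip + "\n")
    # Next steps (not automated here): rsync the Spark directory to the
    # new machine, then run sbin/start-all.sh on the master.

def remove_worker(ip, slaves_file=None):
    """Drop a worker's IP from conf/slaves; restart slaves afterwards."""
    path = _slaves_path(slaves_file)
    entries = [e for e in path.read_text().splitlines() if e.strip() != ip]
    path.write_text("\n".join(entries) + "\n" if entries else "")
```

Wiring `add_worker` to an EC2 auto-scaling lifecycle hook would give the dynamic scaling Han asked about, at the cost of the restart Mayur mentions.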