There is a talk on installing Spark on Amazon (not sure if it's updated for 0.9.0): http://www.youtube.com/watch?v=G0lSWUqyOhw In this case the bootstrap script will run on each new slave when it comes up. I am not sure how clean and production-quality this is. He seems to be relying on spot instances, where nodes can disappear at any time, so replacing them needs to be done properly.
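For what it's worth, here is a rough sketch of what such a bootstrap step might look like for the standalone deploy mode. This is not taken from the talk. The worker class and the `spark://host:port` master URL form are the ones from the standalone docs, but the install path (`/root/spark`), the master host name and the `worker_cmd` helper are my own assumptions, so adjust them to your cluster.

```shell
#!/bin/sh
# Hypothetical bootstrap sketch for a replacement slave; not the script from the talk.
# SPARK_HOME and MASTER_URL are assumed defaults; override them in the environment.
SPARK_HOME="${SPARK_HOME:-/root/spark}"
MASTER_URL="${MASTER_URL:-spark://master-host:7077}"

# Build the command that starts a standalone worker and registers it with the master.
worker_cmd() {
  echo "$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL"
}

# Only launch the worker when called with "start", so the file can be sourced safely.
if [ "${1:-}" = "start" ]; then
  exec $(worker_cmd)
fi
```

Running the script with `start` as its first argument on the new instance would replace the shell with the worker process. Something still has to notice the dead node and boot the replacement instance in the first place, which this does not cover.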
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Feb 28, 2014 at 10:52 AM, Aureliano Buendia <[email protected]> wrote:

> Also, in this talk http://www.youtube.com/watch?v=OhpjgaBVUtU on using
> spark streaming in production, the author seems to have missed the topic of
> how to manage cloud instances.
>
> On Fri, Feb 28, 2014 at 6:48 PM, Aureliano Buendia <[email protected]> wrote:
>
>> What's the updated way of deploying spark streaming apps on EMR? Using
>> YARN?
>>
>> There are some out of date solutions like
>> https://github.com/ianoc/SparkEMRBootstrap which setup mesos on EMR. I
>> wonder if this can be simplified by spark 0.9.
>>
>> Spark-ec2 comes with a considerable amount of configuration, and some
>> useful utilities like deploy to workers, porting it to a managed service
>> such as EMR is not as trivial as it might seem to be.
>>
>> On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi <[email protected]> wrote:
>>
>>> I think what you are looking for is sort of a managed service ala EMR or
>>> Qubole. Spark-ec2 is just software to boot up machines & integrate them
>>> together using Whirr.
>>> I agree a managed service for Streaming would be really useful.
>>> Regards
>>> Mayur
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia <[email protected]> wrote:
>>>
>>>> Another subject that was not that important in spark, but it could be
>>>> crucial for 24/7 spark streaming, is reconstruction of lost nodes. By that,
>>>> I do not mean lost data reconstruction by self healing, but bringing up new
>>>> ec2 instances once they die for whatever reasons. Is this also supported in
>>>> spark ec2?
>>>>
>>>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das <[email protected]> wrote:
>>>>
>>>>> Yes, the default spark EC2 cluster runs the standalone deploy mode.
>>>>> Since Spark 0.9, the standalone deploy mode allows you to launch the
>>>>> driver app within the cluster itself and automatically restart it if
>>>>> it fails. You can read about launching your app inside the cluster here
>>>>> <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>.
>>>>> Using this you can launch your streaming app as well.
>>>>>
>>>>> TD
>>>>>
>>>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia <[email protected]> wrote:
>>>>>
>>>>>> How about spark stream app itself? Does the ec2 script also provide
>>>>>> means for daemonizing and monitoring spark streaming apps which are
>>>>>> supposed to run 24/7? If not, any suggestions for how to do this?
>>>>>>
>>>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das <[email protected]> wrote:
>>>>>>
>>>>>>> Zookeeper is automatically set up in the cluster as Spark uses
>>>>>>> Zookeeper. However, you have to setup your own input source like
>>>>>>> Kafka or Flume.
>>>>>>>
>>>>>>> TD
>>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Yes! Spark streaming programs are just like any spark program and
>>>>>>>>> so any ec2 cluster setup using the spark-ec2 scripts can be used to
>>>>>>>>> run spark streaming programs as well.
>>>>>>>>
>>>>>>>> Great. Does it come with any input source support as well? (Eg
>>>>>>>> kafka requires setting up zookeeper).
>>>>>>>>
>>>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Does the ec2 support for spark 0.9 also include spark streaming?
>>>>>>>>>> If not, is there an equivalent?
