Yeah, the Spark on EMR bootstrap scripts referenced here <http://aws.amazon.com/articles/4926593393724923> need some polishing. I had a lot of trouble just getting through that tutorial. And yes, the version of Spark they're using is 0.8.1.

On Fri, Feb 28, 2014 at 2:39 PM, Aureliano Buendia <[email protected]> wrote:

> Unfortunately, that script is not under active maintenance. Given that
> spark is getting accelerated release cycles, solutions like this get
> outdated quickly.
>
> On Fri, Feb 28, 2014 at 7:36 PM, Mayur Rustagi <[email protected]> wrote:
>
>> Thr is a talk to install spark on Amazon ( not sure if its updated for
>> 0.9.0).
>> http://www.youtube.com/watch?v=G0lSWUqyOhw
>> In this case the bootstrap script will run on the new slave when it comes
>> up. I am not sure how clean & production quality this is. He seems to be
>> leveraging spot instances where this needs to be done properly.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>> On Fri, Feb 28, 2014 at 10:52 AM, Aureliano Buendia <[email protected]> wrote:
>>
>>> Also, in this talk http://www.youtube.com/watch?v=OhpjgaBVUtU on using
>>> spark streaming in production, the author seems to have missed the topic of
>>> how to manage cloud instances.
>>>
>>> On Fri, Feb 28, 2014 at 6:48 PM, Aureliano Buendia <[email protected]> wrote:
>>>
>>>> What's the updated way of deploying spark streaming apps on EMR? Using
>>>> YARN?
>>>>
>>>> There are some out of date solutions like
>>>> https://github.com/ianoc/SparkEMRBootstrap which setup mesos on EMR. I
>>>> wonder if this can be simplified by spark 0.9.
>>>>
>>>> Spark-ec2 comes with a considerable amount of configuration, and some
>>>> useful utilities like deploy to workers, porting it to a managed service
>>>> such as EMR is not as trivial as it might seem to be.
>>>>
>>>> On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi <[email protected]> wrote:
>>>>
>>>>> I think what you are looking for is sort of a managed service ala EMR
>>>>> or Qubole. Spark-ec2 is just software to boot up machines & integrate them
>>>>> together using Whirr.
>>>>> I agree a managed service for Streaming would be really useful.
>>>>> Regards
>>>>> Mayur
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +1 (760) 203 3257
>>>>> http://www.sigmoidanalytics.com
>>>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>>
>>>>> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>
>>>>>> Another subject that was not that important in spark, but it could be
>>>>>> crucial for 24/7 spark streaming, is reconstruction of lost nodes. By that,
>>>>>> I do not mean lost data reconstruction by self healing, but bringing up new
>>>>>> ec2 instances once they die for whatever reasons. Is this also supported in
>>>>>> spark ec2?
>>>>>>
>>>>>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, the default spark EC2 cluster runs the standalone deploy mode.
>>>>>>> Since Spark 0.9, the standalone deploy mode allows you to launch the driver
>>>>>>> app within the cluster itself and automatically restart it if it fails. You
>>>>>>> can read about launching your app inside the cluster
>>>>>>> here <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>.
>>>>>>> Using this you can launch your streaming app as well.
>>>>>>>
>>>>>>> TD
>>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>
>>>>>>>> How about spark stream app itself? Does the ec2 script also provide
>>>>>>>> means for daemonizing and monitoring spark streaming apps which are
>>>>>>>> supposed to run 24/7? If not, any suggestions for how to do this?
>>>>>>>>
>>>>>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Zookeeper is automatically set up in the cluster as Spark uses
>>>>>>>>> Zookeeper. However, you have to setup your own input source like
>>>>>>>>> Kafka or Flume.
>>>>>>>>>
>>>>>>>>> TD
>>>>>>>>>
>>>>>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes! Spark streaming programs are just like any spark program
>>>>>>>>>>> and so any ec2 cluster setup using the spark-ec2 scripts can be used to run
>>>>>>>>>>> spark streaming programs as well.
>>>>>>>>>>
>>>>>>>>>> Great. Does it come with any input source support as well? (Eg
>>>>>>>>>> kafka requires setting up zookeeper).
>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Does the ec2 support for spark 0.9 also include spark
>>>>>>>>>>>> streaming? If not, is there an equivalent?
