Also, in this talk on using Spark Streaming in production, http://www.youtube.com/watch?v=OhpjgaBVUtU, the speaker seems to have skipped the topic of how to manage cloud instances.
On Fri, Feb 28, 2014 at 6:48 PM, Aureliano Buendia wrote:
> What's the current way of deploying Spark Streaming apps on EMR? Using
> YARN?
>
> There are some out-of-date solutions, like
> https://github.com/ianoc/SparkEMRBootstrap, which set up Mesos on EMR. I
> wonder whether this can be simplified with Spark 0.9.
>
> The spark-ec2 script comes with a considerable amount of configuration and
> some useful utilities, such as deploying to workers. Porting it to a managed
> service such as EMR is not as trivial as it might seem.
>
>
> On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi wrote:
>
>> I think what you are looking for is a managed service along the lines of
>> EMR or Qubole. The spark-ec2 script is just software to boot up machines
>> and integrate them using Whirr.
>> I agree that a managed service for Spark Streaming would be really useful.
>> Regards
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia wrote:
>>
>>> Another subject that was not that important in Spark, but could be
>>> crucial for 24/7 Spark Streaming, is the reconstruction of lost nodes. By
>>> that, I do not mean reconstructing lost data through self-healing, but
>>> bringing up new EC2 instances when existing ones die for whatever reason.
>>> Is this also supported in spark-ec2?
>>>
>>>
>>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das wrote:
>>>
>>>> Yes, the default Spark EC2 cluster runs in standalone deploy mode.
>>>> Since Spark 0.9, the standalone deploy mode lets you launch the driver
>>>> app within the cluster itself and automatically restart it if it fails.
>>>> You can read about launching your app inside the cluster here:
>>>> <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>
>>>> Using this, you can launch your streaming app as well.
>>>>
>>>> TD
>>>>
>>>>
>>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia wrote:
>>>>
>>>>> What about the Spark Streaming app itself? Does the ec2 script also
>>>>> provide a means of daemonizing and monitoring Spark Streaming apps that
>>>>> are supposed to run 24/7? If not, any suggestions for how to do this?
>>>>>
>>>>>
>>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das wrote:
>>>>>
>>>>>> ZooKeeper is automatically set up in the cluster, since Spark uses
>>>>>> ZooKeeper. However, you have to set up your own input source, such as
>>>>>> Kafka or Flume.
>>>>>>
>>>>>> TD
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia wrote:
>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das wrote:
>>>>>>>
>>>>>>>> Yes! Spark Streaming programs are just like any other Spark program,
>>>>>>>> so any EC2 cluster set up with the spark-ec2 scripts can be used to
>>>>>>>> run Spark Streaming programs as well.
>>>>>>>
>>>>>>> Great. Does it come with any input source support as well? (For
>>>>>>> example, Kafka requires setting up ZooKeeper.)
>>>>>>>
>>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does the EC2 support in Spark 0.9 also include Spark Streaming?
>>>>>>>>> If not, is there an equivalent?
