Yeah, the Spark on EMR bootstrap scripts referenced
here <http://aws.amazon.com/articles/4926593393724923> need some
polishing. I had a lot of trouble just getting through that
tutorial. And yes, the version of Spark they're using is 0.8.1.


On Fri, Feb 28, 2014 at 2:39 PM, Aureliano Buendia <[email protected]> wrote:

> Unfortunately, that script is not under active maintenance. Given that
> Spark's release cycles are accelerating, solutions like this get
> outdated quickly.
>
>
> On Fri, Feb 28, 2014 at 7:36 PM, Mayur Rustagi <[email protected]> wrote:
>
>> There is a talk on installing Spark on Amazon (not sure if it's updated for
>> 0.9.0):
>> http://www.youtube.com/watch?v=G0lSWUqyOhw
>> In this case the bootstrap script will run on the new slave when it comes
>> up. I am not sure how clean and production-quality this is. He seems to be
>> leveraging spot instances, where this needs to be done properly.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Fri, Feb 28, 2014 at 10:52 AM, Aureliano Buendia <[email protected]
>> > wrote:
>>
>>> Also, in this talk http://www.youtube.com/watch?v=OhpjgaBVUtU on using
>>> Spark Streaming in production, the author seems to have skipped the topic
>>> of how to manage cloud instances.
>>>
>>>
>>> On Fri, Feb 28, 2014 at 6:48 PM, Aureliano Buendia <[email protected]
>>> > wrote:
>>>
>>>> What's the current way of deploying Spark Streaming apps on EMR? Using
>>>> YARN?
>>>>
>>>> There are some out-of-date solutions like
>>>> https://github.com/ianoc/SparkEMRBootstrap which set up Mesos on EMR. I
>>>> wonder if this can be simplified with Spark 0.9.
>>>>
>>>> Spark-ec2 comes with a considerable amount of configuration and some
>>>> useful utilities, like deploying to workers. Porting it to a managed
>>>> service such as EMR is not as trivial as it might seem.
>>>>
>>>>
>>>> On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi <[email protected]
>>>> > wrote:
>>>>
>>>>> I think what you are looking for is a managed service à la EMR
>>>>> or Qubole. Spark-ec2 is just software to boot up machines and integrate
>>>>> them together using Whirr.
>>>>> I agree a managed service for Streaming would be really useful.
>>>>> Regards
>>>>> Mayur
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +1 (760) 203 3257
>>>>> http://www.sigmoidanalytics.com
>>>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Another subject that was not that important in Spark, but could be
>>>>>> crucial for 24/7 Spark Streaming, is reconstruction of lost nodes. By
>>>>>> that, I do not mean reconstructing lost data through self-healing, but
>>>>>> bringing up new EC2 instances once they die for whatever reason. Is this
>>>>>> also supported in spark-ec2?
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Yes, the default Spark EC2 cluster runs the standalone deploy mode.
>>>>>>> Since Spark 0.9, the standalone deploy mode allows you to launch the
>>>>>>> driver app within the cluster itself and automatically restart it if it
>>>>>>> fails. You can read about launching your app inside the cluster here
>>>>>>> <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>.
>>>>>>> Using this you can launch your streaming app as well.
>>>>>>>
>>>>>>> TD
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> How about the Spark Streaming app itself? Does the ec2 script also
>>>>>>>> provide means for daemonizing and monitoring Spark Streaming apps which
>>>>>>>> are supposed to run 24/7? If not, any suggestions for how to do this?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> ZooKeeper is automatically set up in the cluster as Spark uses
>>>>>>>>> ZooKeeper. However, you have to set up your own input source like
>>>>>>>>> Kafka or Flume.
>>>>>>>>>
>>>>>>>>> TD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes! Spark Streaming programs are just like any Spark program,
>>>>>>>>>>> so any EC2 cluster set up using the spark-ec2 scripts can be used
>>>>>>>>>>> to run Spark Streaming programs as well.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Great. Does it come with any input source support as well? (E.g.,
>>>>>>>>>> Kafka requires setting up ZooKeeper.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Does the EC2 support for Spark 0.9 also include Spark
>>>>>>>>>>>> Streaming? If not, is there an equivalent?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
