Also, in this talk on using Spark Streaming in production,
http://www.youtube.com/watch?v=OhpjgaBVUtU, the speaker seems to have
missed the topic of how to manage cloud instances.


On Fri, Feb 28, 2014 at 6:48 PM, Aureliano Buendia <[email protected]> wrote:

> What's the current way of deploying Spark Streaming apps on EMR? Using
> YARN?
>
> There are some out-of-date solutions like
> https://github.com/ianoc/SparkEMRBootstrap, which sets up Mesos on EMR. I
> wonder if this can be simplified with Spark 0.9.
>
> Spark-ec2 comes with a considerable amount of configuration and some
> useful utilities, like deploying to workers, so porting it to a managed
> service such as EMR is not as trivial as it might seem.
>
>
> On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi <[email protected]> wrote:
>
>> I think what you are looking for is a sort of managed service à la EMR or
>> Qubole. Spark-ec2 is just software to boot up machines & integrate them
>> together using Whirr.
>> I agree a managed service for Streaming would be really useful.
>> Regards
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia 
>> <[email protected]> wrote:
>>
>>> Another subject that was not that important in Spark, but could be
>>> crucial for 24/7 Spark Streaming, is reconstruction of lost nodes. By that,
>>> I do not mean reconstruction of lost data by self-healing, but bringing up new
>>> EC2 instances once they die for whatever reason. Is this also supported in
>>> spark-ec2?
>>>
>>>
>>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das <
>>> [email protected]> wrote:
>>>
>>>> Yes, the default spark EC2 cluster runs the standalone deploy mode.
>>>> Since Spark 0.9, the standalone deploy mode allows you to launch the driver
>>>> app within the cluster itself and automatically restart it if it fails. You
>>>> can read about launching your app inside the cluster
>>>> here <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>.
>>>> Using this you can launch your streaming app as well.
>>>>
>>>> TD
>>>>
>>>>
>>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia <
>>>> [email protected]> wrote:
>>>>
>>>>> How about the Spark Streaming app itself? Does the ec2 script also provide
>>>>> a means of daemonizing and monitoring Spark Streaming apps that are
>>>>> supposed to run 24/7? If not, any suggestions for how to do this?
>>>>>
>>>>>
>>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> ZooKeeper is automatically set up in the cluster as Spark uses
>>>>>> ZooKeeper. However, you have to set up your own input source like Kafka or
>>>>>> Flume.
>>>>>>
>>>>>> TD
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Yes! Spark Streaming programs are just like any Spark program, so
>>>>>>>> any EC2 cluster set up using the spark-ec2 scripts can be used to run
>>>>>>>> Spark Streaming programs as well.
>>>>>>>>
>>>>>>>
>>>>>>> Great. Does it come with any input source support as well? (E.g., Kafka
>>>>>>> requires setting up ZooKeeper.)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does the EC2 support for Spark 0.9 also include Spark Streaming?
>>>>>>>>> If not, is there an equivalent?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
