Re: Docker/Mesos with Spark

2016-01-19 Thread Sathish Kumaran Vairavelu
Hi Tim

Do you have any materials/blog posts on running Spark in a container in a Mesos
cluster environment? I have googled for it but couldn't find any information. The
Spark documentation says it is possible, but no details are provided. Please help.


Thanks

Sathish





Re: Docker/Mesos with Spark

2016-01-19 Thread Tim Chen
Hi Sathish,

Sorry about that. I think that's a good idea, and I'll write up a section in
the Spark documentation to explain how it works. We (Mesosphere) have been
doing this for our DCOS Spark releases, and it has been working well so far.

Thanks!

Tim



Re: Docker/Mesos with Spark

2016-01-19 Thread Sathish Kumaran Vairavelu
Thank you! Looking forward to it.




Re: Docker/Mesos with Spark

2016-01-19 Thread Darren Govoni


I also would be interested in some best practices for making this work.
Where will the write-up be posted? On the Mesosphere website?









Re: Docker/Mesos with Spark

2016-01-19 Thread Nagaraj Chandrashekar
Hi John,

I recently deployed Redis instances on Apache Mesos using the Kubernetes
framework. Kubernetes uses the pod concept: you can run your workload
(Redis or Spark) as a Docker container, and it also adds some HA capabilities
to the instances.
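As a rough illustration (the pod name, labels, and image tag below are hypothetical, not from the deployment described above), a minimal pod definition for such a Redis instance might look like:

```yaml
# Sketch of a single-container pod; name, labels, and image tag
# are placeholders for your environment.
apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    app: redis
spec:
  containers:
  - name: redis
    image: redis:3.0
    ports:
    - containerPort: 6379   # default Redis port
```

In practice you would wrap this in a replication controller (or, later, a Deployment) to get the HA behavior mentioned above, rather than running a bare pod.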

Cheers
Nagaraj C



Re: Docker/Mesos with Spark

2015-09-21 Thread Tim Chen
Hi John,

There is no other blog post yet; I'm planning to do a series of posts but
haven't had time for that so far.

Running Spark in Docker containers makes distributing Spark versions easy:
upgrades are simple, and images are cached automatically on the slaves, so
the same image runs right away. Most Docker performance overhead is related
to networking and the filesystem, but with the recent changes in Spark that
make the Mesos sandbox the default temp directory, the filesystem shouldn't
be a big concern, since writes mostly go to the mounted-in Mesos sandbox.
Also, Mesos uses the host network by default, so networking isn't much affected.
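To make this concrete, a minimal executor image along these lines could bundle a Spark distribution together with any Python libraries a job needs. This is only a sketch; the base image, library choices, and file paths are placeholders, not a recommended setup:

```dockerfile
# Hypothetical Spark executor image; all names and paths are placeholders.
FROM java:7

# Bake in Python plus any libraries the job needs, so every
# executor container already has them installed.
RUN apt-get update && \
    apt-get install -y python python-pip && \
    pip install numpy

# Unpack a Spark distribution into the image.
ADD spark-1.5.0-bin-hadoop2.6.tgz /opt/
ENV SPARK_HOME /opt/spark-1.5.0-bin-hadoop2.6
```

Once pushed to a registry the slaves can reach, the image is pulled on first use and served from the local cache afterwards, which is the caching behavior described above.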

The main cluster-mode limitation is that you need to make the Spark job
files available somewhere all the slaves can reach remotely (HTTP, S3,
HDFS, etc.) or place them on every slave at a local path.
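As a sketch of how the pieces fit together (the master address, image name, class, and jar URL below are placeholders, not tested values), a submission against a Dockerized Mesos cluster might look like:

```shell
# Sketch only: the ZooKeeper address, image, and URLs are placeholders.
spark-submit \
  --master mesos://zk://zk1:2181/mesos \
  --conf spark.mesos.executor.docker.image=myrepo/spark:1.5.0 \
  --class com.example.MyApp \
  http://fileserver/jars/my-app.jar
```

Here `spark.mesos.executor.docker.image` tells Mesos which image to launch executors in, and the application jar is served over HTTP so that every slave can fetch it remotely, which is exactly the cluster-mode requirement just described.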

I'll put more effort into docs once my existing patches and testing-infra
work are done.

Let me know if you have more questions,

Tim

On Sat, Sep 19, 2015 at 5:42 AM, John Omernik  wrote:

> I was searching in the 1.5.0 docs on the Docker on Mesos capabilities and
> just found you CAN run it this way.  Are there any user posts, blog posts,
> etc on why and how you'd do this?
>
> Basically, at first I was questioning why you'd run Spark in a Docker
> container, i.e., if you run with a tarballed executor, what are you really
> gaining?  And in this setup, are you losing out on performance somehow? (I
> am guessing people smarter than I have figured that out).
>
> Then I came across a situation where I wanted to use a Python library with
> Spark, and it had to be installed on every node, and I realized one big
> advantage of Dockerized Spark would be that Spark apps that needed other
> libraries could be contained and built well.
>
> OK, that's huge, let's do that.  For my next question, there are a lot of
> questions I have on how this actually works.  Does cluster mode/client mode
> apply here? If so, how?  Is there a good walkthrough on getting this set
> up? Limitations? Gotchas?  Should I just dive in and start working with
> it? Has anyone done any write-ups/rough documentation? This seems like a
> really helpful feature for scaling out Spark, and letting developers truly
> build what they need without tons of admin overhead, so I really want to
> explore.
>
> Thanks!
>
> John
>