You can still provide properties through the docker container by putting configuration in the conf directory, but we try to pass through all properties submitted via spark-submit from the driver, which I believe will override the defaults.
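For example, overriding the container's defaults explicitly at submission might look like the following. This is only a sketch: the jar URL, class name, and memory value are invented for illustration; the dispatcher URL and executor home are the values that appear elsewhere in this thread.

```shell
# Each --conf passed to spark-submit should win over any spark-defaults.conf
# baked into the executor's docker image. Jar, class, and the memory value
# are hypothetical placeholders.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --conf spark.executor.memory=4g \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```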
This is not what you are seeing?

Tim

> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>
> The assumption is that the executor has no default properties set in its environment through the docker container. Correct me if I'm wrong, but any properties which are unset in the SparkContext will come from the environment of the executor, will they not?
>
> Thanks,
> - Alan
>
>> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>> I guess I need a bit more clarification: what kind of assumptions was the dispatcher making?
>>
>> Tim
>>
>>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>> Hi Tim,
>>>
>>> Thanks for the follow-up. It's not so much that I expect the executor to inherit the configuration of the dispatcher as that I don't expect the dispatcher to make assumptions about the system environment of the executor (since it lives in a docker container). I could potentially see a case where you might want to explicitly forbid the defaults, but I can't think of one right now.
>>>
>>> Otherwise, I'm confused as to why the defaults in the docker image for the executor are just ignored. I suppose it's the dispatcher's job to ensure the exact configuration of the executor, regardless of the defaults set on the executor's machine? Is that the assumption being made? I can understand that in contexts which aren't docker-driven, since jobs could be rolling out in the middle of a config update. I'm trying to think of this outside the terms of just mesos/docker (since I'm fully aware that docker doesn't rule the world yet).
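The precedence being discussed here can be modeled with a toy two-layer merge. This is an illustration of the intended behavior only, not Spark's actual implementation, and every file name and value below is invented: defaults from the image's conf directory apply first, and anything passed at submit time overwrites them.

```shell
# Toy model of property precedence (not Spark's real code): the image's
# conf-dir defaults are read first, then properties passed via
# spark-submit overwrite them. All names and values are invented.
cat > image-defaults.conf <<'EOF'
spark.mesos.executor.home /opt/spark
spark.executor.memory 1g
EOF
cat > submitted.conf <<'EOF'
spark.executor.memory 4g
EOF
# Later files win: for each key, keep the last value seen.
awk '{v[$1]=$2} END {for (k in v) print k, v[k]}' \
    image-defaults.conf submitted.conf | sort
# spark.executor.memory comes out as 4g (the submitted value), while
# spark.mesos.executor.home keeps the image default /opt/spark.
```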
>>> So I can see this from both perspectives now, and passing in the properties file will probably work just fine for me. But for my better understanding: when the executor starts, will it read any of the environment that it's executing in, or will it take only the properties given to it by the dispatcher and nothing more?
>>>
>>> Let me know if anything needs more clarification, and thanks for your mesos contribution to spark!
>>>
>>> - Alan
>>>
>>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>> Hi Alan,
>>>>
>>>> If I understand correctly, you are setting executor home when you launch the dispatcher, and not in the configuration when you submit the job, and you expect it to inherit that configuration?
>>>>
>>>> When I worked on the dispatcher I assumed all configuration is passed to the dispatcher, so that it can launch the job exactly how you would launch it in client mode.
>>>>
>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I get a chance.
>>>>
>>>> Can you recommend changes to the documentation, either in email or in a PR?
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>>>
>>>>> Hey All,
>>>>>
>>>>> To bump this thread once again: I'm having some trouble using the dispatcher as well.
>>>>>
>>>>> I'm using the Mesos Cluster Manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful, and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>>>>>
>>>>> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>>>>>
>>>>> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.).
>>>>> Reading the code, it appears that the driver description pulls only from the request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within spark, and were hoping it would be possible to have this pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>>>>>
>>>>> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>>>>>
>>>>> Please set me on the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>>>>>
>>>>> Thanks,
>>>>> - Alan
>>>>>
>>>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>> Yes, you can create an issue, or actually contribute a patch to update it :)
>>>>>>
>>>>>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>> Tim,
>>>>>>>
>>>>>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>>>>>
>>>>>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
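For reference, the per-job workaround Alan's test points at: pass the executor home (and any other required properties) explicitly at submission, either with --conf or via a properties file. The properties-file path, jar URL, and class name here are placeholders; spark.mesos.executor.home=/usr/local/spark is the value Alan reported testing with.

```shell
# Pass everything the driver will need at submission time; the dispatcher
# does not read the executor image's defaults. Paths and names below are
# placeholders except spark.mesos.executor.home.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --properties-file /etc/spark/job-defaults.conf \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```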
>>>>>>> Tom
>>>>>>>
>>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>>
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> Sorry the documentation isn't really rich; it probably assumes users understand how Mesos and frameworks work.
>>>>>>>
>>>>>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter has a Mesos slave installed, which is responsible for publishing resources and running/watching tasks, while the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>>>>>
>>>>>>> Frameworks are not managed by Mesos: the Mesos master/slave doesn't launch and maintain frameworks, but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g. Aurora, Marathon, etc.).
>>>>>>>
>>>>>>> Therefore, to introduce cluster mode with Mesos, we had to create a long-running framework that lives in your datacenter and can handle launching spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>>>>>
>>>>>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
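A sketch of what running the dispatcher under Marathon might look like, as a minimal Marathon app definition. Everything here is an assumption for illustration: the install path, ZooKeeper addresses, resource sizes, and the exact dispatcher flags should be checked against your Spark version. Running the dispatcher class directly via spark-class keeps it in the foreground, which is what Marathon expects of a supervised process.

```json
{
  "id": "/spark-dispatcher",
  "instances": 1,
  "cpus": 1.0,
  "mem": 1024,
  "cmd": "/usr/local/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --name spark-dispatcher"
}
```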
>>>>>>> Then all clients, instead of specifying the Mesos master URL (e.g. mesos://mesos.master:2181), just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start and watch the driver for you.
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>>> After spending most of yesterday scouring the Internet for documentation on submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>>>>>
>>>>>>>> I used the published Mesos instructions for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in Openstack.
>>>>>>>>
>>>>>>>> The Spark documentation site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and, using the client machine's IP address and port as the target, submitted the job.
>>>>>>>>
>>>>>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system. As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not a part of the sphere of Mesos resource management.
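Concretely, the two halves of the setup described in this thread might look like the following. Host names and the jar/class are placeholders; the dispatcher should run on a machine inside the datacenter, not on the submitting client, and flags should be checked against your Spark version.

```shell
# On a machine in the datacenter: start the dispatcher against the Mesos
# masters (ZooKeeper-based master discovery shown; hosts are invented).
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

# On the client: submit against the dispatcher, not the Mesos master.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```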
>>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>>
>>>>>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>>>>>
>>>>>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>>>>>
>>>>>>>> Tom