You can still provide properties through the docker container by putting configuration in the conf directory, but we try to pass through all properties submitted via spark-submit from the driver, which I believe will override the defaults.
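For example, overriding the container's defaults explicitly at submission might look like the following. This is only a sketch: the jar URL, class name, and memory value are invented for illustration; the dispatcher URL and executor home are the values that appear elsewhere in this thread.

```shell
# Each --conf passed to spark-submit should win over any spark-defaults.conf
# baked into the executor's docker image. Jar, class, and the memory value
# are hypothetical placeholders.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --conf spark.executor.memory=4g \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```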
This is not what you are seeing?

Tim

> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>
> The assumption is that the executor has no default properties set in its environment through the docker container. Correct me if I'm wrong, but any properties which are unset in the SparkContext will come from the environment of the executor, will they not?
>
> Thanks,
> - Alan
>
>> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>> I guess I need a bit more clarification: what kind of assumptions was the dispatcher making?
>>
>> Tim
>>
>>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>> Hi Tim,
>>>
>>> Thanks for the follow-up. It's not so much that I expect the executor to inherit the configuration of the dispatcher as that I don't expect the dispatcher to make assumptions about the system environment of the executor (since it lives in a docker container). I could potentially see a case where you might want to explicitly forbid the defaults, but I can't think of one right now.
>>>
>>> Otherwise, I'm confused as to why the defaults in the docker image for the executor are just ignored. I suppose it's the dispatcher's job to ensure the exact configuration of the executor, regardless of the defaults set on the executor's machine? Is that the assumption being made? I can understand that in contexts which aren't docker-driven, since jobs could be rolling out in the middle of a config update. I'm trying to think of this outside the terms of just mesos/docker (since I'm fully aware that docker doesn't rule the world yet).
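The precedence being discussed here can be modeled with a toy two-layer merge. This is an illustration of the intended behavior only, not Spark's actual implementation, and every file name and value below is invented: defaults from the image's conf directory apply first, and anything passed at submit time overwrites them.

```shell
# Toy model of property precedence (not Spark's real code): the image's
# conf-dir defaults are read first, then properties passed via
# spark-submit overwrite them. All names and values are invented.
cat > image-defaults.conf <<'EOF'
spark.mesos.executor.home /opt/spark
spark.executor.memory 1g
EOF
cat > submitted.conf <<'EOF'
spark.executor.memory 4g
EOF
# Later files win: for each key, keep the last value seen.
awk '{v[$1]=$2} END {for (k in v) print k, v[k]}' \
    image-defaults.conf submitted.conf | sort
# spark.executor.memory comes out as 4g (the submitted value), while
# spark.mesos.executor.home keeps the image default /opt/spark.
```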
>>> So I can see this from both perspectives now, and passing in the properties file will probably work just fine for me. But for my better understanding: when the executor starts, will it read any of the environment that it's executing in, or will it take only the properties given to it by the dispatcher and nothing more?
>>>
>>> Let me know if anything needs more clarification, and thanks for your mesos contribution to spark!
>>>
>>> - Alan
>>>
>>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>> Hi Alan,
>>>>
>>>> If I understand correctly, you are setting executor home when you launch the dispatcher, and not in the configuration when you submit the job, and you expect it to inherit that configuration?
>>>>
>>>> When I worked on the dispatcher I assumed all configuration is passed to the dispatcher, so that it can launch the job exactly how you would launch it in client mode.
>>>>
>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I get a chance.
>>>>
>>>> Can you recommend changes to the documentation, either in email or in a PR?
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>>>
>>>>> Hey All,
>>>>>
>>>>> To bump this thread once again: I'm having some trouble using the dispatcher as well.
>>>>>
>>>>> I'm using the Mesos Cluster Manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful, and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>>>>>
>>>>> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>>>>>
>>>>> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.).
>>>>> Reading the code, it appears that the driver description pulls only from the request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within spark, and were hoping it would be possible to have this pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>>>>>
>>>>> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>>>>>
>>>>> Please set me on the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>>>>>
>>>>> Thanks,
>>>>> - Alan
>>>>>
>>>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>> Yes, you can create an issue, or actually contribute a patch to update it :)
>>>>>>
>>>>>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>> Tim,
>>>>>>>
>>>>>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>>>>>
>>>>>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
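For reference, the per-job workaround Alan's test points at: pass the executor home (and any other required properties) explicitly at submission, either with --conf or via a properties file. The properties-file path, jar URL, and class name here are placeholders; spark.mesos.executor.home=/usr/local/spark is the value Alan reported testing with.

```shell
# Pass everything the driver will need at submission time; the dispatcher
# does not read the executor image's defaults. Paths and names below are
# placeholders except spark.mesos.executor.home.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --properties-file /etc/spark/job-defaults.conf \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```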
>>>>>>> Tom
>>>>>>>
>>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>>
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> Sorry the documentation isn't really rich; it probably assumes users understand how Mesos and frameworks work.
>>>>>>>
>>>>>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter has a Mesos slave installed, which is responsible for publishing resources and running/watching tasks, while the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>>>>>
>>>>>>> Frameworks are not managed by Mesos: the Mesos master/slave doesn't launch and maintain frameworks, but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g. Aurora, Marathon, etc.).
>>>>>>>
>>>>>>> Therefore, to introduce cluster mode with Mesos, we had to create a long-running framework that lives in your datacenter and can handle launching spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>>>>>
>>>>>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
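A sketch of what running the dispatcher under Marathon might look like, as a minimal Marathon app definition. Everything here is an assumption for illustration: the install path, ZooKeeper addresses, resource sizes, and the exact dispatcher flags should be checked against your Spark version. Running the dispatcher class directly via spark-class keeps it in the foreground, which is what Marathon expects of a supervised process.

```json
{
  "id": "/spark-dispatcher",
  "instances": 1,
  "cpus": 1.0,
  "mem": 1024,
  "cmd": "/usr/local/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --name spark-dispatcher"
}
```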
>>>>>>> Then all clients, instead of specifying the Mesos master URL (e.g. mesos://mesos.master:2181), just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start and watch the driver for you.
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>>> After spending most of yesterday scouring the Internet for documentation on submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>>>>>
>>>>>>>> I used the published Mesos instructions for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in Openstack.
>>>>>>>>
>>>>>>>> The Spark documentation site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and, using the client machine's IP address and port as the target, submitted the job.
>>>>>>>>
>>>>>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system. As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not a part of the sphere of Mesos resource management.
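Concretely, the two halves of the setup described in this thread might look like the following. Host names and the jar/class are placeholders; the dispatcher should run on a machine inside the datacenter, not on the submitting client, and flags should be checked against your Spark version.

```shell
# On a machine in the datacenter: start the dispatcher against the Mesos
# masters (ZooKeeper-based master discovery shown; hosts are invented).
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

# On the client: submit against the dispatcher, not the Mesos master.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --class com.example.MyJob \
  http://repo.example.com/jars/my-job.jar
```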
>>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>>
>>>>>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>>>>>
>>>>>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>>>>>
>>>>>>>> Tom