Got it, you're right, we shouldn't crash when something goes wrong while creating the job. This should be fixed soon.

Thanks!

Tim
On Mon, Sep 21, 2015 at 11:24 AM, Alan Braithwaite <a...@cloudflare.com> wrote:

> That could be the behavior, but spark.mesos.executor.home being unset still raises an exception inside the dispatcher, preventing a Docker container from even being started. I can see if other properties are inherited from the default environment when that's set, if you'd like.
>
> I think the main problem is just that premature validation is being done on the dispatcher, and the dispatcher crashes in the event of bad config.
>
> - Alan
>
> On Sat, Sep 19, 2015 at 11:03 AM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> You can still provide properties through the Docker container by putting configuration in the conf directory, but we try to pass through all properties submitted from the driver via spark-submit, which I believe will override the defaults.
>>
>> This is not what you are seeing?
>>
>> Tim
>>
>> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>
>> The assumption that the executor has no default properties set in its environment through the Docker container. Correct me if I'm wrong, but any properties which are unset in the SparkContext will come from the environment of the executor, will they not?
>>
>> Thanks,
>> - Alan
>>
>> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> I guess I need a bit more clarification: what kind of assumptions was the dispatcher making?
>>>
>>> Tim
>>>
>>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for the follow-up. It's not so much that I expect the executor to inherit the configuration of the dispatcher as that I *don't* expect the dispatcher to make assumptions about the system environment of the executor (since it lives in a Docker container). I could potentially see a case where you might want to explicitly forbid the defaults, but I can't think of any right now.
>>>>
>>>> Otherwise, I'm confused as to why the defaults in the Docker image for the executor are just ignored. I suppose it's the dispatcher's job to ensure the *exact* configuration of the executor, regardless of the defaults set on the executor's machine? Is that the assumption being made? I can understand that in contexts which aren't Docker-driven, since jobs could be rolling out in the middle of a config update. I'm trying to think of this outside the terms of just Mesos/Docker (since I'm fully aware that Docker doesn't rule the world yet).
>>>>
>>>> So I can see this from both perspectives now, and passing in the properties file will probably work just fine for me, but for my better understanding: when the executor starts, will it read any of the environment that it's executing in, or will it take only the properties given to it by the dispatcher and nothing more?
>>>>
>>>> Let me know if anything needs more clarification, and thanks for your Mesos contribution to Spark!
>>>>
>>>> - Alan
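To make the two mechanisms being discussed concrete, here is a minimal sketch: defaults baked into the executor's Docker image via the conf directory, versus submit-time properties which, per Tim's description above, should override them. The paths and property values are illustrative assumptions, not taken from this thread.

    # Inside the executor Docker image build: bake site defaults into
    # the conf directory that Spark reads (the install path /opt/spark
    # is an assumption).
    mkdir -p /opt/spark/conf
    cat > /opt/spark/conf/spark-defaults.conf <<'EOF'
    spark.mesos.executor.home   /opt/spark
    spark.executor.memory       2g
    EOF

    # At submit time, an explicitly passed property should win over
    # these baked-in defaults, per Tim's description above, e.g.:
    #   spark-submit --conf spark.executor.memory=4g ...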
>>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Hi Alan,
>>>>>
>>>>> If I understand correctly, you are setting executor home when you launch the dispatcher and not in the configuration when you submit the job, and expect it to inherit that configuration?
>>>>>
>>>>> When I worked on the dispatcher, I assumed all configuration is passed to the dispatcher so that it can launch the job exactly as you would launch it in client mode.
>>>>>
>>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I get a chance.
>>>>>
>>>>> Can you recommend changes to the documentation, either in email or in a PR?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Tim
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>>>
>>>>> Hey all,
>>>>>
>>>>> To bump this thread once again, I'm having some trouble using the dispatcher as well.
>>>>>
>>>>> I'm using the Mesos cluster manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>>>>>
>>>>> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>>>>>
>>>>> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.). Reading the code, it appears that the driver description pulls only from the request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within Spark, and were hoping that it would be possible to have this pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>>>>>
>>>>> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>>>>>
>>>>> Please direct me to the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>>>>>
>>>>> Thanks,
>>>>> - Alan
>>>>>
>>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>
>>>>>> Yes, you can create an issue, or actually contribute a patch to update it :)
>>>>>>
>>>>>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>
>>>>>>> Tim,
>>>>>>>
>>>>>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>>>>>
>>>>>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
>>>>>>>
>>>>>>> Tom
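For readers hitting the same exception, the workaround Alan describes above (passing the property explicitly at submit time) might look like the following in full. The dispatcher address, jar URL, and example class are placeholders; the executor home path is the one from his message.

    # Submit in cluster mode against the dispatcher, setting
    # spark.mesos.executor.home explicitly so the dispatcher's
    # validation (the throw at line 398 above) is satisfied.
    spark-submit \
      --master mesos://spark-dispatcher.example.com:7077 \
      --deploy-mode cluster \
      --conf spark.mesos.executor.home=/usr/local/spark \
      --class org.apache.spark.examples.SparkPi \
      http://example.com/jars/spark-examples.jar 100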
>>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>>
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> Sorry the documentation isn't really rich; it's probably assuming users understand how Mesos and frameworks work.
>>>>>>>
>>>>>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter has a Mesos slave installed, which is responsible for publishing resources and running/watching tasks, and the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>>>>>
>>>>>>> Frameworks are not managed by Mesos; the Mesos master/slave doesn't launch and maintain frameworks, but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g., Aurora, Marathon, etc.).
>>>>>>>
>>>>>>> Therefore, to introduce cluster mode with Mesos, we must create a long-running framework that lives in your datacenter and can handle launching Spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>>>>>
>>>>>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
>>>>>>>
>>>>>>> Then all clients, instead of specifying the Mesos master URL (e.g: mesos://mesos.master:2181), just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start and watch the driver for you.
>>>>>>>
>>>>>>> Tim
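As a concrete sketch of Tim's suggestion to run the dispatcher under Marathon: the app definition below is an assumption-laden illustration, not a tested configuration. The ZooKeeper addresses, Spark install path, resource sizes, and Marathon URL are all placeholders. The dispatcher class is invoked in the foreground via spark-class because sbin/start-mesos-dispatcher.sh daemonizes, which Marathon would see as an immediately exited task.

    # Register the dispatcher as a long-running Marathon app.
    cat > spark-dispatcher.json <<'EOF'
    {
      "id": "/spark-dispatcher",
      "cmd": "/opt/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --port 7077 --webui-port 8081",
      "cpus": 0.5,
      "mem": 1024,
      "instances": 1
    }
    EOF
    curl -X POST -H 'Content-Type: application/json' \
      -d @spark-dispatcher.json http://marathon.example.com:8080/v2/apps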
>>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>>
>>>>>>>> After spending most of yesterday scouring the Internet for sources of documentation for submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>>>>>
>>>>>>>> I used the published Mesos instructions <https://open.mesosphere.com/getting-started/datacenter/install/> for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in Openstack.
>>>>>>>>
>>>>>>>> The documentation on the Spark site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and using the client machine's IP address and port as the target, submitted the job.
>>>>>>>>
>>>>>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system. As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not a part of the sphere of Mesos resource management.
>>>>>>>>
>>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>>
>>>>>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>>>>>
>>>>>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>>>>>
>>>>>>>> Tom
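Pulling the thread together, here is a rough end-to-end sketch of the setup Tim describes: the dispatcher runs on a machine inside the datacenter (shown here with the documented script rather than Marathon), and clients target the dispatcher instead of the Mesos master. Host names, paths, and the job class are placeholders, not values from this thread.

    # On a machine in the datacenter, not on the submitting client:
    ./sbin/start-mesos-dispatcher.sh \
      --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

    # On the client, point spark-submit at the dispatcher; a properties
    # file can carry site defaults along with the submission.
    spark-submit \
      --master mesos://dispatcher-host.example.com:7077 \
      --deploy-mode cluster \
      --properties-file /etc/spark/spark-defaults.conf \
      --class com.example.MyJob \
      http://example.com/jars/my-job.jar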