One other piece of information: we're using ZooKeeper for persistence, and when we brought the dispatcher back online, it crashed with the same exception after loading the config from ZooKeeper.
Cheers,
- Alan

On Thu, Sep 17, 2015 at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:

> Hey All,
>
> To bump this thread once again, I'm having some trouble using the dispatcher as well.
>
> I'm using the Mesos cluster manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>
> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>
> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.). Reading the code, it appears that the driver description pulls only from the submission request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within Spark, and were hoping that it would be possible to have the dispatcher pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>
> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>
> Please direct me to the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>
> Thanks,
> - Alan
>
> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Yes you can create an issue, or actually contribute a patch to update it :)
>>
>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
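The workaround Alan confirmed above can be sketched as a spark-submit invocation like the following. The dispatcher address matches the one mentioned later in the thread; the jar path and example class are illustrative assumptions, not taken from the thread:

```shell
# Sketch of the workaround: the dispatcher honors only properties sent
# with the submission request, so pass spark.mesos.executor.home
# explicitly via --conf rather than relying on spark-defaults.conf or
# SPARK_HOME on the dispatcher host.
# The jar path and class below are illustrative.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples.jar 100
```

This only works around the crash; it does not address the underlying complaint that the dispatcher ignores defaults configured elsewhere.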
>>
>> Tim
>>
>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>
>>> Tim,
>>>
>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>
>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
>>>
>>> Tom
>>>
>>> From: Tim Chen <t...@mesosphere.io>
>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>
>>> Hi Tom,
>>>
>>> Sorry the documentation isn't very rich; it assumes users understand how Mesos and frameworks work.
>>>
>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter runs a Mesos slave, which is responsible for publishing resources and running/watching tasks, and the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>
>>> Frameworks are not managed by Mesos; the Mesos master/slave doesn't launch or maintain frameworks but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g. Aurora, Marathon, etc.).
>>>
>>> Therefore, to introduce cluster mode with Mesos, we must create a long-running framework in your datacenter that can handle launching Spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>
>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
>>>
>>> Then, instead of specifying the Mesos master URL (e.g. mesos://mesos.master:2181), clients just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start and watch the driver for you.
>>>
>>> Tim
>>>
>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>
>>>> After spending most of yesterday scouring the Internet for documentation on submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>
>>>> I used the Mesos published instructions <https://open.mesosphere.com/getting-started/datacenter/install/> for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in OpenStack.
>>>>
>>>> The documentation on the Spark site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and submitted the job using the client machine's IP address and port as the target.
>>>>
>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system.
>>>> As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not part of the sphere of Mesos resource management.
>>>>
>>>> I used the following Stack Overflow posts as guidelines:
>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>
>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>
>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>
>>>> Tom
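Putting the thread's advice together, the deployment Tim describes can be sketched in two steps. The ZooKeeper hosts, dispatcher host name, application class, and jar URL below are illustrative assumptions; the dispatcher port and the start script are the ones named in the thread:

```shell
# Step 1 (on a machine inside the datacenter, not the client):
# start the dispatcher against the Mesos masters. A ZooKeeper-based
# master URL is one plausible choice for an HA setup; host names are
# illustrative.
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

# Step 2 (from any client): submit jobs to the dispatcher, not to the
# Mesos master. Class name and jar URL are hypothetical placeholders.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --class com.example.Main \
  http://repo.example.com/app.jar
```

Running the dispatcher itself as a Marathon app, as Tim suggests, keeps it supervised and restartable rather than sitting unmanaged on the client machine.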