One other piece of information: we're using ZooKeeper for persistence, and when we brought the dispatcher back online, it crashed with the same exception after loading the config from ZooKeeper.
Cheers,
- Alan

On Thu, Sep 17, 2015 at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:

> Hey All,
>
> To bump this thread once again, I'm having some trouble using the dispatcher as well.
>
> I'm using the Mesos cluster manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>
> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>
> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.). Reading the code, it appears that the driver description pulls only from the submission request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within Spark, and were hoping that it would be possible to have the dispatcher pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>
> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>
> Please direct me to the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>
> Thanks,
> - Alan
>
> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Yes you can create an issue, or actually contribute a patch to update it :)
>>
>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
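The workaround Alan confirmed above can be sketched as a spark-submit invocation like the following. The dispatcher address matches the one mentioned later in the thread; the jar path and example class are illustrative assumptions, not taken from the thread:

```shell
# Sketch of the workaround: the dispatcher honors only properties sent
# with the submission request, so pass spark.mesos.executor.home
# explicitly via --conf rather than relying on spark-defaults.conf or
# SPARK_HOME on the dispatcher host.
# The jar path and class below are illustrative.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --conf spark.mesos.executor.home=/usr/local/spark \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples.jar 100
```

This only works around the crash; it does not address the underlying complaint that the dispatcher ignores defaults configured elsewhere.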
>>
>> Tim
>>
>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>
>>> Tim,
>>>
>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>
>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
>>>
>>> Tom
>>>
>>> From: Tim Chen <t...@mesosphere.io>
>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>
>>> Hi Tom,
>>>
>>> Sorry the documentation isn't very rich; it assumes users understand how Mesos and frameworks work.
>>>
>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter runs a Mesos slave, which is responsible for publishing resources and running/watching tasks, and the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>
>>> Frameworks are not managed by Mesos; the Mesos master/slave doesn't launch or maintain frameworks but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g. Aurora, Marathon, etc.).
>>>
>>> Therefore, to introduce cluster mode with Mesos, we must create a long-running framework in your datacenter that can handle launching Spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>
>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
>>>
>>> Then, instead of specifying the Mesos master URL (e.g. mesos://mesos.master:2181), clients just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start and watch the driver for you.
>>>
>>> Tim
>>>
>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>
>>>> After spending most of yesterday scouring the Internet for documentation on submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>
>>>> I used the Mesos published instructions <https://open.mesosphere.com/getting-started/datacenter/install/> for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in OpenStack.
>>>>
>>>> The documentation on the Spark site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and submitted the job using the client machine's IP address and port as the target.
>>>>
>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system.
>>>> As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not part of the sphere of Mesos resource management.
>>>>
>>>> I used the following Stack Overflow posts as guidelines:
>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>
>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>
>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>
>>>> Tom
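Putting the thread's advice together, the deployment Tim describes can be sketched in two steps. The ZooKeeper hosts, dispatcher host name, application class, and jar URL below are illustrative assumptions; the dispatcher port and the start script are the ones named in the thread:

```shell
# Step 1 (on a machine inside the datacenter, not the client):
# start the dispatcher against the Mesos masters. A ZooKeeper-based
# master URL is one plausible choice for an HA setup; host names are
# illustrative.
./sbin/start-mesos-dispatcher.sh \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

# Step 2 (from any client): submit jobs to the dispatcher, not to the
# Mesos master. Class name and jar URL are hypothetical placeholders.
spark-submit \
  --deploy-mode cluster \
  --master mesos://spark-dispatcher.mesos:7077 \
  --class com.example.Main \
  http://repo.example.com/app.jar
```

Running the dispatcher itself as a Marathon app, as Tim suggests, keeps it supervised and restartable rather than sitting unmanaged on the client machine.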