Got it, you're right, we shouldn't crash when something goes wrong while creating the job. This should be fixed soon.

Thanks!

Tim
On Mon, Sep 21, 2015 at 11:24 AM, Alan Braithwaite <a...@cloudflare.com> wrote:

> That could be the behavior, but spark.mesos.executor.home being unset still raises an exception inside the dispatcher, preventing a Docker container from even being started. I can see if other properties are inherited from the default environment when that's set, if you'd like.
>
> I think the main problem is just that premature validation is being done on the dispatcher, and the dispatcher crashes in the event of bad config.
>
> - Alan
>
> On Sat, Sep 19, 2015 at 11:03 AM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> You can still provide properties through the Docker container by putting configuration in the conf directory, but we try to pass through all properties submitted from the driver via spark-submit, which I believe will override the defaults.
>>
>> This is not what you are seeing?
>>
>> Tim
>>
>> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>
>> The assumption that the executor has no default properties set in its environment through the Docker container. Correct me if I'm wrong, but any properties which are unset in the SparkContext will come from the environment of the executor, will they not?
>>
>> Thanks,
>> - Alan
>>
>> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> I guess I need a bit more clarification: what kind of assumptions was the dispatcher making?
>>>
>>> Tim
>>>
>>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for the follow-up. It's not so much that I expect the executor to inherit the configuration of the dispatcher as that I *don't* expect the dispatcher to make assumptions about the system environment of the executor (since it lives in a Docker container). I could potentially see a case where you might want to explicitly forbid the defaults, but I can't think of any right now.
>>>>
>>>> Otherwise, I'm confused as to why the defaults in the Docker image for the executor are just ignored. I suppose it's the dispatcher's job to ensure the *exact* configuration of the executor, regardless of the defaults set on the executor's machine? Is that the assumption being made? I can understand that in contexts which aren't Docker-driven, since jobs could be rolling out in the middle of a config update. I'm trying to think of this outside the terms of just Mesos/Docker (since I'm fully aware that Docker doesn't rule the world yet).
>>>>
>>>> So I can see this from both perspectives now, and passing in the properties file will probably work just fine for me, but for my better understanding: when the executor starts, will it read any of the environment that it's executing in, or will it take only the properties given to it by the dispatcher and nothing more?
>>>>
>>>> Let me know if anything needs more clarification, and thanks for your Mesos contribution to Spark!
>>>>
>>>> - Alan
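To make the two mechanisms being discussed concrete, here is a minimal sketch: defaults baked into the executor's Docker image via the conf directory, versus submit-time properties which, per Tim's description above, should override them. The paths and property values are illustrative assumptions, not taken from this thread.

    # Inside the executor Docker image build: bake site defaults into
    # the conf directory that Spark reads (the install path /opt/spark
    # is an assumption).
    mkdir -p /opt/spark/conf
    cat > /opt/spark/conf/spark-defaults.conf <<'EOF'
    spark.mesos.executor.home   /opt/spark
    spark.executor.memory       2g
    EOF

    # At submit time, an explicitly passed property should win over
    # these baked-in defaults, per Tim's description above, e.g.:
    #   spark-submit --conf spark.executor.memory=4g ...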
>>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Hi Alan,
>>>>>
>>>>> If I understand correctly, you are setting executor home when you launch the dispatcher and not in the configuration when you submit the job, and expect it to inherit that configuration?
>>>>>
>>>>> When I worked on the dispatcher, I assumed all configuration is passed to the dispatcher so that it can launch the job exactly as you would launch it in client mode.
>>>>>
>>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look when I get a chance.
>>>>>
>>>>> Can you recommend changes to the documentation, either in email or in a PR?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Tim
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com> wrote:
>>>>>
>>>>> Hey all,
>>>>>
>>>>> To bump this thread once again, I'm having some trouble using the dispatcher as well.
>>>>>
>>>>> I'm using the Mesos cluster manager with Docker executors. I've deployed the dispatcher as a Marathon job. When I submit a job using spark-submit, the dispatcher writes back that the submission was successful and then promptly dies in Marathon. Looking at the logs reveals it was hitting the following line:
>>>>>
>>>>> 398: throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
>>>>>
>>>>> Which is odd, because it's set in multiple places (SPARK_HOME, spark.mesos.executor.home, spark.home, etc.). Reading the code, it appears that the driver description pulls only from the request and disregards any other properties that may be configured. Testing by passing --conf spark.mesos.executor.home=/usr/local/spark on the command line to spark-submit confirms this. We're trying to limit the number of places where we have to set properties within Spark, and were hoping that it would be possible to have this pull in spark-defaults.conf from somewhere, or at least allow the user to inform the dispatcher through spark-submit that those properties will be available once the job starts.
>>>>>
>>>>> Finally, I don't think the dispatcher should crash in this event. It seems not exceptional that a job is misconfigured when submitted.
>>>>>
>>>>> Please direct me to the right path if I'm headed in the wrong direction. Also let me know if I should open some tickets for these issues.
>>>>>
>>>>> Thanks,
>>>>> - Alan
>>>>>
>>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>
>>>>>> Yes, you can create an issue, or actually contribute a patch to update it :)
>>>>>>
>>>>>> Sorry the docs are a bit light; I'm going to make them more complete along the way.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>
>>>>>>> Tim,
>>>>>>>
>>>>>>> Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven't deployed anything via Marathon yet. What you have stated here makes sense; I will look into doing this.
>>>>>>>
>>>>>>> Adding this info to the docs would be great. Is the appropriate action to create an issue regarding improvement of the docs? For those of us who are gaining the experience, having such a pointer is very helpful.
>>>>>>>
>>>>>>> Tom
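For readers hitting the same exception, the workaround Alan describes above (passing the property explicitly at submit time) might look like the following in full. The dispatcher address, jar URL, and example class are placeholders; the executor home path is the one from his message.

    # Submit in cluster mode against the dispatcher, setting
    # spark.mesos.executor.home explicitly so the dispatcher's
    # validation (the throw at line 398 above) is satisfied.
    spark-submit \
      --master mesos://spark-dispatcher.example.com:7077 \
      --deploy-mode cluster \
      --conf spark.mesos.executor.home=/usr/local/spark \
      --class org.apache.spark.examples.SparkPi \
      http://example.com/jars/spark-examples.jar 100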
>>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>>
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> Sorry the documentation isn't really rich; it's probably assuming users understand how Mesos and frameworks work.
>>>>>>>
>>>>>>> First I need to explain the rationale for creating the dispatcher. If you're not familiar with Mesos yet: each node in your datacenter has a Mesos slave installed, which is responsible for publishing resources and running/watching tasks, and the Mesos master is responsible for taking the aggregated resources and scheduling them among frameworks.
>>>>>>>
>>>>>>> Frameworks are not managed by Mesos; the Mesos master/slave doesn't launch and maintain frameworks, but assumes they're launched and kept running on their own. All the existing frameworks in the ecosystem therefore have their own ways to deploy, handle HA, and persist state (e.g., Aurora, Marathon, etc.).
>>>>>>>
>>>>>>> Therefore, to introduce cluster mode with Mesos, we must create a long-running framework that lives in your datacenter and can handle launching Spark drivers on demand, handle HA, etc. This is what the dispatcher is all about.
>>>>>>>
>>>>>>> So the idea is that you should launch the dispatcher not on the client, but on a machine in your datacenter. In Mesosphere's DCOS we launch all frameworks and long-running services with Marathon, and you can use Marathon to launch the Spark dispatcher.
>>>>>>>
>>>>>>> Then all clients, instead of specifying the Mesos master URL (e.g: mesos://mesos.master:2181), just talk to the dispatcher (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start and watch the driver for you.
>>>>>>>
>>>>>>> Tim
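As a concrete sketch of Tim's suggestion to run the dispatcher under Marathon: the app definition below is an assumption-laden illustration, not a tested configuration. The ZooKeeper addresses, Spark install path, resource sizes, and Marathon URL are all placeholders. The dispatcher class is invoked in the foreground via spark-class because sbin/start-mesos-dispatcher.sh daemonizes, which Marathon would see as an immediately exited task.

    # Register the dispatcher as a long-running Marathon app.
    cat > spark-dispatcher.json <<'EOF'
    {
      "id": "/spark-dispatcher",
      "cmd": "/opt/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --port 7077 --webui-port 8081",
      "cpus": 0.5,
      "mem": 1024,
      "instances": 1
    }
    EOF
    curl -X POST -H 'Content-Type: application/json' \
      -d @spark-dispatcher.json http://marathon.example.com:8080/v2/apps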
>>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <tomwa...@cisco.com> wrote:
>>>>>>>
>>>>>>>> After spending most of yesterday scouring the Internet for sources of documentation for submitting Spark jobs in cluster mode to a Spark cluster managed by Mesos, I was able to do just that, but I am not convinced that how I have things set up is correct.
>>>>>>>>
>>>>>>>> I used the published Mesos instructions <https://open.mesosphere.com/getting-started/datacenter/install/> for setting up my Mesos cluster. I have three Zookeeper instances, three Mesos master instances, and three Mesos slave instances. This is all running in Openstack.
>>>>>>>>
>>>>>>>> The documentation on the Spark site states that "To use cluster mode, you must start the MesosClusterDispatcher in your cluster via the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master url (e.g: mesos://host:5050)." That is it, no more information than that. So that is what I did: I have one machine that I use as the Spark client for submitting jobs. I started the Mesos dispatcher with the script as described, and using the client machine's IP address and port as the target, submitted the job.
>>>>>>>>
>>>>>>>> The job is currently running in Mesos as expected. This is not, however, how I would have expected to configure the system. As it stands, there is one instance of the Spark Mesos dispatcher running outside of Mesos, and so not a part of the sphere of Mesos resource management.
>>>>>>>>
>>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>>
>>>>>>>> There must be better documentation on how to deploy Spark on Mesos with jobs able to be deployed in cluster mode.
>>>>>>>>
>>>>>>>> I can follow up with more specific information regarding my deployment if necessary.
>>>>>>>>
>>>>>>>> Tom
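Pulling the thread together, here is a rough end-to-end sketch of the setup Tim describes: the dispatcher runs on a machine inside the datacenter (shown here with the documented script rather than Marathon), and clients target the dispatcher instead of the Mesos master. Host names, paths, and the job class are placeholders, not values from this thread.

    # On a machine in the datacenter, not on the submitting client:
    ./sbin/start-mesos-dispatcher.sh \
      --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

    # On the client, point spark-submit at the dispatcher; a properties
    # file can carry site defaults along with the submission.
    spark-submit \
      --master mesos://dispatcher-host.example.com:7077 \
      --deploy-mode cluster \
      --properties-file /etc/spark/spark-defaults.conf \
      --class com.example.MyJob \
      http://example.com/jars/my-job.jar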