Re: Mesos fine-grained multi-user mode failed to allocate tasks

Rahul Palamuttam Wed, 13 Jul 2016 17:12:06 -0700

Thanks David.
We will definitely take a look at Cook.

I am curious what you mean by true multi-tenancy.


Under coarse-grained mode with dynamic allocation enabled - what I see in
the mesos UI is that there are 3 tasks running by default (one on each of
the nodes we have).
I also see the coarsegrainedexecutors being brought up.

Another point is that I always see a spark-submit command being launched.
Even if I kill that command, it comes back up and the executors get
reallocated on the worker nodes.
However, I am able to launch multiple spark shells and have jobs run
concurrently - which we were very happy with.
Unfortunately, I don't understand why mesos only shows 3 tasks running. I
even see the spike in thread count when launching my jobs, but the task
count remains unchanged.
The mesos logs do show jobs coming in.
The three tasks just sit there in the webui - running.

Is this what is expected?
Does running coarsegrained with dynamic allocation make mesos look at each
running executor as a different task?
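
For reference, a minimal sketch of what the coarse-grained setup with
dynamic allocation might look like in spark-defaults.conf. The property
names come from the Spark configuration and running-on-mesos docs; the
values (especially spark.cores.max) are placeholder assumptions rather than
settings validated on this cluster. It also assumes the Mesos external
shuffle service (sbin/start-mesos-shuffle-service.sh in the Spark
distribution) is running on every worker node.

```
# Coarse-grained mode: per the Spark 1.6 docs, one long-running Mesos task
# is launched per node and Spark schedules its own tasks inside it.
spark.mesos.coarse                           true

# Dynamic allocation requires the external shuffle service.
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true

# Assumed values: tune for the cluster.
spark.dynamicAllocation.minExecutors         0
spark.dynamicAllocation.executorIdleTimeout  60s
# Hypothetical per-application core cap so one shell cannot take every offer.
spark.cores.max                              24
```

With settings along these lines, idle executors should be released after
the idle timeout, which is the behavior fine-grained mode does not provide
for memory.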




On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <[email protected]>
wrote:

> You could also check out Cook from twosigma. It's open source on github,
> and offers true preemptive multitenancy with spark on Mesos, by
> intermediating the spark drivers to optimize the cluster overall.
> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <[email protected]>
> wrote:
>
>> Thank you Joseph.
>>
>> We'll try to explore coarse grained mode with dynamic allocation.
>>
>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]> wrote:
>>
>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>
>>> (The Spark website appears to be down right now, so here's the doc on
>>> Github:)
>>>
>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>
>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>> they terminate, they will not relinquish memory, as the JVM does not give
>>>> memory back to the Operating System. Neither will executors terminate when
>>>> they're idle.
>>>
>>>
>>> You can follow some of the recommendations Spark has in that document
>>> for sharing resources, when using Mesos.
>>>
>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Our team has been tackling multi-tenancy related issues with Mesos for
>>>> quite some time.
>>>>
>>>> The problem is that tasks aren't being allocated properly when multiple
>>>> applications are trying to launch a job. If we launch application A, and
>>>> soon after application B, application B waits pretty much till the
>>>> completion of application A for tasks to even be staged in Mesos. Right now
>>>> these applications are the spark-shell or the zeppelin interpreter.
>>>>
>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in two
>>>> different spark-shells results in the issue we're observing. One of the
>>>> counts waits (in fact we don't even see the tasks being staged in mesos)
>>>> until the current one finishes. This is the biggest issue we have been
>>>> experiencing, and any help or advice would be greatly appreciated. We want to
>>>> be able to launch multiple jobs concurrently on our cluster and share
>>>> resources appropriately.
>>>>
>>>> Another issue we see is that the Java heap space on the mesos executor
>>>> backend process is not being cleaned up once a job has finished in the
>>>> spark shell.
>>>> I've attached a png file of the jvisualvm output showing that the
>>>> heapspace is still allocated on a worker node. If I force the GC from
>>>> jvisualvm then nearly all of that memory gets cleaned up. This may be
>>>> because the spark-shell is still active - but if we've waited long enough
>>>> why doesn't GC just clean up the space? However, even after forcing GC the
>>>> mesos UI shows us that these resources are still being used.
>>>> There should be a way to bring down the memory utilization of the
>>>> executors once a task is finished. It shouldn't continue to have that
>>>> memory allocated, even if a spark-shell is active on the driver.
>>>>
>>>> We have mesos configured to use fine-grained mode.
>>>> The following are parameters we have set in our spark-defaults.conf
>>>> file.
>>>>
>>>>
>>>> spark.eventLog.enabled           true
>>>> spark.eventLog.dir               hdfs://frontend-system:8090/directory
>>>> spark.local.dir                    /data/cluster-local/SPARK_TMP
>>>>
>>>> spark.executor.memory            50g
>>>>
>>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>>>> spark.executor.uri      hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>> spark.mesos.coarse      false
>>>>
>>>> Please let me know if there are any questions about our configuration.
>>>> Any advice or experience the mesos community can share pertaining to
>>>> issues with fine-grained mode would be greatly appreciated!
>>>>
>>>> I would also like to sincerely apologize for my previous test message
>>>> on the mailing list.
>>>> It was an ill-conceived idea since we are in a bit of a time crunch and
>>>> I needed to get this message posted. I forgot I needed to send a reply to
>>>> the user-subscribers email for me to be listed, which resulted in "message
>>>> not sent" emails. I will not do that again.
>>>>
>>>> Thanks,
>>>>
>>>> Rahul Palamuttam
>>>>
>>>
>>>
>>
