By true multitenancy, I mean preemption: if a new user connects to the cluster, capacity is actually reclaimed and reallocated to them in minutes or seconds instead of hours.

On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam <[email protected]> wrote:
> Thanks David.
> We will definitely take a look at Cook.
>
> I am curious what you mean by true multi-tenancy.
>
> Under coarse-grained mode with dynamic allocation enabled, what I see in
> the Mesos UI is that there are 3 tasks running by default (one on each of
> the nodes we have).
> I also see the coarse-grained executors being brought up.
>
> Another point is that I always see a spark-submit command being launched;
> even if I kill that command it comes back up and the executors get
> reallocated on the worker nodes.
> However, I am able to launch multiple spark shells and have jobs run
> concurrently - which we were very happy with.
> Unfortunately, I don't understand why Mesos only shows 3 tasks running. I
> even see the spike in thread count when launching my jobs, but the task
> count remains unchanged.
> The Mesos logs do show jobs coming in.
> The three tasks just sit there in the web UI - running.
>
> Is this what is expected?
> Does running coarse-grained with dynamic allocation make Mesos look at
> each running executor as a different task?
>
> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <[email protected]>
> wrote:
>
>> You could also check out Cook from Two Sigma. It's open source on GitHub,
>> and offers true preemptive multitenancy with Spark on Mesos, by
>> intermediating the Spark drivers to optimize the cluster overall.
>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <[email protected]>
>> wrote:
>>
>>> Thank you Joseph.
>>>
>>> We'll try to explore coarse-grained mode with dynamic allocation.
>>>
>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]>
>>> wrote:
>>>
>>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>>
>>>> (The Spark website appears to be down right now, so here's the doc on
>>>> GitHub:)
>>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>>
>>>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>>> they terminate, they will not relinquish memory, as the JVM does not
>>>>> give memory back to the Operating System. Neither will executors
>>>>> terminate when they're idle.
>>>>
>>>> You can follow some of the recommendations Spark has in that document
>>>> for sharing resources when using Mesos.
>>>>
>>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Our team has been tackling multi-tenancy related issues with Mesos for
>>>>> quite some time.
>>>>>
>>>>> The problem is that tasks aren't being allocated properly when multiple
>>>>> applications are trying to launch a job. If we launch application A, and
>>>>> soon after application B, application B waits pretty much until the
>>>>> completion of application A for tasks to even be staged in Mesos. Right
>>>>> now these applications are the spark-shell or the Zeppelin interpreter.
>>>>>
>>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in
>>>>> two different spark-shells results in the issue we're observing. One of
>>>>> the jobs waits (in fact we don't even see its tasks being staged in
>>>>> Mesos) until the current one finishes. This is the biggest issue we have
>>>>> been experiencing, and any help or advice would be greatly appreciated.
>>>>> We want to be able to launch multiple jobs concurrently on our cluster
>>>>> and share resources appropriately.
>>>>> Another issue we see is that the Java heap space on the Mesos executor
>>>>> backend process is not being cleaned up once a job has finished in the
>>>>> spark-shell.
>>>>> I've attached a png file of the jvisualvm output showing that the heap
>>>>> space is still allocated on a worker node. If I force a GC from
>>>>> jvisualvm, then nearly all of that memory gets cleaned up. This may be
>>>>> because the spark-shell is still active - but if we've waited long
>>>>> enough, why doesn't GC just clean up the space? Moreover, even after
>>>>> forcing GC, the Mesos UI shows that these resources are still in use.
>>>>> There should be a way to bring down the memory utilization of the
>>>>> executors once a task is finished. They shouldn't keep that memory
>>>>> allocated, even if a spark-shell is still active on the driver.
>>>>>
>>>>> We have Mesos configured to use fine-grained mode.
>>>>> The following are the parameters we have set in our spark-defaults.conf
>>>>> file:
>>>>>
>>>>> spark.eventLog.enabled           true
>>>>> spark.eventLog.dir               hdfs://frontend-system:8090/directory
>>>>> spark.local.dir                  /data/cluster-local/SPARK_TMP
>>>>> spark.executor.memory            50g
>>>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>>>>> spark.executor.uri               hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>>> spark.mesos.coarse               false
>>>>>
>>>>> Please let me know if there are any questions about our configuration.
>>>>> Any advice or experience the Mesos community can share pertaining to
>>>>> issues with fine-grained mode would be greatly appreciated!
>>>>>
>>>>> I would also like to sincerely apologize for my previous test message on
>>>>> the mailing list. It was an ill-conceived idea: we are in a bit of a time
>>>>> crunch and I needed to get this message posted. I had forgotten that I
>>>>> needed to reply to the user-subscribe confirmation email before my posts
>>>>> would be accepted, which resulted in the "message not sent" emails. I
>>>>> will not do that again.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rahul Palamuttam
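For reference, here is a minimal spark-defaults.conf sketch of the coarse-grained plus dynamic-allocation setup discussed above. The property names come from the Spark 1.6 documentation; the core/memory values, idle timeout, and the hdfs://frontend-system URI are illustrative placeholders rather than tuned recommendations for this cluster:

    # Sketch only - values are illustrative, not tuned for any particular cluster.
    spark.mesos.coarse                           true
    spark.dynamicAllocation.enabled              true
    spark.shuffle.service.enabled                true
    spark.dynamicAllocation.executorIdleTimeout  60s
    # Cap each application so one spark-shell cannot hold every core in the cluster.
    spark.cores.max                              16
    spark.executor.memory                        8g
    spark.executor.uri                           hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz

Dynamic allocation on Mesos also needs the external shuffle service running on each agent; in the 1.6 binary distribution that is started, if I recall correctly, with sbin/start-mesos-shuffle-service.sh. Capping spark.cores.max per application is what lets a second spark-shell receive offers while the first is still busy, and the idle timeout means executors (and their heap) are released once a shell goes quiet instead of being held for the life of the session.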


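For completeness, the contention check from the original message, written out as it would be typed into each of two concurrent spark-shell sessions (sc is the SparkContext that spark-shell provides; the value produced is just a sum, used here only to generate tasks):

    // Run the same job from two spark-shell sessions at roughly the same time.
    // Under the setup described above, the second session's tasks were not even
    // staged in Mesos until the first job had completed.
    val sum = sc.parallelize(1L to 10000000L).reduce(_ + _)
    println(s"sum = $sum")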