Cook launches shells with around one minute of latency. I believe there is a
project to reduce that to seconds.

On Thu, Jul 14, 2016 at 10:50 PM Rahul Palamuttam <[email protected]> wrote:
> Hallelujah!
>
> We'll definitely take a look at Cook.
> Right now we're observing that in both fine-grained and coarse-grained mode,
> jobs take quite a bit of time to even be staged by Mesos.
>
> We're sitting there waiting on the interpreter/shell for quite a few
> minutes.
>
> On Jul 14, 2016, at 7:49 PM, David Greenberg <[email protected]> wrote:
>
> By true multitenancy, I mean preemption, so that if a new user connects to
> the cluster, their capacity is actually reclaimed and reallocated in
> minutes or seconds instead of hours.
> On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam <[email protected]> wrote:
>
>> Thanks David.
>> We will definitely take a look at Cook.
>>
>> I am curious what you mean by true multi-tenancy.
>>
>> Under coarse-grained mode with dynamic allocation enabled, what I see in
>> the Mesos UI is that there are 3 tasks running by default (one on each of
>> the nodes we have).
>> I also see the CoarseGrainedExecutors being brought up.
>>
>> Another point is that I always see a spark-submit command being launched;
>> even if I kill that command it comes back up and the executors get
>> reallocated on the worker nodes.
>> However, I am able to launch multiple spark-shells and have jobs run
>> concurrently - which we were very happy with.
>> Unfortunately, I don't understand why Mesos only shows 3 tasks running. I
>> even see the spike in thread count when launching my jobs, but the task
>> count remains unchanged.
>> The Mesos logs do show jobs coming in.
>> The three tasks just sit there in the web UI - running.
>>
>> Is this what is expected?
>> Does running coarse-grained with dynamic allocation make Mesos look at
>> each running executor as a different task?
>>
>> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <[email protected]> wrote:
>>
>>> You could also check out Cook from Two Sigma. It's open source on GitHub,
>>> and offers true preemptive multitenancy with Spark on Mesos, by
>>> intermediating the Spark drivers to optimize the cluster overall.
>>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <[email protected]> wrote:
>>>
>>>> Thank you Joseph.
>>>>
>>>> We'll try to explore coarse-grained mode with dynamic allocation.
>>>>
>>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]> wrote:
>>>>
>>>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>>>
>>>>> (The Spark website appears to be down right now, so here's the doc on
>>>>> GitHub:)
>>>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>>>
>>>>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>>>> they terminate, they will not relinquish memory, as the JVM does not
>>>>>> give memory back to the Operating System. Neither will executors
>>>>>> terminate when they're idle.
>>>>>
>>>>> You can follow some of the recommendations Spark has in that document
>>>>> for sharing resources when using Mesos.
>>>>>
>>>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Our team has been tackling multi-tenancy related issues with Mesos
>>>>>> for quite some time.
>>>>>>
>>>>>> The problem is that tasks aren't being allocated properly when
>>>>>> multiple applications are trying to launch a job. If we launch
>>>>>> application A, and soon after application B, application B waits
>>>>>> pretty much until the completion of application A for tasks to even
>>>>>> be staged in Mesos. Right now these applications are the spark-shell
>>>>>> or the Zeppelin interpreter.
>>>>>>
>>>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in
>>>>>> two different spark-shells results in the issue we're observing. One
>>>>>> of the counts waits (in fact we don't even see the tasks being staged
>>>>>> in Mesos) until the current one finishes. This is the biggest issue
>>>>>> we have been experiencing, and any help or advice would be greatly
>>>>>> appreciated. We want to be able to launch multiple jobs concurrently
>>>>>> on our cluster and share resources appropriately.
>>>>>>
>>>>>> Another issue we see is that the Java heap space on the Mesos executor
>>>>>> backend process is not being cleaned up once a job has finished in
>>>>>> the spark-shell.
>>>>>> I've attached a PNG file of the jvisualvm output showing that the
>>>>>> heap space is still allocated on a worker node. If I force the GC
>>>>>> from jvisualvm then nearly all of that memory gets cleaned up. This
>>>>>> may be because the spark-shell is still active - but if we've waited
>>>>>> long enough, why doesn't GC just clean up the space? However, even
>>>>>> after forcing GC, the Mesos UI shows us that these resources are
>>>>>> still being used.
>>>>>> There should be a way to bring down the memory utilization of the
>>>>>> executors once a task is finished. They shouldn't continue to have
>>>>>> that memory allocated, even if a spark-shell is active on the driver.
>>>>>>
>>>>>> We have Mesos configured to use fine-grained mode.
>>>>>> The following are the parameters we have set in our
>>>>>> spark-defaults.conf file:
>>>>>>
>>>>>> spark.eventLog.enabled            true
>>>>>> spark.eventLog.dir                hdfs://frontend-system:8090/directory
>>>>>> spark.local.dir                   /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.memory             50g
>>>>>> spark.externalBlockStore.baseDir  /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.extraJavaOptions   -XX:MaxTenuringThreshold=0
>>>>>> spark.executor.uri                hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>>>> spark.mesos.coarse                false
>>>>>>
>>>>>> Please let me know if there are any questions about our configuration.
>>>>>> Any advice or experience the Mesos community can share pertaining to
>>>>>> issues with fine-grained mode would be greatly appreciated!
>>>>>>
>>>>>> I would also like to sincerely apologize for my previous test message
>>>>>> on the mailing list.
>>>>>> It was an ill-conceived idea; we are in a bit of a time crunch and I
>>>>>> needed to get this message posted. I forgot I needed to reply to the
>>>>>> user-subscribe confirmation email before I was listed, which resulted
>>>>>> in message-not-sent emails. I will not do that again.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Rahul Palamuttam
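For reference, the coarse-grained-plus-dynamic-allocation setup recommended in
the thread would look roughly like the following in spark-defaults.conf. This
is a minimal sketch assuming Spark 1.6 on Mesos; the executor bounds and idle
timeout are illustrative values rather than numbers from the thread, and
dynamic allocation also needs an external shuffle service running on each
Mesos agent (typically started with sbin/start-mesos-shuffle-service.sh).

    # Switch from the deprecated fine-grained mode to coarse-grained mode
    spark.mesos.coarse                           true

    # Let executors be released back to Mesos when a shell sits idle
    # (bounds and timeout below are illustrative, tune for your cluster)
    spark.dynamicAllocation.enabled              true
    spark.dynamicAllocation.minExecutors         1
    spark.dynamicAllocation.maxExecutors         3
    spark.dynamicAllocation.executorIdleTimeout  60s

    # Required by dynamic allocation; on Mesos this is the external
    # shuffle service that must run on every agent
    spark.shuffle.service.enabled                true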

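Separately from the mode, the original symptom (a second spark-shell seeing no
tasks staged until the first shell finishes) is consistent with the first
driver holding most of the cluster's offers, especially with 50g executors
whose memory the JVM never hands back. Capping what any one shell may take
leaves room for a concurrent one. A sketch with illustrative numbers, not
values from the thread:

    # Cap how much of the cluster a single spark-shell can hold,
    # so a second shell can still get offers
    spark.cores.max          8
    spark.executor.memory    16g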
