Thank you, Joseph. We'll try exploring coarse-grained mode with dynamic allocation.
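For anyone following along, this is roughly the spark-defaults.conf change we plan to test - a minimal sketch, assuming the standard dynamic-allocation properties from the Spark 1.6 docs; the executor bounds and core cap are placeholder values for our cluster, not recommendations:

# placeholder executor bounds and a per-application core cap for our cluster
spark.mesos.coarse                    true
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  8
spark.cores.max                       16

If I'm reading the docs correctly, dynamic allocation on Mesos also needs the external shuffle service running on each agent (sbin/start-mesos-shuffle-service.sh in the Spark distribution), and spark.cores.max should keep one shell from reserving the whole cluster so concurrent applications can still get offers - please correct me if any of that is off.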
On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]> wrote:

> Looks like you're running Spark in "fine-grained" mode (deprecated).
>
> (The Spark website appears to be down right now, so here's the doc on
> GitHub:)
> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>
>> Note that while Spark tasks in fine-grained will relinquish cores as they
>> terminate, they will not relinquish memory, as the JVM does not give
>> memory back to the Operating System. Neither will executors terminate
>> when they're idle.
>
> You can follow some of the recommendations Spark has in that document for
> sharing resources when using Mesos.
>
> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <[email protected]> wrote:
>
>> Hi,
>>
>> Our team has been tackling multi-tenancy related issues with Mesos for
>> quite some time.
>>
>> The problem is that tasks aren't being allocated properly when multiple
>> applications try to launch jobs. If we launch application A and, soon
>> after, application B, then application B waits nearly until application A
>> completes before its tasks are even staged in Mesos. Right now these
>> applications are the spark-shell or the Zeppelin interpreter.
>>
>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in two
>> different spark-shells reproduces the issue. One of the counts waits (in
>> fact we don't even see its tasks being staged in Mesos) until the other
>> one finishes. This is the biggest issue we have been experiencing, and
>> any help or advice would be greatly appreciated. We want to be able to
>> launch multiple jobs concurrently on our cluster and share resources
>> appropriately.
>>
>> Another issue we see is that the Java heap space of the Mesos executor
>> backend process is not cleaned up once a job has finished in the
>> spark-shell. I've attached a PNG of the jvisualvm output showing that the
>> heap space is still allocated on a worker node. If I force a GC from
>> jvisualvm, nearly all of that memory is reclaimed. This may be because
>> the spark-shell is still active - but if we've waited long enough, why
>> doesn't GC clean up the space on its own? Moreover, even after forcing
>> GC, the Mesos UI shows that these resources are still in use.
>> There should be a way to bring down the memory utilization of the
>> executors once a task is finished. They shouldn't keep that memory
>> allocated, even if a spark-shell is still active on the driver.
>>
>> We have Mesos configured to use fine-grained mode.
>> These are the parameters we have set in our spark-defaults.conf file:
>>
>> spark.eventLog.enabled true
>> spark.eventLog.dir hdfs://frontend-system:8090/directory
>> spark.local.dir /data/cluster-local/SPARK_TMP
>> spark.executor.memory 50g
>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>> spark.executor.extraJavaOptions -XX:MaxTenuringThreshold=0
>> spark.executor.uri hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>> spark.mesos.coarse false
>>
>> Please let me know if there are any questions about our configuration.
>> Any advice or experience the Mesos community can share pertaining to
>> issues with fine-grained mode would be greatly appreciated!
>>
>> I would also like to sincerely apologize for my previous test message on
>> the mailing list.
>> It was an ill-conceived idea; we are in a bit of a time crunch and I
>> needed to get this message posted. I had forgotten that I needed to reply
>> to the user-subscribers email to be listed, which resulted in
>> message-not-sent emails. I will not do that again.
>>
>> Thanks,
>>
>> Rahul Palamuttam

