By true multitenancy, I mean preemption: if a new user connects to the cluster, capacity is actually reclaimed and reallocated to them in minutes or seconds instead of hours.

On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam <[email protected]> wrote:
> Thanks David.
> We will definitely take a look at Cook.
>
> I am curious what you mean by true multi-tenancy.
>
> Under coarse-grained mode with dynamic allocation enabled, what I see in
> the Mesos UI is that there are 3 tasks running by default (one on each of
> the nodes we have).
> I also see the coarse-grained executors being brought up.
>
> Another point is that I always see a spark-submit command being launched;
> even if I kill that command it comes back up and the executors get
> reallocated on the worker nodes.
> However, I am able to launch multiple spark shells and have jobs run
> concurrently - which we were very happy with.
> Unfortunately, I don't understand why Mesos only shows 3 tasks running. I
> even see the spike in thread count when launching my jobs, but the task
> count remains unchanged.
> The Mesos logs do show jobs coming in.
> The three tasks just sit there in the web UI - running.
>
> Is this what is expected?
> Does running coarse-grained with dynamic allocation make Mesos look at
> each running executor as a different task?
>
> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <[email protected]>
> wrote:
>
>> You could also check out Cook from Two Sigma. It's open source on GitHub,
>> and offers true preemptive multitenancy with Spark on Mesos, by
>> intermediating the Spark drivers to optimize the cluster overall.
>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <[email protected]>
>> wrote:
>>
>>> Thank you Joseph.
>>>
>>> We'll try to explore coarse-grained mode with dynamic allocation.
>>>
>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]>
>>> wrote:
>>>
>>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>>
>>>> (The Spark website appears to be down right now, so here's the doc on
>>>> GitHub:)
>>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>>
>>>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>>> they terminate, they will not relinquish memory, as the JVM does not
>>>>> give memory back to the Operating System. Neither will executors
>>>>> terminate when they're idle.
>>>>
>>>> You can follow some of the recommendations Spark has in that document
>>>> for sharing resources when using Mesos.
>>>>
>>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Our team has been tackling multi-tenancy related issues with Mesos for
>>>>> quite some time.
>>>>>
>>>>> The problem is that tasks aren't being allocated properly when multiple
>>>>> applications are trying to launch a job. If we launch application A, and
>>>>> soon after application B, application B waits pretty much until the
>>>>> completion of application A for tasks to even be staged in Mesos. Right
>>>>> now these applications are the spark-shell or the Zeppelin interpreter.
>>>>>
>>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in
>>>>> two different spark-shells results in the issue we're observing. One of
>>>>> the jobs waits (in fact we don't even see its tasks being staged in
>>>>> Mesos) until the current one finishes. This is the biggest issue we have
>>>>> been experiencing, and any help or advice would be greatly appreciated.
>>>>> We want to be able to launch multiple jobs concurrently on our cluster
>>>>> and share resources appropriately.
>>>>> Another issue we see is that the Java heap space on the Mesos executor
>>>>> backend process is not being cleaned up once a job has finished in the
>>>>> spark-shell.
>>>>> I've attached a png file of the jvisualvm output showing that the heap
>>>>> space is still allocated on a worker node. If I force a GC from
>>>>> jvisualvm, then nearly all of that memory gets cleaned up. This may be
>>>>> because the spark-shell is still active - but if we've waited long
>>>>> enough, why doesn't GC just clean up the space? Moreover, even after
>>>>> forcing GC, the Mesos UI shows that these resources are still in use.
>>>>> There should be a way to bring down the memory utilization of the
>>>>> executors once a task is finished. They shouldn't keep that memory
>>>>> allocated, even if a spark-shell is still active on the driver.
>>>>>
>>>>> We have Mesos configured to use fine-grained mode.
>>>>> The following are the parameters we have set in our spark-defaults.conf
>>>>> file:
>>>>>
>>>>> spark.eventLog.enabled           true
>>>>> spark.eventLog.dir               hdfs://frontend-system:8090/directory
>>>>> spark.local.dir                  /data/cluster-local/SPARK_TMP
>>>>> spark.executor.memory            50g
>>>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0
>>>>> spark.executor.uri               hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>>> spark.mesos.coarse               false
>>>>>
>>>>> Please let me know if there are any questions about our configuration.
>>>>> Any advice or experience the Mesos community can share pertaining to
>>>>> issues with fine-grained mode would be greatly appreciated!
>>>>>
>>>>> I would also like to sincerely apologize for my previous test message on
>>>>> the mailing list. It was an ill-conceived idea: we are in a bit of a time
>>>>> crunch and I needed to get this message posted. I had forgotten that I
>>>>> needed to reply to the user-subscribe confirmation email before my posts
>>>>> would be accepted, which resulted in the "message not sent" emails. I
>>>>> will not do that again.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rahul Palamuttam
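For reference, here is a minimal spark-defaults.conf sketch of the coarse-grained plus dynamic-allocation setup discussed above. The property names come from the Spark 1.6 documentation; the core/memory values, idle timeout, and the hdfs://frontend-system URI are illustrative placeholders rather than tuned recommendations for this cluster:

    # Sketch only - values are illustrative, not tuned for any particular cluster.
    spark.mesos.coarse                           true
    spark.dynamicAllocation.enabled              true
    spark.shuffle.service.enabled                true
    spark.dynamicAllocation.executorIdleTimeout  60s
    # Cap each application so one spark-shell cannot hold every core in the cluster.
    spark.cores.max                              16
    spark.executor.memory                        8g
    spark.executor.uri                           hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz

Dynamic allocation on Mesos also needs the external shuffle service running on each agent; in the 1.6 binary distribution that is started, if I recall correctly, with sbin/start-mesos-shuffle-service.sh. Capping spark.cores.max per application is what lets a second spark-shell receive offers while the first is still busy, and the idle timeout means executors (and their heap) are released once a shell goes quiet instead of being held for the life of the session.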


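For completeness, the contention check from the original message, written out as it would be typed into each of two concurrent spark-shell sessions (sc is the SparkContext that spark-shell provides; the value produced is just a sum, used here only to generate tasks):

    // Run the same job from two spark-shell sessions at roughly the same time.
    // Under the setup described above, the second session's tasks were not even
    // staged in Mesos until the first job had completed.
    val sum = sc.parallelize(1L to 10000000L).reduce(_ + _)
    println(s"sum = $sum")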