Cook launches shells with around one minute of latency. I believe there is a
project to reduce that to seconds.

On Thu, Jul 14, 2016 at 10:50 PM Rahul Palamuttam <[email protected]> wrote:
> Hallelujah!
>
> We'll definitely take a look at Cook.
> Right now we're observing that in both fine-grained and coarse-grained mode,
> jobs take quite a bit of time to even be staged by Mesos.
>
> We're sitting there waiting on the interpreter/shell for quite a few
> minutes.
>
> On Jul 14, 2016, at 7:49 PM, David Greenberg <[email protected]> wrote:
>
> By true multitenancy, I mean preemption, so that if a new user connects to
> the cluster, their capacity is actually reclaimed and reallocated in
> minutes or seconds instead of hours.
> On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam <[email protected]> wrote:
>
>> Thanks David.
>> We will definitely take a look at Cook.
>>
>> I am curious what you mean by true multi-tenancy.
>>
>> Under coarse-grained mode with dynamic allocation enabled, what I see in
>> the Mesos UI is that there are 3 tasks running by default (one on each of
>> the nodes we have).
>> I also see the CoarseGrainedExecutors being brought up.
>>
>> Another point is that I always see a spark-submit command being launched;
>> even if I kill that command it comes back up and the executors get
>> reallocated on the worker nodes.
>> However, I am able to launch multiple spark-shells and have jobs run
>> concurrently - which we were very happy with.
>> Unfortunately, I don't understand why Mesos only shows 3 tasks running. I
>> even see the spike in thread count when launching my jobs, but the task
>> count remains unchanged.
>> The Mesos logs do show jobs coming in.
>> The three tasks just sit there in the web UI - running.
>>
>> Is this what is expected?
>> Does running coarse-grained with dynamic allocation make Mesos look at
>> each running executor as a different task?
>>
>> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <[email protected]> wrote:
>>
>>> You could also check out Cook from Two Sigma. It's open source on GitHub,
>>> and offers true preemptive multitenancy with Spark on Mesos, by
>>> intermediating the Spark drivers to optimize the cluster overall.
>>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <[email protected]> wrote:
>>>
>>>> Thank you Joseph.
>>>>
>>>> We'll try to explore coarse-grained mode with dynamic allocation.
>>>>
>>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <[email protected]> wrote:
>>>>
>>>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>>>
>>>>> (The Spark website appears to be down right now, so here's the doc on
>>>>> GitHub:)
>>>>> https://github.com/apache/spark/blob/master/docs/running-on-mesos.md#fine-grained-deprecated
>>>>>
>>>>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>>>> they terminate, they will not relinquish memory, as the JVM does not
>>>>>> give memory back to the Operating System. Neither will executors
>>>>>> terminate when they're idle.
>>>>>
>>>>> You can follow some of the recommendations Spark has in that document
>>>>> for sharing resources when using Mesos.
>>>>>
>>>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Our team has been tackling multi-tenancy related issues with Mesos
>>>>>> for quite some time.
>>>>>>
>>>>>> The problem is that tasks aren't being allocated properly when
>>>>>> multiple applications are trying to launch a job. If we launch
>>>>>> application A, and soon after application B, application B waits
>>>>>> pretty much until the completion of application A for tasks to even
>>>>>> be staged in Mesos. Right now these applications are the spark-shell
>>>>>> or the Zeppelin interpreter.
>>>>>>
>>>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in
>>>>>> two different spark-shells results in the issue we're observing. One
>>>>>> of the counts waits (in fact we don't even see the tasks being staged
>>>>>> in Mesos) until the current one finishes. This is the biggest issue
>>>>>> we have been experiencing, and any help or advice would be greatly
>>>>>> appreciated. We want to be able to launch multiple jobs concurrently
>>>>>> on our cluster and share resources appropriately.
>>>>>>
>>>>>> Another issue we see is that the Java heap space on the Mesos executor
>>>>>> backend process is not being cleaned up once a job has finished in
>>>>>> the spark-shell.
>>>>>> I've attached a PNG file of the jvisualvm output showing that the
>>>>>> heap space is still allocated on a worker node. If I force the GC
>>>>>> from jvisualvm then nearly all of that memory gets cleaned up. This
>>>>>> may be because the spark-shell is still active - but if we've waited
>>>>>> long enough, why doesn't GC just clean up the space? However, even
>>>>>> after forcing GC, the Mesos UI shows us that these resources are
>>>>>> still being used.
>>>>>> There should be a way to bring down the memory utilization of the
>>>>>> executors once a task is finished. They shouldn't continue to have
>>>>>> that memory allocated, even if a spark-shell is active on the driver.
>>>>>>
>>>>>> We have Mesos configured to use fine-grained mode.
>>>>>> The following are the parameters we have set in our
>>>>>> spark-defaults.conf file:
>>>>>>
>>>>>> spark.eventLog.enabled            true
>>>>>> spark.eventLog.dir                hdfs://frontend-system:8090/directory
>>>>>> spark.local.dir                   /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.memory             50g
>>>>>> spark.externalBlockStore.baseDir  /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.extraJavaOptions   -XX:MaxTenuringThreshold=0
>>>>>> spark.executor.uri                hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>>>> spark.mesos.coarse                false
>>>>>>
>>>>>> Please let me know if there are any questions about our configuration.
>>>>>> Any advice or experience the Mesos community can share pertaining to
>>>>>> issues with fine-grained mode would be greatly appreciated!
>>>>>>
>>>>>> I would also like to sincerely apologize for my previous test message
>>>>>> on the mailing list.
>>>>>> It was an ill-conceived idea; we are in a bit of a time crunch and I
>>>>>> needed to get this message posted. I forgot I needed to reply to the
>>>>>> user-subscribe confirmation email before I was listed, which resulted
>>>>>> in message-not-sent emails. I will not do that again.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Rahul Palamuttam
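For reference, the coarse-grained-plus-dynamic-allocation setup recommended in
the thread would look roughly like the following in spark-defaults.conf. This
is a minimal sketch assuming Spark 1.6 on Mesos; the executor bounds and idle
timeout are illustrative values rather than numbers from the thread, and
dynamic allocation also needs an external shuffle service running on each
Mesos agent (typically started with sbin/start-mesos-shuffle-service.sh).

    # Switch from the deprecated fine-grained mode to coarse-grained mode
    spark.mesos.coarse                           true

    # Let executors be released back to Mesos when a shell sits idle
    # (bounds and timeout below are illustrative, tune for your cluster)
    spark.dynamicAllocation.enabled              true
    spark.dynamicAllocation.minExecutors         1
    spark.dynamicAllocation.maxExecutors         3
    spark.dynamicAllocation.executorIdleTimeout  60s

    # Required by dynamic allocation; on Mesos this is the external
    # shuffle service that must run on every agent
    spark.shuffle.service.enabled                true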

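Separately from the mode, the original symptom (a second spark-shell seeing no
tasks staged until the first shell finishes) is consistent with the first
driver holding most of the cluster's offers, especially with 50g executors
whose memory the JVM never hands back. Capping what any one shell may take
leaves room for a concurrent one. A sketch with illustrative numbers, not
values from the thread:

    # Cap how much of the cluster a single spark-shell can hold,
    # so a second shell can still get offers
    spark.cores.max          8
    spark.executor.memory    16g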
