Awesome! This is exactly what I'd need. Unfortunately, I am not a programmer of any talent or skill, but how could I assist with this JIRA? From a user's perspective, this is really the next step for my org in taking our Mesos cluster to userland with Spark. I don't want to be pushy, but is there any sort of time frame I could communicate to my team? Anything I can do to help?
Thanks!

On Fri, Feb 20, 2015 at 4:36 AM, Iulian Dragoș <iulian.dra...@typesafe.com> wrote:
>
>
> On Thu, Feb 19, 2015 at 2:49 PM, John Omernik <j...@omernik.com> wrote:
>>
>> I am running Spark on Mesos and it works quite well. I have three
>> users, all of whom set up iPython notebooks that instantiate a Spark
>> instance to work with in the notebooks. I love it so far.
>>
>> Since I am "auto" instantiating (I don't want a user to have to
>> "think" about instantiating and submitting a Spark app to do ad hoc
>> analysis; I want the environment set up ahead of time), this is done
>> whenever an iPython notebook is opened. So far it's working pretty
>> well, save one issue:
>>
>> Every notebook is a new driver. I.e., every time they open a notebook,
>> a new spark-submit is called and the driver resources are allocated,
>> whether they are used or not. Yes, it's only the driver, but even that
>> starts slowing down my queries for the notebooks that are using Spark.
>> (I am running in Mesos fine-grained mode.)
>>
>> I have three users on my system. Ideally, I would love to find a way
>> so that when the first notebook is opened, a driver is started for
>> that user and can then be used by any notebook the user has open. So
>> if they open a new notebook, I can check that yes, the user already has
>> a Spark driver running, and any query from that notebook will run
>> through that driver. That lets me understand the resource allocation
>> better, and it keeps users from opening 10 notebooks and tying up a
>> lot of resources.
>>
>> The other thing I was wondering is: could the driver actually be run on
>> the Mesos cluster? Right now I have an "edge" node as an iPython
>> server; the drivers all live on that server, so as I get more and more
>> drivers, the box's local resources get depleted by unused drivers.
>> Obviously, reusing the drivers per user on that box would be a great
>> first step, but if I could reuse drivers and also run them on the
>> cluster, that would be ideal. Looking through the docs, I was not clear
>> on those options. If anyone could point me in the right direction, I
>> would greatly appreciate it!
>
>
> Cluster mode support for Spark on Mesos is tracked under
> [SPARK-5338](https://issues.apache.org/jira/browse/SPARK-5338). I know
> Tim Chen is working on it, so there will be progress soon.
>
> iulian
>
>>
>> John
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
>
> --
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com
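
A minimal sketch of the per-notebook "auto" instantiation described above, assuming PySpark is importable in the notebook's Python environment and that an IPython profile startup script is used; the file path, Mesos master URL, and memory setting are placeholders rather than values from this thread:

    # ~/.ipython/profile_spark/startup/00-spark.py  (illustrative path)
    # Runs automatically when a kernel for this profile starts, so the user
    # never has to think about creating a SparkContext themselves.
    import os
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster(os.environ.get("SPARK_MASTER", "mesos://master:5050"))  # assumed Mesos master URL
            .setAppName("notebook-" + os.environ.get("USER", "unknown"))
            .set("spark.executor.memory", "2g"))  # tune for your cluster

    sc = SparkContext(conf=conf)  # this kernel process becomes the driver

Each kernel started this way still gets its own driver on the edge node, which is the resource issue raised above; running drivers on the cluster itself is what the cluster mode work tracked under SPARK-5338 is meant to enable, while sharing one driver across all of a user's notebooks would need something external sitting in front of the kernels.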