So in my instance, instead of having a bunch of drivers on one machine, at least each of the drivers would be out in cluster land... That's a bit better. However, I see your point on not sharing drivers between apps; I'm going to have to think that one through. Are there no cases where having a single driver supporting requests for a group of apps makes sense, or am I missing something there? It seems like a logical way to put some limitations on groups of apps, but I may be missing something in how it's designed to be run.

On Fri, Feb 20, 2015 at 10:22 AM, Tim Chen <[email protected]> wrote:
> Hi John,
>
> I'm currently working on a cluster mode design and a PoC, but it is also not
> sharing drivers, as Spark AFAIK is designed to not share drivers between
> apps.
>
> The cluster mode for Mesos is going to be a way to submit apps to your
> cluster, and each app will be running in the cluster as a new driver that is
> managed by a cluster dispatcher, and you don't need to wait for the client
> to finish to get all the results.
>
> I'll be updating the JIRA and PR once I have this ready, which is aimed for
> this next release.
>
> Tim
>
> On Fri, Feb 20, 2015 at 8:09 AM, John Omernik <[email protected]> wrote:
>>
>> Tim - on the Spark list your name was brought up in relation to
>> https://issues.apache.org/jira/browse/SPARK-5338 I asked this question
>> there but I'll ask it here too: what can I do to help on this? I am
>> not a coder unfortunately, but I am a user willing to try things :) This
>> looks really cool for what we would like to do with Spark and Mesos
>> and I'd love to be able to contribute and/or get an understanding of a
>> (even tentative) timeline. I am not trying to be pushy, I understand
>> lots of things are likely on your agenda :)
>>
>> John
>>
>> On Tue, Feb 17, 2015 at 6:33 AM, John Omernik <[email protected]> wrote:
>> > Tim, thanks, that makes sense, the checking for ports and incrementing
>> > was new to me, so hearing about that helps. Next question.... is it
>> > possible for a driver to be shared by the same user somehow? This
>> > would be desirable from the standpoint of running an iPython notebook
>> > server (Jupyter Hub). I have it set up so that every time a notebook is
>> > opened, the imports for spark are run (the idea is the
>> > environment is ready to go for analysis); however, if each user has 5
>> > notebooks open at any time, that would be a lot of spark drivers! But,
>> > I suppose before asking that, I should ask about the sequence of
>> > drivers... are they serial? i.e. can one driver serve only one query
>> > at a time? What is the optimal size for a driver (in memory)? What
>> > does the memory affect in the driver? I.e. is a driver with smaller
>> > amounts of memory limited in the number of results etc?
>> >
>> > Lots of questions here, if these are more spark related questions, let
>> > me know, I can hop over to spark users, but since I am curious on
>> > spark on mesos, I figured I'd try here first.
>> >
>> > Thanks for your help!
>> >
>> > On Mon, Feb 16, 2015 at 10:30 AM, Tim Chen <[email protected]> wrote:
>> >> Hi John,
>> >>
>> >> With Spark on Mesos, each client (spark-submit) starts a SparkContext
>> >> which initializes its own SparkUI and framework. There is a default
>> >> 4040 for the Spark UI port, but if it's occupied Spark automatically
>> >> tries ports incrementally for you, so your next could be 4041 if it's
>> >> available.
>> >>
>> >> Driver is not shared between users, each user creates its own driver.
>> >>
>> >> About slowness it's hard to say without any information, you need to
>> >> tell us your cluster setup, what mode you're running Mesos with and if
>> >> there is anything else running in the cluster, the job, etc.
>> >>
>> >> Tim
>> >>
>> >> On Sat, Feb 14, 2015 at 5:06 PM, John Omernik <[email protected]> wrote:
>> >>>
>> >>> Hello all, I am running Spark on Mesos and I think I am in love, but I
>> >>> have some questions. I am running the python shell via iPython
>> >>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> >>> how things are actually submitted... like for example, when I submit
>> >>> the spark app from the iPython notebook server, I am opening a new
>> >>> kernel and I see a new spark submit (similar to the below) for each
>> >>> kernel... but, how is that actually working on the cluster? I can
>> >>> connect to the spark server UI on 4040, but shouldn't there be a
>> >>> different one for each driver? Is that causing conflicts? After a
>> >>> while things seem to run slow, is this due to some weird conflicts?
>> >>> Should I be specifying unique ports for each server? Is the driver
>> >>> shared between users? What about between kernels for the same user?
>> >>> Curious if anyone has any insight.
>> >>>
>> >>> Thanks!
>> >>>
>> >>>
>> >>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>> >>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>> >>> pyspark-shell
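
For reference, the port fallback Tim describes in the quoted thread (start at 4040, move to 4041 if it is occupied, and so on) can be sketched roughly as below. This is only an illustration of the idea, not Spark's actual code: the function name `first_free_port` is made up for this example, and the defaults are assumptions modeled on what I believe are Spark's `spark.ui.port` (4040) and `spark.port.maxRetries` (16) settings.

```python
import socket


def first_free_port(start=4040, max_retries=16, host="127.0.0.1"):
    """Return the first bindable port in [start, start + max_retries].

    Hypothetical helper: mimics "try the next port up" behavior,
    it is not how Spark itself is implemented.
    """
    for port in range(start, start + max_retries + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is already in use (e.g. by another driver's UI),
                # so move on to the next candidate.
                continue
            return port
    raise RuntimeError(
        "no free port between %d and %d" % (start, start + max_retries)
    )


if __name__ == "__main__":
    # With one driver UI already on 4040, this would likely print 4041.
    print(first_free_port())
```

If this matches what Spark does, several notebook kernels on one machine would each end up with their own UI port (4040, 4041, 4042, ...) without colliding, at least until the retry limit is reached.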

