Tim, thanks, that makes sense. The checking for ports and incrementing was new to me, so hearing about that helps. Next question: is it possible for a driver to be shared by the same user somehow? This would be desirable from the standpoint of running an iPython notebook server (JupyterHub). I have it set up so that every time a notebook is opened, the Spark imports are run (the idea is that the environment is ready to go for analysis). However, if each user has 5 notebooks open at any time, that would be a lot of Spark drivers!

But I suppose before asking that, I should ask about how drivers handle work. Are they serial, i.e. can one driver serve only one query at a time? What is the optimal memory size for a driver, and what does that memory actually affect in the driver? I.e., is a driver with a smaller amount of memory limited in the number of results it can hold, etc.?
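For reference, here is roughly what each notebook kernel ends up doing when those imports run on open. This is just a sketch of the startup code, not the exact thing I have: the master URL matches the spark-submit line quoted below, while the app name and memory value are example placeholders.

    # Rough sketch of per-kernel startup code (PySpark). The master URL comes from
    # the spark-submit line quoted below; the app name and memory are placeholders.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("mesos://hadoopmapr3:5050")   # Mesos master
            .setAppName("jupyter-kernel")            # hypothetical name; shows up in the UIs
            .set("spark.executor.memory", "4g"))     # executor heap; driver memory is normally
                                                     # given via --driver-memory before the JVM starts

    sc = SparkContext(conf=conf)   # each kernel that runs this gets its own driver and Mesos framework

    # ... analysis work in the notebook ...

    sc.stop()                      # releases the driver/framework when the notebook is done

So as I understand it, five open notebooks means five of these, which is what prompted the question above.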
Lots of questions here. If these are really more Spark questions than Mesos questions, let me know and I can hop over to the Spark users list, but since I am curious about Spark on Mesos specifically, I figured I'd try here first. Thanks for your help!

On Mon, Feb 16, 2015 at 10:30 AM, Tim Chen <[email protected]> wrote:
> Hi John,
>
> With Spark on Mesos, each client (spark-submit) starts a SparkContext which
> initializes its own SparkUI and framework. There is a default 4040 for the
> Spark UI port, but if it's occupied Spark automatically tries ports
> incrementally for you, so your next could be 4041 if it's available.
>
> The driver is not shared between users; each user creates their own driver.
>
> About slowness, it's hard to say without any information. You need to tell us
> your cluster setup, what mode you're running Mesos with, whether there is
> anything else running in the cluster, the job, etc.
>
> Tim
>
> On Sat, Feb 14, 2015 at 5:06 PM, John Omernik <[email protected]> wrote:
>>
>> Hello all, I am running Spark on Mesos and I think I am in love, but I
>> have some questions. I am running the Python shell via iPython
>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> how things are actually submitted. For example, when I submit the
>> Spark app from the iPython notebook server, I am opening a new
>> kernel and I see a new spark-submit (similar to the one below) for each
>> kernel. But how is that actually working on the cluster? I can
>> connect to the Spark server UI on 4040, but shouldn't there be a
>> different one for each driver? Is that causing conflicts? After a
>> while things seem to run slowly; is this due to some weird conflict?
>> Should I be specifying unique ports for each server? Is the driver
>> shared between users? What about between kernels for the same user?
>> Curious if anyone has any insight.
>>
>> Thanks!
>>
>>
>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>> pyspark-shell
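P.S. In case it helps anyone else following along, here is how I am thinking of pinning the UI port per kernel instead of relying on the 4040, 4041, ... auto-increment Tim described, so each driver's UI lands somewhere predictable. Again just a sketch; the port number and app name are arbitrary examples.

    # Sketch: pin the Spark UI port for this driver instead of relying on the
    # automatic increment from 4040. Port and app name are arbitrary examples.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("mesos://hadoopmapr3:5050")
            .setAppName("notebook-jsmith-1")       # hypothetical per-user/per-kernel name
            .set("spark.ui.port", "4051"))         # fixed UI port for this driver

    sc = SparkContext(conf=conf)                   # this driver's UI is now on port 4051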

