Tim, thanks, that makes sense. The port checking and incrementing was new
to me, so hearing about that helps. Next question: is it possible for a
driver to be shared by the same user somehow? This would be desirable for
running an iPython notebook server (JupyterHub). I have it set up so that
every time a notebook is opened, the Spark imports are run (the idea being
that the environment is ready to go for analysis); however, if each user
has 5 notebooks open at any time, that would be a lot of Spark drivers!
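For context, each notebook's startup does roughly the following (a sketch;
the master URL and executor memory mirror the spark-submit line quoted
below, the app name is just illustrative, and the commented-out
spark.ui.port line only shows how a UI port could be pinned instead of
auto-incremented):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("mesos://hadoopmapr3:5050")   # same master as the spark-submit below
        .setAppName("notebook-analysis")         # illustrative name
        .set("spark.executor.memory", "4g"))
# conf.set("spark.ui.port", "4041")              # optionally pin the UI port
sc = SparkContext(conf=conf)                     # one driver per kernel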
But before asking that, I suppose I should ask about how drivers handle
work: are they serial, i.e. can one driver serve only one query at a time?
Also, what is a sensible memory size for a driver, and what does that
memory actually affect? For instance, is a driver with a smaller amount of
memory limited in the number of results it can handle, etc.?
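To be concrete about the "one query at a time" part, what I have in mind is
something like this (purely hypothetical; sc is the SparkContext created at
notebook startup, and the two threads just submit independent jobs to that
one driver):

import threading

def run_query(label):
    # each count() is a separate Spark job submitted to the shared driver
    n = sc.parallelize(range(1000000)).filter(lambda x: x % 2 == 0).count()
    print("%s finished with count %d" % (label, n))

t1 = threading.Thread(target=run_query, args=("query-1",))
t2 = threading.Thread(target=run_query, args=("query-2",))
t1.start(); t2.start()
t1.join(); t2.join()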

Lots of questions here. If these are more Spark-related questions, let me
know and I can hop over to the Spark users list, but since I am curious
about Spark on Mesos specifically, I figured I'd try here first.

Thanks for your help!



On Mon, Feb 16, 2015 at 10:30 AM, Tim Chen <[email protected]> wrote:
> Hi John,
>
> With Spark on Mesos, each client (spark-submit) starts a SparkContext, which
> initializes its own SparkUI and framework. The Spark UI port defaults to
> 4040, but if that port is occupied Spark automatically tries subsequent
> ports for you, so your next one could be 4041 if it's available.
>
> The driver is not shared between users; each user creates their own driver.
>
> As for the slowness, it's hard to say without more information; you'd need
> to tell us your cluster setup, what mode you're running Mesos in, whether
> anything else is running on the cluster, what the job is, etc.
>
> Tim
>
> On Sat, Feb 14, 2015 at 5:06 PM, John Omernik <[email protected]> wrote:
>>
>> Hello all, I am running Spark on Mesos and I think I am in love, but I
>> have some questions. I am running the Python shell via iPython
>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> how things are actually submitted. For example, when I open a new
>> kernel from the iPython notebook server, I see a new spark-submit
>> (similar to the one below) for each kernel. How is that actually
>> working on the cluster? I can connect to the Spark UI on 4040, but
>> shouldn't there be a different one for each driver? Is that causing
>> conflicts? After a while things seem to run slowly; is this due to
>> some weird conflict? Should I be specifying unique ports for each
>> server? Is the driver shared between users? What about between
>> kernels for the same user? Curious if anyone has any insight.
>>
>> Thanks!
>>
>>
>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>> pyspark-shell
>
>
