Hi John,

Having drivers launched on the cluster, where you can query and kill them, is what I'm currently working on. Purely as illustration, a rough sketch of what that interface could look like is below.
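None of this exists yet, so everything in the sketch is an assumption about the in-progress design: the dispatcher host/port and the REST-style paths are placeholders, not a shipped API. (Python 2, matching the pyspark of this era.)

    # Hypothetical sketch: ask a cluster dispatcher for a driver's status,
    # then kill it. The endpoint and URL paths are placeholders for the
    # in-progress design, not a shipped API.
    import json
    import urllib2

    DISPATCHER = "http://hadoopmapr3:7077"  # placeholder dispatcher endpoint

    def driver_status(driver_id):
        # e.g. GET /v1/submissions/status/<driver_id>
        resp = urllib2.urlopen("%s/v1/submissions/status/%s"
                               % (DISPATCHER, driver_id))
        return json.load(resp)

    def kill_driver(driver_id):
        # e.g. POST /v1/submissions/kill/<driver_id>
        req = urllib2.Request("%s/v1/submissions/kill/%s"
                              % (DISPATCHER, driver_id), data="")
        return json.load(urllib2.urlopen(req))

    print(driver_status("driver-20150221-0001"))  # example driver id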
As for sharing drivers, I will let others chime in on whether that ever makes sense.

Tim

> On Feb 21, 2015, at 11:29 AM, John Omernik <[email protected]> wrote:
>
> So in my instance, instead of having a bunch of drivers on one machine,
> at least each of the drivers would be out in cluster land... That's a
> bit better. However, I see your point on not sharing drivers between
> apps; I'm going to have to think that one through. Are there no cases
> where having a single driver supporting requests for a group of apps
> makes sense, or am I missing something there? It seems like a logical
> way to put some limitations on groups of apps, but I may be missing
> something in how it's designed to be run.
>
>> On Fri, Feb 20, 2015 at 10:22 AM, Tim Chen <[email protected]> wrote:
>>
>> Hi John,
>>
>> I'm currently working on a cluster mode design and a PoC, but it is
>> also not sharing drivers, as Spark AFAIK is designed not to share
>> drivers between apps.
>>
>> The cluster mode for Mesos is going to be a way to submit apps to your
>> cluster. Each app will run in the cluster as a new driver that is
>> managed by a cluster dispatcher, so you don't need to wait for the
>> client to finish to get all the results.
>>
>> I'll be updating the JIRA and PR once I have this ready, which is
>> aimed for the next release.
>>
>> Tim
>>
>>> On Fri, Feb 20, 2015 at 8:09 AM, John Omernik <[email protected]> wrote:
>>>
>>> Tim - on the Spark list your name was brought up in relation to
>>> https://issues.apache.org/jira/browse/SPARK-5338. I asked this
>>> question there, but I'll ask it here too: what can I do to help on
>>> this? I am not a coder unfortunately, but I am a user willing to try
>>> things :) This looks really cool for what we would like to do with
>>> Spark and Mesos, and I'd love to be able to contribute and/or get an
>>> understanding of an (even tentative) timeline. I am not trying to be
>>> pushy; I understand lots of things are likely on your agenda :)
>>>
>>> John
>>>
>>>> On Tue, Feb 17, 2015 at 6:33 AM, John Omernik <[email protected]> wrote:
>>>>
>>>> Tim, thanks, that makes sense. The checking for ports and
>>>> incrementing was new to me, so hearing about that helps. Next
>>>> question... is it possible for a driver to be shared by the same
>>>> user somehow? This would be desirable from the standpoint of running
>>>> an iPython notebook server (Jupyter Hub). I have it set up so that
>>>> every time a notebook is opened, the imports for Spark are run (the
>>>> idea is that the environment is ready to go for analysis). However,
>>>> if each user has 5 notebooks open at any time, that would be a lot
>>>> of Spark drivers! But I suppose before asking that, I should ask
>>>> about the sequencing of drivers... are they serial? I.e., can one
>>>> driver serve only one query at a time? What is the optimal size for
>>>> a driver (in memory), and what does the memory affect in the driver?
>>>> I.e., is a driver with a smaller amount of memory limited in the
>>>> number of results, etc.?
>>>>
>>>> Lots of questions here. If these are more Spark-related questions,
>>>> let me know and I can hop over to Spark users, but since I am
>>>> curious about Spark on Mesos, I figured I'd try here first.
>>>>
>>>> Thanks for your help!
>>>>
>>>>> On Mon, Feb 16, 2015 at 10:30 AM, Tim Chen <[email protected]> wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> With Spark on Mesos, each client (spark-submit) starts a
>>>>> SparkContext, which initializes its own SparkUI and framework. The
>>>>> default Spark UI port is 4040, but if it's occupied, Spark
>>>>> automatically tries ports incrementally for you, so the next one
>>>>> could be 4041 if it's available.
>>>>>
>>>>> Drivers are not shared between users; each user creates their own
>>>>> driver.
>>>>>
>>>>> About the slowness, it's hard to say without any information. You
>>>>> need to tell us your cluster setup, what mode you're running Mesos
>>>>> in, whether there is anything else running in the cluster, the job,
>>>>> etc.
>>>>>
>>>>> Tim
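For a concrete picture of the setup described above: each notebook kernel creating its own driver looks roughly like this on the Python side. This is an untested sketch; the master URL comes from the thread, while the UI-port offset and resource caps are made-up illustrations, not recommendations.

    # Rough per-kernel setup: each kernel gets its own driver
    # (SparkContext), its own Mesos framework, and its own UI port.
    from pyspark import SparkConf, SparkContext

    kernel_index = 0  # e.g. derived from the notebook/kernel id (assumption)

    conf = (SparkConf()
            .setMaster("mesos://hadoopmapr3:5050")
            .setAppName("notebook-kernel-%d" % kernel_index)
            # Pin the UI port per kernel instead of relying on the
            # 4040, 4041, ... auto-increment described above.
            .set("spark.ui.port", str(4040 + kernel_index))
            # One way to put limits on a group of apps: cap each driver.
            .set("spark.cores.max", "4")
            .set("spark.executor.memory", "4g"))

    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(100)).sum())

Pinning spark.ui.port per kernel mainly buys predictability; the auto-increment described above works on its own.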
>>>>>> On Sat, Feb 14, 2015 at 5:06 PM, John Omernik <[email protected]> wrote:
>>>>>>
>>>>>> Hello all, I am running Spark on Mesos and I think I am in love,
>>>>>> but I have some questions. I am running the Python shell via
>>>>>> iPython Notebooks (Jupyter) and it works great, but I am trying to
>>>>>> figure out how things are actually submitted. For example, when I
>>>>>> submit the Spark app from the iPython notebook server, I am opening
>>>>>> a new kernel and I see a new spark-submit (similar to the below)
>>>>>> for each kernel... but how is that actually working on the cluster?
>>>>>> I can connect to the Spark server UI on 4040, but shouldn't there
>>>>>> be a different one for each driver? Is that causing conflicts?
>>>>>> After a while things seem to run slow; is this due to some weird
>>>>>> conflict? Should I be specifying unique ports for each server? Is
>>>>>> the driver shared between users? What about between kernels for the
>>>>>> same user? Curious if anyone has any insight.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>>>>>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>>>>>> pyspark-shell
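That SparkSubmitDriverBootstrapper command above is what pyspark launches under the hood for each kernel. In notebook setups of this vintage, the usual way to control those flags per kernel is the PYSPARK_SUBMIT_ARGS environment variable, set before pyspark is imported. A minimal sketch, assuming SPARK_HOME is set and mirroring the flags above (exact argument handling varies by Spark version):

    # Untested sketch: configure the driver JVM flags for this kernel
    # before pyspark is imported, so the launched submit command picks
    # them up. Values mirror the command shown above.
    import os
    import sys

    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--master mesos://hadoopmapr3:5050 "
        "--driver-memory 1G --executor-memory 4096M pyspark-shell")

    # Make pyspark importable from the Spark install; depending on the
    # version you may also need the py4j zip under python/lib.
    spark_home = os.environ["SPARK_HOME"]
    sys.path.insert(0, os.path.join(spark_home, "python"))

    from pyspark import SparkContext
    sc = SparkContext(appName="notebook-kernel")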

