Hi John,

I think there are limitations in the way drivers are designed that require a 
separate JVM process per driver, so AFAIK this isn't possible without code and 
design changes.

A driver shouldn't stay open past your job's lifetime, though, so while it 
isn't shared between apps it shouldn't waste as much as you described.

Tim 


> On Feb 27, 2015, at 7:50 AM, John Omernik <j...@omernik.com> wrote:
> 
> All - I've asked this question before, and, probably due to my own poor 
> comprehension or the clumsy way I ask it, I am still unclear on the answer. 
> I'll try again, this time using crude visual aids. 
> 
> I am using iPython Notebooks with Jupyter Hub (a multi-user notebook server). 
> To make the environment really smooth for data exploration, I create a Spark 
> context every time a notebook is opened. (See image below.) 
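> 
> Roughly, each notebook kernel does something like the sketch below at 
> startup (the master URL and app name are just placeholders for my actual 
> setup):
> 
>     # minimal sketch of what each kernel does when a notebook is opened;
>     # "mesos://host:5050" and "notebook" are placeholders
>     from pyspark import SparkConf, SparkContext
> 
>     conf = SparkConf().setMaster("mesos://host:5050").setAppName("notebook")
>     sc = SparkContext(conf=conf)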
> 
> This can cause issues on my "analysis" (Jupyter Hub) server: if, say, the 
> driver uses 1024 MB, then every notebook opens its own driver regardless of 
> how much Spark is actually used. Yes, I should probably set it up to create 
> the context only on demand, but that would add delay. Another issue is that 
> once contexts are created, they are not closed until the notebook is halted, 
> so users could leave notebook kernels running and waste additional resources.  
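> 
> (On-demand creation would look roughly like the sketch below: a lazily 
> created, cached context per kernel. get_sc() is just an illustrative helper 
> name, and the master URL is again a placeholder:)
> 
>     # sketch of creating the context only when a cell actually needs Spark
>     from pyspark import SparkConf, SparkContext
> 
>     _sc = None
> 
>     def get_sc():
>         global _sc
>         if _sc is None:
>             conf = SparkConf().setMaster("mesos://host:5050") \
>                               .setAppName("notebook")
>             _sc = SparkContext(conf=conf)   # the startup delay is paid here
>         return _sc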
> 
> 
> 
> <Current.png>
> 
> What I would like to do is share a context per user. Basically, each user on 
> the system would get only one Spark context, and all ad hoc queries or work 
> would be sent through that one driver. This makes sense to me: users will 
> often want ad hoc Spark capabilities, and this lets a context sit open, ready 
> for ad hoc work, without being over the top in resource usage, especially if 
> a kernel is left open. 
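> 
> (In rough pseudocode, the behavior I'm after is something like the line 
> below. To be clear, nothing like this exists in Spark today; 
> getOrCreateForUser is purely hypothetical:)
> 
>     # hypothetical, NOT a real Spark API: every notebook kernel for the same
>     # user would attach to one shared driver instead of starting its own
>     sc = SparkContext.getOrCreateForUser("john")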
> 
> <Shared.png>
> 
> On the Mesos list I was made aware of SPARK-5338, which Tim Chen is working 
> on. Based on conversations with him, this wouldn't completely achieve what I 
> am looking for, in that each notebook would likely still start a Spark 
> context, but at least the Spark driver would reside on the cluster and so be 
> resource-managed by the cluster. One thing to note here: if the design is 
> similar to the YARN cluster design, my iPython setup may not work at all with 
> Tim's approach, since the shells (if I remember correctly) don't work in 
> cluster mode on YARN. 
> 
> <SPARK-5338.png>
> 
> 
> Barring that, though (the pyshell not working in cluster mode), the ideal 
> would be if drivers could be shared per user like I initially proposed, run 
> on the cluster as Tim proposed, and the shells still worked in cluster mode. 
> We'd have everything running on the cluster, and we wouldn't have wasted 
> drivers or left-open drivers tying up resources. 
> 
> <Shared-SPARK5338.png>
> 
> 
> 
> 
> So I guess, ideally: what keeps us from 
> 
> A. using the driver in the cluster (as in YARN cluster mode), and
> B. sharing drivers? 
> 
> My guess is I may be missing something fundamental here about how Spark is 
> supposed to work, but I see this as a more efficient use of resources for 
> this type of work. I may also look into creating some Docker containers and 
> see how those work, but ideally I'd like to understand this at a base 
> level... i.e. why can't cluster (YARN and Mesos) contexts be connected to the 
> way a Spark standalone cluster context can?
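> 
> (By "connected to" I mean the way any client can simply point a context at a 
> standalone master; "sparkmaster" below is a placeholder host name:)
> 
>     # pointing a context at an existing standalone cluster master
>     from pyspark import SparkContext
>     sc = SparkContext("spark://sparkmaster:7077", "adhoc-notebook")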
> 
> Thanks!
> 
> 
> John
> 
