Hey Pranav,

Did you make any progress on this?
-- Rohit

On Sunday, August 16, 2015, moon soo Lee <m...@apache.org> wrote:
> Pranav, the proposal looks awesome!
>
> I have a question and some feedback.
>
> You said you tested 1, 2 and 3. To create a SparkIMain per notebook, you
> need the notebook id. Did you get it from InterpreterContext?
> Then how did you handle destroying the SparkIMain (when a notebook is
> deleted)? As far as I know, the interpreter is not able to get notified
> of notebook deletion.
>
> >> 4. Build a queue inside the interpreter to allow only one paragraph
> >> execution at a time per notebook.
>
> One downside of this approach is that the GUI will display RUNNING
> instead of PENDING for jobs waiting in the interpreter's queue.
>
> Best,
> moon
>
> On Sun, Aug 16, 2015 at 12:55 AM IT CTO <goi....@gmail.com> wrote:
>
>> +1 for "re-factor the Zeppelin architecture so that it can handle
>> multi-tenancy easily"
>>
>> On Sun, Aug 16, 2015 at 9:47 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> Agree with Joel; we may think about re-factoring the Zeppelin
>>> architecture so that it can handle multi-tenancy easily. The technical
>>> solution proposed by Pranav is great, but it only applies to Spark.
>>> Right now, each interpreter has to manage multi-tenancy its own way.
>>> Ultimately, Zeppelin could propose a multi-tenancy contract (e.g. a
>>> UserContext, similar to InterpreterContext) that each interpreter can
>>> choose to use or not.
>>>
>>> On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano <djo...@gmail.com> wrote:
>>>
>>>> I think that while the idea of running multiple notes simultaneously
>>>> is great, it is really dancing around the lack of true multi-user
>>>> support in Zeppelin.
>>>> While the proposed solution would work if the application's resources
>>>> are those of the whole cluster, if the app is limited (say it has 8
>>>> cores of 16, with some distribution of memory), then your note can
>>>> potentially hog all the resources, and the scheduler will have to
>>>> throttle all other executions, leaving you exactly where you are now.
>>>> While I think the solution is a good one, maybe this question should
>>>> make us think about adding true multi-user support, where we isolate
>>>> resources (cluster and the notebooks themselves), have separate
>>>> login/identity and (I don't know if it's possible) share the same
>>>> context.
>>>>
>>>> Thanks,
>>>> Joel
>>>>
>>>> > On Aug 15, 2015, at 1:58 PM, Rohit Agarwal <mindpri...@gmail.com> wrote:
>>>> >
>>>> > If the problem is that multiple users have to wait for each other
>>>> > while using Zeppelin, the solution already exists: they can create a
>>>> > new interpreter on the interpreter page and attach it to their
>>>> > notebook - then they don't have to wait for others to submit their
>>>> > jobs.
>>>> >
>>>> > But I agree, having paragraphs from one note wait for paragraphs
>>>> > from other notes is a confusing default. We can get around that in
>>>> > two ways:
>>>> >
>>>> > 1. Create a new interpreter for each note and attach that
>>>> > interpreter to that note. This approach requires the least amount of
>>>> > code change, but it is resource-heavy and doesn't let you share the
>>>> > SparkContext between different notes.
>>>> > 2. If we want to share the SparkContext between different notes, we
>>>> > can submit jobs from different notes into different fair-scheduler
>>>> > pools
>>>> > (https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application).
>>>> > This can be done by submitting jobs from different notes on
>>>> > different threads.
>>>> > This will make sure that jobs from one note run sequentially, but
>>>> > jobs from different notes can run in parallel.
>>>> >
>>>> > Neither of these options requires any change to the Spark code.
>>>> >
>>>> > --
>>>> > Thanks & Regards
>>>> > Rohit Agarwal
>>>> > https://www.linkedin.com/in/rohitagarwal003
>>>> >
>>>> > On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal
>>>> > <praag...@gmail.com> wrote:
>>>> >
>>>> >>> If someone can share the idea of sharing a single SparkContext
>>>> >>> through multiple SparkILoops safely, it'll be really helpful.
>>>> >>
>>>> >> Here is a proposal:
>>>> >> 1. In the Spark code, change SparkIMain.scala to allow setting the
>>>> >> virtual directory. When creating new instances of SparkIMain per
>>>> >> notebook from the Zeppelin Spark interpreter, point all the
>>>> >> instances of SparkIMain at the same virtual directory.
>>>> >> 2. Start an HTTP server on that virtual directory and set this HTTP
>>>> >> server in the SparkContext using the classServerUri method.
>>>> >> 3. Scala-generated code has a notion of packages. The default
>>>> >> package name is "line$<linenumber>". The package name can be
>>>> >> controlled with the system property scala.repl.name.line. Setting
>>>> >> this property to the notebook id ensures that code generated by
>>>> >> individual instances of SparkIMain is isolated from the other
>>>> >> instances.
>>>> >> 4. Build a queue inside the interpreter to allow only one paragraph
>>>> >> execution at a time per notebook.
>>>> >>
>>>> >> I have tested 1, 2 and 3, and they seem to provide isolation across
>>>> >> class names. I'll work towards submitting a formal patch soon - is
>>>> >> there already a JIRA for this that I can pick up? Also, I need to
>>>> >> understand:
>>>> >> 1. How does Zeppelin take up Spark fixes? Or do I need to first get
>>>> >> the Spark changes merged into Apache Spark on GitHub?
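Rohit's option 2 relies on Spark's fair scheduler, described in the job-scheduling docs linked above. As a hedged sketch: pools are declared in a `fairscheduler.xml` pointed to by `spark.scheduler.allocation.file`, and each note's submitting thread then selects its pool with `sc.setLocalProperty("spark.scheduler.pool", "notebookPool")` (the pool name here is illustrative, not anything Zeppelin defines).

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: an example pool for notebook jobs (illustrative name) -->
<allocations>
  <pool name="notebookPool">
    <!-- FAIR within the pool, so long jobs don't starve short ones -->
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

Threads that set no pool fall into the default pool, and a thread can detach from its pool again with `sc.setLocalProperty("spark.scheduler.pool", null)`.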
>>>> >>
>>>> >> Any suggestions or comments on the proposal are highly welcome.
>>>> >>
>>>> >> Regards,
>>>> >> -Pranav.
>>>> >>
>>>> >>> On 10/08/15 11:36 pm, moon soo Lee wrote:
>>>> >>>
>>>> >>> Hi Piyush,
>>>> >>>
>>>> >>> A separate instance of SparkILoop/SparkIMain for each notebook
>>>> >>> while sharing the SparkContext sounds great.
>>>> >>>
>>>> >>> Actually, I tried to do it and found the problem that multiple
>>>> >>> SparkILoops could generate the same class name, and the Spark
>>>> >>> executor confuses the class names, since they're reading classes
>>>> >>> from a single SparkContext.
>>>> >>>
>>>> >>> If someone can share the idea of sharing a single SparkContext
>>>> >>> through multiple SparkILoops safely, it'll be really helpful.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> moon
>>>> >>>
>>>> >>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform)
>>>> >>> <piyush.muk...@flipkart.com> wrote:
>>>> >>>
>>>> >>> Hi Moon,
>>>> >>> Any suggestion on this? We have to wait a lot when multiple people
>>>> >>> are working with Spark.
>>>> >>> Can we create separate instances of SparkILoop, SparkIMain and the
>>>> >>> print streams for each notebook, while sharing the SparkContext,
>>>> >>> ZeppelinContext, SQLContext and DependencyResolver, and then use
>>>> >>> the parallel scheduler?
>>>> >>> thanks
>>>> >>>
>>>> >>> -piyush
>>>> >>>
>>>> >>> Hi Moon,
>>>> >>>
>>>> >>> How about tracking a dedicated SparkContext for a notebook in
>>>> >>> Spark's remote interpreter - this would allow multiple users to
>>>> >>> run their Spark paragraphs in parallel. Also, within a notebook,
>>>> >>> only one paragraph would be executed at a time.
>>>> >>>
>>>> >>> Regards,
>>>> >>> -Pranav.
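Step 4 of Pranav's proposal (one paragraph at a time per notebook, notebooks in parallel) can be sketched with one single-threaded executor per notebook id. This is a standalone illustration, not Zeppelin's actual scheduler API; the class and method names are made up:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of "one paragraph at a time per notebook": each notebook id gets
// its own single-threaded executor, which acts as that notebook's FIFO
// queue; different notebooks run in parallel on different executors.
class NotebookQueues {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    void submit(String noteId, Runnable paragraph) {
        queues.computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor())
              .execute(paragraph);
    }

    void shutdown() {
        for (ExecutorService e : queues.values()) {
            e.shutdown();
            try {
                e.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        NotebookQueues q = new NotebookQueues();
        List<String> ran = new CopyOnWriteArrayList<>();
        q.submit("noteA", () -> ran.add("a1"));
        q.submit("noteB", () -> ran.add("b1"));
        q.submit("noteA", () -> ran.add("a2"));
        q.shutdown();
        // Within noteA, a1 always precedes a2; b1 may interleave anywhere.
        System.out.println(ran.indexOf("a1") < ran.indexOf("a2")); // prints "true"
    }
}
```

As moon notes above, the trade-off of queueing inside the interpreter is that queued jobs appear RUNNING to the GUI rather than PENDING.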
>>>> >>>
>>>> >>>> On 15/07/15 7:15 pm, moon soo Lee wrote:
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> Thanks for asking the question.
>>>> >>>>
>>>> >>>> The reason is simply that it is running code statements. The
>>>> >>>> statements can have order and dependency. Imagine I have two
>>>> >>>> paragraphs:
>>>> >>>>
>>>> >>>> %spark
>>>> >>>> val a = 1
>>>> >>>>
>>>> >>>> %spark
>>>> >>>> print(a)
>>>> >>>>
>>>> >>>> If they're not run one by one, they may run in random order, and
>>>> >>>> the output will not always be the same: either '1' or 'val a can
>>>> >>>> not be found'.
>>>> >>>>
>>>> >>>> This is the reason why. But if there is a nice idea to handle
>>>> >>>> this problem, I agree that using the parallel scheduler would
>>>> >>>> help a lot.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> moon
>>>> >>>>
>>>> >>>> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng
>>>> >>>> <linxizeng0...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> anyone who has the same question as me? or is this not a
>>>> >>>> question?
>>>> >>>>
>>>> >>>> 2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>:
>>>> >>>>
>>>> >>>> hi, Moon:
>>>> >>>> I notice that the getScheduler function in SparkInterpreter.java
>>>> >>>> returns a FIFOScheduler, which makes the Spark interpreter run
>>>> >>>> Spark jobs one by one. It's not a good experience when a couple
>>>> >>>> of users do some work on Zeppelin at the same time, because they
>>>> >>>> have to wait for each other.
>>>> >>>> At the same time, SparkSqlInterpreter can choose which scheduler
>>>> >>>> to use via "zeppelin.spark.concurrentSQL".
>>>> >>>> My question is: what kind of consideration is this decision
>>>> >>>> based on?
-- Sent from a mobile device. Excuse my thumbs.
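The FIFO-versus-parallel choice linxi asks about comes down to which kind of scheduler backs an interpreter. A hedged, self-contained sketch of that pattern (Zeppelin's real code goes through its SchedulerFactory, not raw executors; the flag mirrors "zeppelin.spark.concurrentSQL"):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: pick a serial (FIFO) or parallel scheduler from a config flag,
// the way SparkSqlInterpreter switches on "zeppelin.spark.concurrentSQL".
class SchedulerChoice {
    static ExecutorService forInterpreter(boolean concurrentSql) {
        return concurrentSql
                ? Executors.newFixedThreadPool(10)      // parallel: jobs may overlap
                : Executors.newSingleThreadExecutor();  // FIFO: one job at a time
    }
}
```

The FIFO default is what forces paragraphs to wait for each other; the thread-safety of the shared SparkContext and of the generated class names (as discussed above) is what makes flipping to the parallel case non-trivial.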