Hi Pranav,

Thanks for sharing the plan. I think passing InterpreterContext to completion() makes sense. Although it changes the interpreter API, changing it now looks better than later.
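A minimal sketch of the changed contract (the buffer/cursor parameters and return type follow the existing completion() method; the InterpreterContext parameter is the addition, and the rest of the class is elided):

    import java.util.List;

    public abstract class Interpreter {
      // ... open(), close(), interpret(), etc. elided ...

      // Proposed: completion() receives the InterpreterContext so an
      // implementation can look up per-notebook state (for example the
      // notebook-specific SparkIMain) via the note id in the context.
      public abstract List<String> completion(String buf, int cursor,
                                              InterpreterContext context);
    }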
Thanks,
moon

On Tue, Aug 25, 2015 at 11:22 PM Pranav Kumar Agarwal <praag...@gmail.com> wrote:
> Hi Moon,
>
> > I think releasing SparkIMain and related objects
> By packaging I meant to ask: what is the process to "release SparkIMain and related objects" for Zeppelin's code uptake?
>
> I have one more question: most of the changes needed for SparkInterpreter to support the ParallelScheduler are implemented, but I'm struggling with the completion feature. Since I have a SparkIMain interpreter for each notebook, completion is not working as expected, because the completion method doesn't receive an InterpreterContext. I need to be able to pull the notebook-specific SparkIMain interpreter to return correct completion results, and for that I need to know the notebook id at the time of the completion call.
>
> I'm planning to change the abstract completion method in Interpreter.java to pass the InterpreterContext along with the buffer and cursor location. This will require refactoring all the Interpreters. It's a change in the contract, so I thought I would run it by you before embarking on it...
>
> Please let me know your thoughts.
>
> Regards,
> -Pranav.
>
> On 18/08/15 8:04 am, moon soo Lee wrote:
> > Could you explain a little bit more about the package changes you mean?
> >
> > Thanks,
> > moon
> >
> > On Mon, Aug 17, 2015 at 10:27 AM Pranav Agarwal <praag...@gmail.com> wrote:
> > > Any thoughts on how to package the changes related to Spark?
> > >
> > > On 17-Aug-2015 7:58 pm, "moon soo Lee" <m...@apache.org> wrote:
> > > > I think releasing SparkIMain and related objects after a configurable period of inactivity would be good for now.
> > > >
> > > > About the scheduler, I can help implement such a scheduler.
> > > >
> > > > Thanks,
> > > > moon
> > > >
> > > > On Sun, Aug 16, 2015 at 11:54 PM Pranav Kumar Agarwal <praag...@gmail.com> wrote:
> > > > > Hi Moon,
> > > > >
> > > > > Yes, the notebook id comes from the InterpreterContext. At the moment, destroying SparkIMain on deletion of a notebook is not handled. I think SparkIMain is a lightweight object; do you see a concern with keeping these objects in a map? One possible option could be to destroy notebook-related objects when the inactivity on a notebook is greater than, say, 8 hours.
> > > > >
> > > > > > > 4. Build a queue inside the interpreter to allow only one paragraph execution at a time per notebook.
> > > > > >
> > > > > > One downside of this approach is that the GUI will display RUNNING instead of PENDING for jobs queued inside the interpreter.
> > > > > Yes, that's a good point. Having a scheduler at the Zeppelin server that is parallel across notebooks and FIFO across paragraphs would be nice. Is there any plan for such a scheduler?
> > > > >
> > > > > Regards,
> > > > > -Pranav.
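To make the map-plus-inactivity idea concrete, here is a hypothetical sketch (the holder class, the createSparkIMain() helper, and the eviction policy are illustrative, not existing Zeppelin API; SparkIMain is the Scala REPL class discussed in this thread):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class NotebookReplPool {
      private final Map<String, SparkIMain> replsByNote = new ConcurrentHashMap<>();
      private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

      // Look up (or lazily create) the SparkIMain bound to a notebook.
      SparkIMain forNote(String noteId) {
        lastUsed.put(noteId, System.currentTimeMillis());
        return replsByNote.computeIfAbsent(noteId, id -> createSparkIMain(id));
      }

      // Periodic sweep: the interpreter never hears about notebook
      // deletion, so drop per-notebook objects that have been idle
      // beyond a timeout (e.g. 8 hours, as suggested above).
      void evictIdle(long maxIdleMillis) {
        long now = System.currentTimeMillis();
        lastUsed.entrySet().removeIf(entry -> {
          if (now - entry.getValue() > maxIdleMillis) {
            replsByNote.remove(entry.getKey());
            return true;
          }
          return false;
        });
      }

      private SparkIMain createSparkIMain(String noteId) {
        // Wire a new SparkIMain to the shared SparkContext (see the
        // proposal quoted below); omitted here.
        throw new UnsupportedOperationException("illustrative only");
      }
    }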
> > > > > On 17/08/15 5:38 am, moon soo Lee wrote:
> > > > > > Pranav, the proposal looks awesome!
> > > > > >
> > > > > > I have a question and feedback.
> > > > > >
> > > > > > You said you tested 1, 2 and 3. To create a SparkIMain per notebook, you need the notebook id. Did you get it from the InterpreterContext? Then how did you handle destroying SparkIMain (when a notebook is deleted)? As far as I know, the interpreter is not able to get information about notebook deletion.
> > > > > >
> > > > > > > 4. Build a queue inside the interpreter to allow only one paragraph execution at a time per notebook.
> > > > > >
> > > > > > One downside of this approach is that the GUI will display RUNNING instead of PENDING for jobs queued inside the interpreter.
> > > > > >
> > > > > > Best,
> > > > > > moon
> > > > > >
> > > > > > On Sun, Aug 16, 2015 at 12:55 AM IT CTO <goi....@gmail.com> wrote:
> > > > > > > +1 for "to re-factor the Zeppelin architecture so that it can handle multi-tenancy easily"
> > > > > > >
> > > > > > > On Sun, Aug 16, 2015 at 9:47 AM DuyHai Doan <doanduy...@gmail.com> wrote:
> > > > > > > > Agree with Joel; we may think about re-factoring the Zeppelin architecture so that it can handle multi-tenancy easily. The technical solution proposed by Pranav is great, but it only applies to Spark. Right now, each interpreter has to manage multi-tenancy its own way. Ultimately, Zeppelin could propose a multi-tenancy contract/info (like a UserContext, similar to the InterpreterContext) that each interpreter can choose whether or not to use.
> > > > > > > >
> > > > > > > > On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano <djo...@gmail.com> wrote:
> > > > > > > > > While the idea of running multiple notes simultaneously is great, it is really dancing around the lack of true multi-user support in Zeppelin. The proposed solution would work if the application's resources are those of the whole cluster, but if the app is limited (say it has 8 cores of 16, with some distribution in memory), then potentially your note can hog all the resources and the scheduler will have to throttle all other executions, leaving you exactly where you are now.
> > > > > > > > > While I think the solution is a good one, maybe this question makes us think about adding true multi-user support, where we isolate resources (the cluster and the notebooks themselves), have separate login/identity and (I don't know if it's possible) share the same context.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Joel
> > > > > > > > >
> > > > > > > > > > On Aug 15, 2015, at 1:58 PM, Rohit Agarwal <mindpri...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > If the problem is that multiple users have to wait for each other while using Zeppelin, the solution already exists: they can create a new interpreter by going to the interpreter page and attaching it to their notebook; then they don't have to wait for others to submit their jobs.
> > > > > > > > > >
> > > > > > > > > > But I agree, having paragraphs from one note wait for paragraphs from other notes is a confusing default. We can get around that in two ways:
> > > > > > > > > >
> > > > > > > > > > 1. Create a new interpreter for each note and attach that interpreter to that note. This approach requires the least amount of code changes, but it is resource-heavy and doesn't let you share the SparkContext between different notes.
> > > > > > > > > > 2. If we want to share the SparkContext between different notes, we can submit jobs from different notes into different fair-scheduler pools (https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application). This can be done by submitting jobs from different notes in different threads. This will make sure that jobs from one note run sequentially, while jobs from different notes can run in parallel.
> > > > > > > > > >
> > > > > > > > > > Neither of these options requires any change to the Spark code.
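Option 2 can be sketched as follows (a hypothetical helper, not Zeppelin code; it assumes the shared SparkContext was started with spark.scheduler.mode=FAIR):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.spark.SparkContext;

    public class PerNotePools {
      private final Map<String, ExecutorService> noteThreads = new ConcurrentHashMap<>();
      private final SparkContext sc;

      public PerNotePools(SparkContext sc) { this.sc = sc; }

      // Jobs from one note share a single thread and hence run
      // sequentially; jobs from different notes run on different threads
      // tagged with different fair-scheduler pools, so they can run in
      // parallel inside the shared SparkContext.
      public void submit(final String noteId, final Runnable paragraphJob) {
        noteThreads
            .computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor())
            .submit(() -> {
              // setLocalProperty is per-thread, which is what makes the
              // thread-per-note trick work.
              sc.setLocalProperty("spark.scheduler.pool", noteId);
              paragraphJob.run();
            });
      }
    }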
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Thanks & Regards
> > > > > > > > > > Rohit Agarwal
> > > > > > > > > > https://www.linkedin.com/in/rohitagarwal003
> > > > > > > > > >
> > > > > > > > > > On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal <praag...@gmail.com> wrote:
> > > > > > > > > > > > If someone can share about the idea of sharing a single SparkContext through multiple SparkILoops safely, it'll be really helpful.
> > > > > > > > > > > Here is a proposal:
> > > > > > > > > > > 1. In the Spark code, change SparkIMain.scala to allow setting the virtual directory. While creating new instances of SparkIMain per notebook from Zeppelin's Spark interpreter, set all the instances of SparkIMain to the same virtual directory.
> > > > > > > > > > > 2. Start an HTTP server on that virtual directory and set this HTTP server in the SparkContext using the classServerUri method.
> > > > > > > > > > > 3. Scala-generated code has a notion of packages. The default package name is "line$<linenumber>". The package name can be controlled using the system property scala.repl.name.line. Setting this property to the notebook id ensures that code generated by an individual instance of SparkIMain is isolated from the other instances of SparkIMain.
> > > > > > > > > > > 4. Build a queue inside the interpreter to allow only one paragraph execution at a time per notebook.
> > > > > > > > > > >
> > > > > > > > > > > I have tested 1, 2, and 3, and they seem to provide isolation across class names. I'll work towards submitting a formal patch soon. Is there already a JIRA for this that I can pick up? Also, I need to understand: how does Zeppelin take up Spark fixes? Or do I need to first work towards getting the Spark changes merged into Apache Spark on GitHub?
> > > > > > > > > > >
> > > > > > > > > > > Any suggestions or comments on the proposal are highly welcome.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > -Pranav.
> > > > > > > > > > >
> > > > > > > > > > > On 10/08/15 11:36 pm, moon soo Lee wrote:
> > > > > > > > > > > > Hi Piyush,
> > > > > > > > > > > >
> > > > > > > > > > > > A separate instance of SparkILoop and SparkIMain for each notebook while sharing the SparkContext sounds great.
> > > > > > > > > > > >
> > > > > > > > > > > > Actually, I tried to do it and found a problem: multiple SparkILoops could generate the same class name, and the Spark executors confuse the class names, since they're reading classes from a single SparkContext.
> > > > > > > > > > > >
> > > > > > > > > > > > If someone can share about the idea of sharing a single SparkContext through multiple SparkILoops safely, it'll be really helpful.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > moon
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform) <piyush.muk...@flipkart.com> wrote:
> > > > > > > > > > > > > Hi Moon,
> > > > > > > > > > > > > Any suggestion on this? We have to wait a lot when multiple people are working with Spark.
> > > > > > > > > > > > > Can we create separate instances of SparkILoop, SparkIMain and print streams for each notebook, while sharing the SparkContext, ZeppelinContext, SQLContext and DependencyResolver, and then use the parallel scheduler?
> > > > > > > > > > > > > thanks
> > > > > > > > > > > > > -piyush
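Step 3 of Pranav's proposal (quoted above) could be sketched like this, with createSparkIMain() standing in as a hypothetical helper for steps 1 and 2:

    // Step 3 of the proposal: isolate generated class/package names per
    // notebook. scala.repl.name.line is read when the REPL instance is
    // created, and system properties are JVM-global, hence the
    // synchronization around set-then-create.
    private synchronized SparkIMain newReplForNote(String noteId) {
      System.setProperty("scala.repl.name.line",
          noteId.replaceAll("[^A-Za-z0-9]", ""));  // illustrative sanitizing
      // Hypothetical helper covering steps 1-2: the same virtual
      // directory for all instances, served over HTTP via classServerUri.
      return createSparkIMain(sharedSettings, sharedVirtualDirectory);
    }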
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Moon,
> > > > > > > > > > > > >
> > > > > > > > > > > > > How about tracking a dedicated SparkContext for a notebook in Spark's remote interpreter? This will allow multiple users to run their Spark paragraphs in parallel. Also, within a notebook, only one paragraph is executed at a time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > -Pranav.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 15/07/15 7:15 pm, moon soo Lee wrote:
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for asking the question.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The reason is simply that it is running code statements. The statements can have order and dependencies. Imagine I have two paragraphs:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > %spark
> > > > > > > > > > > > > > val a = 1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > %spark
> > > > > > > > > > > > > > print(a)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If they're not run one by one, they may run in random order, and the output will not always be the same: either '1' or a 'value a not found' error.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is the reason why. But if there is a nice idea to handle this problem, I agree that using a parallel scheduler would help a lot.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > moon
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jul 14, 2015 at 7:59 PM linxi zeng <linxizeng0...@gmail.com> wrote:
> > > > > > > > > > > > > > > Anyone who has the same question as me? Or is this not a question?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>:
> > > > > > > > > > > > > > > > hi, Moon:
> > > > > > > > > > > > > > > > I notice that the getScheduler function in SparkInterpreter.java returns a FIFOScheduler, which makes the Spark interpreter run Spark jobs one by one. It's not a good experience when a couple of users work on Zeppelin at the same time, because they have to wait for each other. At the same time, SparkSqlInterpreter can choose which scheduler to use via "zeppelin.spark.concurrentSQL". My question is: what kind of consideration is such a decision based on?
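For reference, the choice linxi zeng asks about comes down to what an interpreter's getScheduler() returns. A sketch of opting into parallel execution, loosely mirroring the zeppelin.spark.concurrentSQL switch mentioned above (the SchedulerFactory method names are assumptions based on Zeppelin's scheduler package, not verified API):

    // Inside an interpreter: pick a parallel scheduler when concurrent
    // execution is enabled, otherwise keep the order-preserving FIFO one.
    @Override
    public Scheduler getScheduler() {
      if (concurrentSQL()) {  // e.g. zeppelin.spark.concurrentSQL=true
        return SchedulerFactory.singleton().createOrGetParallelScheduler(
            SparkSqlInterpreter.class.getName() + hashCode(),
            10 /* max concurrent jobs */);
      }
      // Default: one paragraph at a time, preserving statement order
      // across paragraphs (moon's 'val a = 1' / 'print(a)' example).
      return SchedulerFactory.singleton().createOrGetFIFOScheduler(
          SparkSqlInterpreter.class.getName() + hashCode());
    }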