Hi,

I'm not sure what could be wrong. Can you see any existing notebook?

Best,
moon

On Mon, Aug 31, 2015 at 8:48 PM Piyush Mukati (Data Platform) <piyush.muk...@flipkart.com> wrote:

> Hi,
> We have passed the InterpreterContext to completion(); it is working well
> on my local dev setup.
> But after
>
>     mvn clean package -P build-distr -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn
>
> and copying zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz to another machine,
> running it from there always shows "disconnected" and no notebooks are
> shown; I am not able to create any notebook either.
>
> [image: Screenshot 2015-09-01 09.14.54.png]
>
> I am not seeing anything in the logs. Can anyone please suggest how I can
> debug this further?
> Thanks.
>
> On Wed, Aug 26, 2015 at 8:27 PM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi Pranav,
>>
>> Thanks for sharing the plan.
>> I think passing InterpreterContext to completion() makes sense.
>> Although it changes the interpreter API, changing it now looks better
>> than later.
>>
>> Thanks,
>> moon
>>
>> On Tue, Aug 25, 2015 at 11:22 PM Pranav Kumar Agarwal <praag...@gmail.com> wrote:
>>
>>> Hi Moon,
>>>
>>> > I think releasing SparkIMain and related objects
>>> By packaging I meant to ask: what is the process to "release SparkIMain
>>> and related objects" for Zeppelin's code uptake?
>>>
>>> I have one more question:
>>> Most of the changes to let SparkInterpreter support ParallelScheduler
>>> are implemented, but I'm struggling with the completion feature. Since I
>>> have a SparkIMain interpreter per notebook, completion doesn't work as
>>> expected, because the completion method doesn't receive an
>>> InterpreterContext. I need to be able to pull the notebook-specific
>>> SparkIMain interpreter to return correct completion results, and for
>>> that I need to know the notebook id at the time of the completion call.
>>>
>>> I'm planning to change the Interpreter.java abstract method completion()
>>> to pass InterpreterContext along with the buffer and cursor location.
>>> This will require refactoring all the Interpreters. It's a change in the
>>> contract, so I thought I'd run it by you before embarking on it...
>>>
>>> Please let me know your thoughts.
>>>
>>> Regards,
>>> -Pranav.
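
A minimal sketch of the contract change discussed above. The real base class
lives in Interpreter.java; the Scala notation, the import, and the
interpreterForNote helper are illustrative assumptions, not the committed API.

    import org.apache.zeppelin.interpreter.InterpreterContext

    // Sketch: thread InterpreterContext through completion(), so that a
    // per-notebook SparkIMain can be looked up by note id.
    abstract class Interpreter {
      // Before: the interpreter cannot tell which notebook is asking.
      // def completion(buf: String, cursor: Int): java.util.List[String]

      // After (proposed): the context identifies the calling notebook.
      def completion(buf: String, cursor: Int,
                     context: InterpreterContext): java.util.List[String]
    }

    // A per-notebook SparkInterpreter could then delegate along these lines:
    //   val repl = interpreterForNote(context.getNoteId)  // hypothetical lookup
    //   completerFor(repl).complete(buf, cursor)          // note-local completer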
>>>
>>> On 18/08/15 8:04 am, moon soo Lee wrote:
>>>> Could you explain a little bit more about the package changes you mean?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Mon, Aug 17, 2015 at 10:27 AM Pranav Agarwal <praag...@gmail.com> wrote:
>>>>
>>>>> Any thoughts on how to package changes related to Spark?
>>>>>
>>>>> On 17-Aug-2015 7:58 pm, "moon soo Lee" <m...@apache.org> wrote:
>>>>>
>>>>>> I think releasing SparkIMain and related objects after a configurable
>>>>>> period of inactivity would be good for now.
>>>>>>
>>>>>> About the scheduler, I can help implement such a scheduler.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>> On Sun, Aug 16, 2015 at 11:54 PM Pranav Kumar Agarwal <praag...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Moon,
>>>>>>>
>>>>>>> Yes, the notebook id comes from InterpreterContext. At the moment,
>>>>>>> destroying SparkIMain on deletion of a notebook is not handled. I
>>>>>>> think SparkIMain is a lightweight object; do you see a concern with
>>>>>>> keeping these objects in a map? One possible option could be to
>>>>>>> destroy notebook-related objects when the inactivity on a notebook
>>>>>>> is greater than, say, 8 hours.
>>>>>>>
>>>>>>>>>> 4. Build a queue inside interpreter to allow only one paragraph
>>>>>>>>>> execution at a time per notebook.
>>>>>>>>
>>>>>>>> One downside of this approach is, GUI will display RUNNING instead
>>>>>>>> of PENDING for jobs inside of queue in interpreter.
>>>>>>>
>>>>>>> Yes, that's a good point. Having a scheduler at the Zeppelin server
>>>>>>> that is parallel across notebooks and FIFO across paragraphs would
>>>>>>> be nice. Is there any plan for having such a scheduler?
>>>>>>>
>>>>>>> Regards,
>>>>>>> -Pranav.
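
A minimal sketch of the scheduler shape being discussed: FIFO within a note,
parallel across notes. The class and method names are illustrative, not
Zeppelin's actual scheduler API.

    import java.util.concurrent.{ExecutorService, Executors}
    import scala.collection.mutable

    // Sketch: one single-threaded executor per note. Paragraphs of the same
    // note run in submission (FIFO) order; different notes run in parallel.
    class PerNoteFifoScheduler {
      private val executors = mutable.Map.empty[String, ExecutorService]

      def submit(noteId: String, paragraph: Runnable): Unit = {
        val executor = synchronized {
          executors.getOrElseUpdate(noteId, Executors.newSingleThreadExecutor())
        }
        executor.submit(paragraph)
      }

      // Called when a note is deleted or has been idle too long
      // (cf. the 8-hour inactivity idea above).
      def remove(noteId: String): Unit = synchronized {
        executors.remove(noteId).foreach(_.shutdownNow())
      }
    }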
>>>>>>>
>>>>>>> On 17/08/15 5:38 am, moon soo Lee wrote:
>>>>>>>> Pranav, the proposal looks awesome!
>>>>>>>>
>>>>>>>> I have a question and some feedback.
>>>>>>>>
>>>>>>>> You said you tested 1, 2 and 3. To create a SparkIMain per notebook,
>>>>>>>> you need the notebook id. Did you get it from InterpreterContext?
>>>>>>>> Then how did you handle destroying a SparkIMain (when its notebook
>>>>>>>> is deleted)? As far as I know, the interpreter is not able to get
>>>>>>>> information about notebook deletion.
>>>>>>>>
>>>>>>>>>> 4. Build a queue inside interpreter to allow only one paragraph
>>>>>>>>>> execution at a time per notebook.
>>>>>>>>
>>>>>>>> One downside of this approach is, GUI will display RUNNING instead
>>>>>>>> of PENDING for jobs inside of queue in interpreter.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Sun, Aug 16, 2015 at 12:55 AM IT CTO <goi....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 for "to re-factor the Zeppelin architecture so that it can
>>>>>>>>> handle multi-tenancy easily"
>>>>>>>>>
>>>>>>>>> On Sun, Aug 16, 2015 at 9:47 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Agree with Joel; we may think about re-factoring the Zeppelin
>>>>>>>>>> architecture so that it can handle multi-tenancy easily. The
>>>>>>>>>> technical solution proposed by Pranav is great, but it only
>>>>>>>>>> applies to Spark. Right now, each interpreter has to manage
>>>>>>>>>> multi-tenancy its own way. Ultimately, Zeppelin could propose a
>>>>>>>>>> multi-tenancy contract/info (like a UserContext, similar to
>>>>>>>>>> InterpreterContext) that each interpreter can choose to use or not.
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano <djo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think that while the idea of running multiple notes
>>>>>>>>>>> simultaneously is great, it is really dancing around the lack of
>>>>>>>>>>> true multi-user support in Zeppelin. The proposed solution works
>>>>>>>>>>> if the application's resources are those of the whole cluster,
>>>>>>>>>>> but if the app is limited (say it has 8 cores of 16, with some
>>>>>>>>>>> distribution in memory), then one note can potentially hog all
>>>>>>>>>>> the resources, and the scheduler will have to throttle all other
>>>>>>>>>>> executions, leaving you exactly where you are now.
>>>>>>>>>>> While I think the solution is a good one, maybe this question
>>>>>>>>>>> should make us think about adding true multi-user support, where
>>>>>>>>>>> we isolate resources (the cluster and the notebooks themselves),
>>>>>>>>>>> have separate login/identity, and (I don't know if it's possible)
>>>>>>>>>>> share the same context.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Joel
>>>>>>>>>>>
>>>>>>>>>>>> On Aug 15, 2015, at 1:58 PM, Rohit Agarwal <mindpri...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> If the problem is that multiple users have to wait for each
>>>>>>>>>>>> other while using Zeppelin, the solution already exists: they
>>>>>>>>>>>> can create a new interpreter on the interpreter page and attach
>>>>>>>>>>>> it to their notebook - then they don't have to wait for others
>>>>>>>>>>>> to submit their jobs.
>>>>>>>>>>>>
>>>>>>>>>>>> But I agree, having paragraphs from one note wait for paragraphs
>>>>>>>>>>>> from other notes is a confusing default. We can get around that
>>>>>>>>>>>> in two ways:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Create a new interpreter for each note and attach that
>>>>>>>>>>>> interpreter to that note. This approach requires the least
>>>>>>>>>>>> amount of code changes, but it is resource-heavy and doesn't let
>>>>>>>>>>>> you share the SparkContext between different notes.
>>>>>>>>>>>> 2. If we want to share the SparkContext between different notes,
>>>>>>>>>>>> we can submit jobs from different notes into different
>>>>>>>>>>>> fair-scheduler pools (
>>>>>>>>>>>> https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application
>>>>>>>>>>>> ). This can be done by submitting jobs from different notes in
>>>>>>>>>>>> different threads (see the sketch below). This will make sure
>>>>>>>>>>>> that jobs from one note run sequentially, while jobs from
>>>>>>>>>>>> different notes can run in parallel.
>>>>>>>>>>>>
>>>>>>>>>>>> Neither of these options requires any change in the Spark code.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Thanks & Regards
>>>>>>>>>>>> Rohit Agarwal
>>>>>>>>>>>> https://www.linkedin.com/in/rohitagarwal003
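
A minimal sketch of option 2 above: per-note threads plus fair-scheduler pools
on a shared SparkContext. The pool-per-note naming is an assumption;
setLocalProperty and FAIR mode are the mechanism described in the linked Spark
docs.

    import org.apache.spark.{SparkConf, SparkContext}

    object FairPoolsSketch {
      // FAIR mode must be enabled; otherwise all jobs share one FIFO queue.
      val conf = new SparkConf().setAppName("zeppelin").set("spark.scheduler.mode", "FAIR")
      val sc = new SparkContext(conf)

      // Run a note's job on its own thread. Local properties are per-thread,
      // so every job submitted from this thread lands in the note's pool.
      def runForNote(noteId: String)(job: => Unit): Thread = {
        val t = new Thread(new Runnable {
          override def run(): Unit = {
            sc.setLocalProperty("spark.scheduler.pool", noteId)
            job
          }
        })
        t.start()
        t
      }
    }

With FairPoolsSketch.runForNote("note-a") { ... } and runForNote("note-b")
{ ... }, the two notes share one context, yet their jobs land in separate
pools and can proceed in parallel.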
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal <praag...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> If someone can share about the idea of sharing single
>>>>>>>>>>>>>> SparkContext through multiple SparkILoop safely, it'll be
>>>>>>>>>>>>>> really helpful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a proposal:
>>>>>>>>>>>>> 1. In the Spark code, change SparkIMain.scala to allow setting
>>>>>>>>>>>>> the virtual directory. While creating new instances of
>>>>>>>>>>>>> SparkIMain per notebook from the Zeppelin Spark interpreter,
>>>>>>>>>>>>> set all the instances of SparkIMain to the same virtual
>>>>>>>>>>>>> directory.
>>>>>>>>>>>>> 2. Start an HTTP server on that virtual directory and set this
>>>>>>>>>>>>> HTTP server in the SparkContext using the classServerUri
>>>>>>>>>>>>> method.
>>>>>>>>>>>>> 3. Scala-generated code has a notion of packages. The default
>>>>>>>>>>>>> package name is "line$<linenumber>". The package name can be
>>>>>>>>>>>>> controlled using the system property scala.repl.name.line.
>>>>>>>>>>>>> Setting this property to the notebook id ensures that code
>>>>>>>>>>>>> generated by individual instances of SparkIMain is isolated
>>>>>>>>>>>>> from the other instances.
>>>>>>>>>>>>> 4. Build a queue inside the interpreter to allow only one
>>>>>>>>>>>>> paragraph execution at a time per notebook.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have tested 1, 2 and 3, and this seems to provide isolation
>>>>>>>>>>>>> across class names (see the sketch below). I'll work towards
>>>>>>>>>>>>> submitting a formal patch soon - is there already a JIRA for
>>>>>>>>>>>>> this that I can take up? Also, I need to understand:
>>>>>>>>>>>>> 1. How does Zeppelin take up Spark fixes? Or do I need to first
>>>>>>>>>>>>> get the Spark changes merged into Apache Spark's GitHub?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any suggestions or comments on the proposal are highly welcome.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> -Pranav.
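
A minimal sketch of points 1-3 above, assuming a hypothetically patched
SparkIMain constructor that accepts a shared virtual directory - exactly the
patch point 1 proposes. sharedVirtualDir and the sanitized property value are
likewise assumptions.

    import scala.collection.mutable
    import scala.tools.nsc.Settings
    import org.apache.spark.repl.SparkIMain

    object PerNoteReplSketch {
      // Points 1-2: all per-note REPLs compile into one shared virtual
      // directory, which a single HTTP class server can then expose to
      // executors.
      val sharedVirtualDir = new scala.reflect.io.VirtualDirectory("(memory)", None)

      private val repls = mutable.Map.empty[String, SparkIMain]

      def replForNote(noteId: String): SparkIMain = synchronized {
        repls.getOrElseUpdate(noteId, {
          // Point 3: prefix generated class names with the note id so that
          // executors reading classes via the single SparkContext cannot
          // confuse classes coming from different notebooks.
          System.setProperty("scala.repl.name.line", "nb_" + noteId.replaceAll("\\W", "_"))
          val settings = new Settings()
          settings.usejavacp.value = true
          new SparkIMain(settings, sharedVirtualDir) // hypothetical patched constructor
        })
      }
    }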
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/08/15 11:36 pm, moon soo Lee wrote:
>>>>>>>>>>>>>> Hi Piyush,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A separate instance of SparkILoop/SparkIMain for each notebook
>>>>>>>>>>>>>> while sharing the SparkContext sounds great.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually, I tried to do it and found the problem that multiple
>>>>>>>>>>>>>> SparkILoops could generate the same class name, and the Spark
>>>>>>>>>>>>>> executor confuses the class names, since it reads classes from
>>>>>>>>>>>>>> a single SparkContext.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If someone can share an idea for sharing a single SparkContext
>>>>>>>>>>>>>> through multiple SparkILoops safely, it'll be really helpful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform) <piyush.muk...@flipkart.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>> Any suggestion on it? We have to wait a lot when multiple
>>>>>>>>>>>>>>> people are working with Spark.
>>>>>>>>>>>>>>> Can we create a separate instance of SparkILoop, SparkIMain
>>>>>>>>>>>>>>> and the print streams for each notebook, while sharing the
>>>>>>>>>>>>>>> SparkContext, ZeppelinContext, SQLContext and
>>>>>>>>>>>>>>> DependencyResolver, and then use the parallel scheduler?
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -piyush
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How about tracking a dedicated SparkContext for a notebook in
>>>>>>>>>>>>>>> Spark's remote interpreter - this would allow multiple users
>>>>>>>>>>>>>>> to run their Spark paragraphs in parallel. Also, within a
>>>>>>>>>>>>>>> notebook, only one paragraph is executed at a time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> -Pranav.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15/07/15 7:15 pm, moon soo Lee wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for asking the question.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The reason is simply that it is running code statements. The
>>>>>>>>>>>>>>>> statements can have order and dependency. Imagine I have two
>>>>>>>>>>>>>>>> paragraphs:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   %spark
>>>>>>>>>>>>>>>>   val a = 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   %spark
>>>>>>>>>>>>>>>>   print(a)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If they're not run one by one, they could run in random
>>>>>>>>>>>>>>>> order, and the output would not always be the same: either
>>>>>>>>>>>>>>>> '1' or 'val a can not be found'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is the reason why. But if there is a nice idea for
>>>>>>>>>>>>>>>> handling this problem, I agree that using a parallel
>>>>>>>>>>>>>>>> scheduler would help a lot.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng <linxizeng0...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there anyone who has the same question as me? Or is this
>>>>>>>>>>>>>>>>> not a question?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi, Moon:
>>>>>>>>>>>>>>>>>> I notice that the getScheduler function in
>>>>>>>>>>>>>>>>>> SparkInterpreter.java returns a FIFOScheduler, which makes
>>>>>>>>>>>>>>>>>> the Spark interpreter run Spark jobs one by one. It's not
>>>>>>>>>>>>>>>>>> a good experience when a couple of users do some work on
>>>>>>>>>>>>>>>>>> Zeppelin at the same time, because they have to wait for
>>>>>>>>>>>>>>>>>> each other.
>>>>>>>>>>>>>>>>>> At the same time, SparkSqlInterpreter can choose which
>>>>>>>>>>>>>>>>>> scheduler to use via "zeppelin.spark.concurrentSQL".
>>>>>>>>>>>>>>>>>> My question is: what kind of consideration is this
>>>>>>>>>>>>>>>>>> decision based on?
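
For context, a sketch of the scheduler choice linxi describes. The real
getScheduler override is Java inside SparkSqlInterpreter; this Scala rendering,
shown as it would appear inside the interpreter class, is illustrative.

    import org.apache.zeppelin.scheduler.{Scheduler, SchedulerFactory}

    // Sketch of what "zeppelin.spark.concurrentSQL" effectively switches
    // between (assumed to run inside the interpreter class, where
    // getProperty is available):
    override def getScheduler(): Scheduler =
      if (getProperty("zeppelin.spark.concurrentSQL").toBoolean)
        SchedulerFactory.singleton().createOrGetParallelScheduler(
          "interpreter_" + this.hashCode(), 10) // up to 10 concurrent jobs
      else
        SchedulerFactory.singleton().createOrGetFIFOScheduler(
          "interpreter_" + this.hashCode())     // one job at a time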