Thanks Piyush. Do we have any ETA for this to be sent for review?

Dimple
On Wed, Jan 13, 2016 at 6:23 PM, Piyush Mukati (Data Platform)
<piyush.muk...@flipkart.com> wrote:

> Hi,
> The code is available here:
> https://github.com/piyush-mukati/incubator-zeppelin/tree/parallel_scheduler_support_spark
>
> Some testing work is left.
>
> On Wed, Jan 13, 2016 at 11:47 PM, Dimp Bhat <dimp201...@gmail.com> wrote:
>
> > Hi Pranav,
> > When do you plan to send out the code for running notebooks in parallel?
> >
> > Dimple
> >
> > On Tue, Nov 17, 2015 at 3:27 AM, Pranav Kumar Agarwal
> > <praag...@gmail.com> wrote:
> >
> >> Hi Rohit,
> >>
> >> We implemented the proposal and are able to run Zeppelin as a hosted
> >> service inside my organization. Our internal forked version has
> >> pluggable authentication and type-ahead.
> >>
> >> I need to port the work to the latest code and chop out the
> >> auth-changes portion. We'll be submitting it soon.
> >>
> >> We'll target to get this out for review by 11/26.
> >>
> >> Regards,
> >> -Pranav.
> >>
> >> On 17/11/15 4:34 am, Rohit Agarwal wrote:
> >>
> >> Hey Pranav,
> >>
> >> Did you make any progress on this?
> >>
> >> --
> >> Rohit
> >>
> >> On Sunday, August 16, 2015, moon soo Lee <m...@apache.org> wrote:
> >>
> >>> Pranav, the proposal looks awesome!
> >>>
> >>> I have a question and some feedback.
> >>>
> >>> You said you tested 1, 2 and 3. To create a SparkIMain per notebook,
> >>> you need the notebook id. Did you get it from the InterpreterContext?
> >>> Then how did you handle destroying the SparkIMain (when a notebook is
> >>> deleted)? As far as I know, the interpreter is not able to get
> >>> information about notebook deletion.
> >>>
> >>> >> 4. Build a queue inside the interpreter to allow only one
> >>> >> paragraph execution at a time per notebook.
> >>>
> >>> One downside of this approach is that the GUI will display RUNNING
> >>> instead of PENDING for jobs queued inside the interpreter.
> >>>
> >>> Best,
> >>> moon
> >>>
> >>> On Sun, Aug 16, 2015 at 12:55 AM IT CTO <goi....@gmail.com> wrote:
> >>>
> >>>> +1 for "re-factor the Zeppelin architecture so that it can handle
> >>>> multi-tenancy easily"
> >>>>
> >>>> On Sun, Aug 16, 2015 at 9:47 AM DuyHai Doan <doanduy...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Agree with Joel, we may think about re-factoring the Zeppelin
> >>>>> architecture so that it can handle multi-tenancy easily. The
> >>>>> technical solution proposed by Pranav is great, but it only applies
> >>>>> to Spark. Right now, each interpreter has to manage multi-tenancy
> >>>>> its own way. Ultimately, Zeppelin could propose a multi-tenancy
> >>>>> contract (like a UserContext, similar to InterpreterContext) that
> >>>>> each interpreter can choose to use or not.
> >>>>>
> >>>>> On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano <djo...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I think the idea of running multiple notes simultaneously is
> >>>>>> great, but it is really dancing around the lack of true multi-user
> >>>>>> support in Zeppelin. The proposed solution works if the
> >>>>>> application's resources are those of the whole cluster; if the app
> >>>>>> is limited (say it has 8 cores of 16, with some distribution in
> >>>>>> memory), then your note can potentially hog all the resources and
> >>>>>> the scheduler will have to throttle all other executions, leaving
> >>>>>> you exactly where you are now.
> >>>>>> While I think the solution is a good one, maybe this question
> >>>>>> should make us think about adding true multi-user support,
> >>>>>> where we isolate resources (the cluster and the notebooks
> >>>>>> themselves), have separate login/identity and (I don't know if
> >>>>>> it's possible) share the same context.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Joel
> >>>>>>
> >>>>>> > On Aug 15, 2015, at 1:58 PM, Rohit Agarwal
> >>>>>> > <mindpri...@gmail.com> wrote:
> >>>>>> >
> >>>>>> > If the problem is that multiple users have to wait for each
> >>>>>> > other while using Zeppelin, the solution already exists: they
> >>>>>> > can create a new interpreter on the interpreter page and attach
> >>>>>> > it to their notebook - then they don't have to wait for others
> >>>>>> > to submit their job.
> >>>>>> >
> >>>>>> > But I agree, having paragraphs from one note wait for paragraphs
> >>>>>> > from other notes is a confusing default. We can get around that
> >>>>>> > in two ways:
> >>>>>> >
> >>>>>> > 1. Create a new interpreter for each note and attach that
> >>>>>> > interpreter to that note. This approach requires the least
> >>>>>> > amount of code changes, but it is resource-heavy and doesn't let
> >>>>>> > you share the SparkContext between different notes.
> >>>>>> > 2. If we want to share the SparkContext between different notes,
> >>>>>> > we can submit jobs from different notes into different
> >>>>>> > fair-scheduler pools (
> >>>>>> > https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application
> >>>>>> > ). This can be done by submitting jobs from different notes in
> >>>>>> > different threads. It makes sure that jobs from one note run
> >>>>>> > sequentially, while jobs from different notes can run in
> >>>>>> > parallel.
> >>>>>> >
> >>>>>> > Neither of these options requires any change in the Spark code.
> >>>>>> >
> >>>>>> > --
> >>>>>> > Thanks & Regards
> >>>>>> > Rohit Agarwal
> >>>>>> > https://www.linkedin.com/in/rohitagarwal003
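[For reference, a minimal sketch of option 2 above, not Zeppelin code: it assumes a shared SparkContext `sc` created with spark.scheduler.mode=FAIR, and `runInPool`/`noteId` are hypothetical names. Spark local properties are per-thread, so each note must submit its jobs from its own thread:]

    import org.apache.spark.SparkContext

    // Sketch only: route each note's jobs into its own fair-scheduler pool.
    def runInPool(sc: SparkContext, noteId: String)(job: => Unit): Unit = {
      val t = new Thread(new Runnable {
        override def run(): Unit = {
          // Local properties are per-thread: every job submitted from this
          // thread lands in the pool named after the note.
          sc.setLocalProperty("spark.scheduler.pool", noteId)
          job
        }
      })
      t.start()
    }

    // e.g. paragraphs from two notes can now overlap:
    // runInPool(sc, "note-A") { sc.parallelize(1 to 1000).count() }
    // runInPool(sc, "note-B") { sc.parallelize(1 to 1000).count() }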
> >>>>>> > On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal
> >>>>>> > <praag...@gmail.com> wrote:
> >>>>>> >
> >>>>>> >>> If someone can share about the idea of sharing a single
> >>>>>> >>> SparkContext through multiple SparkILoops safely, it'll be
> >>>>>> >>> really helpful.
> >>>>>> >>
> >>>>>> >> Here is a proposal:
> >>>>>> >> 1. In Spark code, change SparkIMain.scala to allow setting the
> >>>>>> >> virtual directory. While creating new instances of SparkIMain
> >>>>>> >> per notebook from the Zeppelin Spark interpreter, set all the
> >>>>>> >> instances of SparkIMain to the same virtual directory.
> >>>>>> >> 2. Start an HTTP server on that virtual directory and set this
> >>>>>> >> HTTP server in the SparkContext using the classServerUri method.
> >>>>>> >> 3. Scala-generated code has a notion of packages. The default
> >>>>>> >> package name is "line$<linenumber>". The package name can be
> >>>>>> >> controlled using the system property scala.repl.name.line.
> >>>>>> >> Setting this property to the notebook id ensures that code
> >>>>>> >> generated by individual instances of SparkIMain is isolated
> >>>>>> >> from other instances of SparkIMain.
> >>>>>> >> 4. Build a queue inside the interpreter to allow only one
> >>>>>> >> paragraph execution at a time per notebook.
> >>>>>> >>
> >>>>>> >> I have tested 1, 2, and 3, and it seems to provide isolation
> >>>>>> >> across class names. I'll work towards submitting a formal patch
> >>>>>> >> soon - is there already a Jira for this that I can take up?
> >>>>>> >> Also I need to understand:
> >>>>>> >> 1. How does Zeppelin uptake Spark fixes? Or do I need to first
> >>>>>> >> work towards getting the Spark changes merged into Apache Spark
> >>>>>> >> on GitHub?
> >>>>>> >>
> >>>>>> >> Any suggestions or comments on the proposal are highly welcome.
> >>>>>> >>
> >>>>>> >> Regards,
> >>>>>> >> -Pranav.
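[For reference, a rough sketch of step 3 of this proposal, assuming Spark 1.x REPL internals (org.apache.spark.repl.SparkIMain); steps 1 and 2 - the shared virtual directory and the classServerUri wiring - depend on the proposed Spark-side change and are not shown. `interpreterFor` and the per-note prefix are hypothetical:]

    import scala.tools.nsc.Settings
    import org.apache.spark.repl.SparkIMain

    // Sketch only: isolate generated class names per notebook.
    def interpreterFor(noteId: String): SparkIMain = {
      // The Scala REPL reads this property when naming generated classes,
      // so a per-note prefix keeps "line 1" of two notebooks from
      // colliding on the executors (assumes noteId is a valid class-name
      // fragment).
      System.setProperty("scala.repl.name.line", "$line" + noteId)
      val settings = new Settings()
      settings.usejavacp.value = true
      new SparkIMain(settings)
    }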
> >>>>>> >>> On 10/08/15 11:36 pm, moon soo Lee wrote:
> >>>>>> >>>
> >>>>>> >>> Hi Piyush,
> >>>>>> >>>
> >>>>>> >>> A separate instance of SparkILoop and SparkIMain for each
> >>>>>> >>> notebook while sharing the SparkContext sounds great.
> >>>>>> >>>
> >>>>>> >>> Actually, I tried to do it and found the problem that multiple
> >>>>>> >>> SparkILoops could generate the same class names, and the Spark
> >>>>>> >>> executor confuses the class names, since it reads classes from
> >>>>>> >>> a single SparkContext.
> >>>>>> >>>
> >>>>>> >>> If someone can share about the idea of sharing a single
> >>>>>> >>> SparkContext through multiple SparkILoops safely, it'll be
> >>>>>> >>> really helpful.
> >>>>>> >>>
> >>>>>> >>> Thanks,
> >>>>>> >>> moon
> >>>>>> >>>
> >>>>>> >>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform)
> >>>>>> >>> <piyush.muk...@flipkart.com> wrote:
> >>>>>> >>>
> >>>>>> >>> Hi Moon,
> >>>>>> >>> Any suggestion on it? We have to wait a lot when multiple
> >>>>>> >>> people are working with Spark.
> >>>>>> >>> Can we create a separate instance of SparkILoop, SparkIMain
> >>>>>> >>> and the print streams for each notebook, while sharing the
> >>>>>> >>> SparkContext, ZeppelinContext, SQLContext and
> >>>>>> >>> DependencyResolver, and then use the parallel scheduler?
> >>>>>> >>> thanks
> >>>>>> >>>
> >>>>>> >>> -piyush
> >>>>>> >>>
> >>>>>> >>> Hi Moon,
> >>>>>> >>>
> >>>>>> >>> How about tracking a dedicated SparkContext for each notebook
> >>>>>> >>> in Spark's remote interpreter - this would allow multiple
> >>>>>> >>> users to run their Spark paragraphs in parallel. Also, within
> >>>>>> >>> a notebook, only one paragraph is executed at a time.
> >>>>>> >>>
> >>>>>> >>> Regards,
> >>>>>> >>> -Pranav.
> >>>>>> >>>
> >>>>>> >>>> On 15/07/15 7:15 pm, moon soo Lee wrote:
> >>>>>> >>>> Hi,
> >>>>>> >>>>
> >>>>>> >>>> Thanks for asking the question.
> >>>>>> >>>>
> >>>>>> >>>> The reason is simply that it is running code statements. The
> >>>>>> >>>> statements can have order and dependency. Imagine I have two
> >>>>>> >>>> paragraphs:
> >>>>>> >>>>
> >>>>>> >>>> %spark
> >>>>>> >>>> val a = 1
> >>>>>> >>>>
> >>>>>> >>>> %spark
> >>>>>> >>>> print(a)
> >>>>>> >>>>
> >>>>>> >>>> If they're not run one by one, they could run in random order
> >>>>>> >>>> and the output would not always be the same: either '1' or
> >>>>>> >>>> 'val a can not be found'.
> >>>>>> >>>>
> >>>>>> >>>> This is the reason why. But if there is a nice idea to handle
> >>>>>> >>>> this problem, I agree that using the parallel scheduler would
> >>>>>> >>>> help a lot.
> >>>>>> >>>>
> >>>>>> >>>> Thanks,
> >>>>>> >>>> moon
> >>>>>> >>>>
> >>>>>> >>>> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng
> >>>>>> >>>> <linxizeng0...@gmail.com> wrote:
> >>>>>> >>>>
> >>>>>> >>>> Anyone who has the same question as me? Or is this not a
> >>>>>> >>>> question?
> >>>>>> >>>>
> >>>>>> >>>> 2015-07-14 11:47 GMT+08:00 linxi zeng
> >>>>>> >>>> <linxizeng0...@gmail.com>:
> >>>>>> >>>>
> >>>>>> >>>> hi, Moon:
> >>>>>> >>>> I notice that the getScheduler function in
> >>>>>> >>>> SparkInterpreter.java returns a FIFOScheduler, which makes
> >>>>>> >>>> the Spark interpreter run Spark jobs one by one. It's not a
> >>>>>> >>>> good experience when a couple of users do some work on
> >>>>>> >>>> Zeppelin at the same time, because they have to wait for each
> >>>>>> >>>> other.
> >>>>>> >>>> At the same time, SparkSqlInterpreter can choose which
> >>>>>> >>>> scheduler to use via "zeppelin.spark.concurrentSQL".
> >>>>>> >>>> My question is: what considerations was this decision based
> >>>>>> >>>> on?
> >>
> >> --
> >> Sent from a mobile device. Excuse my thumbs.
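[And a sketch of the per-notebook queue from step 4 of the proposal and the parallel-scheduler question above: one single-threaded executor per note id keeps paragraphs within a note ordered - so moon's "val a = 1" always runs before "print(a)" - while paragraphs from different notes run in parallel. `NoteScheduler` is a hypothetical name, not the actual patch:]

    import java.util.concurrent.{ExecutorService, Executors}
    import scala.collection.mutable

    // Sketch only: serial execution per note, parallel across notes.
    object NoteScheduler {
      private val queues = mutable.Map.empty[String, ExecutorService]

      def submit(noteId: String)(paragraph: => Unit): Unit = {
        val queue = queues.synchronized {
          // One single-threaded executor per note: its internal queue
          // preserves paragraph order within that note.
          queues.getOrElseUpdate(noteId, Executors.newSingleThreadExecutor())
        }
        queue.submit(new Runnable { override def run(): Unit = paragraph })
      }
    }

Note that, as moon points out above, paragraphs waiting in such an interpreter-side queue would show as RUNNING rather than PENDING in the GUI.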